
The Life Hack Encyclopedia: Making Every Day a Little More Fun and Comfortable

[pandas] Filtering DataFrame rows with df[bool] and query

2021-01-13 (updated 2022-05-05)

Links on this page include advertisements.

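Filtering rows is one of the most common operations when working with a pandas.DataFrame.

There are so many ways to do it that it's hard to keep them straight
It's easy to forget how when you actually need it

This article walks through the basic filtering methods so you can master them.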

What you'll learn from this article

How to filter the rows of a DataFrame


Importing libraries and preparing the data

import pandas as pd
import seaborn as sns

Import the required libraries.

fmri = sns.load_dataset("fmri")
fmri.head()

	subject	timepoint	event	region	signal
0	s13	18	stim	parietal	-0.017552
1	s5	14	stim	parietal	-0.080883
2	s12	18	stim	parietal	-0.081033
3	s11	18	stim	parietal	-0.046134
4	s10	18	stim	parietal	-0.037970

Load the fmri sample dataset from seaborn.

print(fmri.value_counts('event'))
print("-----")
print(fmri.value_counts('region'))

event
stim    532
cue     532
dtype: int64
-----
region
parietal    532
frontal     532
dtype: int64

The event and region columns each contain two distinct values.

df[df[column] == x]

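df[ ] decides which rows to keep from a boolean (True/False) Series: rows where the value is True are kept, and rows where it is False are dropped.
So the condition is written as an expression that yields True or False for each row.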
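To see what df[ ] actually receives, it can help to evaluate the condition on its own. This minimal sketch uses a small made-up DataFrame (hypothetical values, not the fmri data) so it runs without downloading anything.

```python
import pandas as pd

# A small hypothetical DataFrame that borrows two column names from fmri
df = pd.DataFrame({
    "subject": ["s13", "s5", "s3", "s6"],
    "event": ["stim", "stim", "cue", "cue"],
})

# The comparison produces a boolean Series with one True/False per row
mask = df["event"] == "cue"
print(mask.tolist())  # [False, False, True, True]

# Passing that Series inside df[...] keeps only the rows where it is True
cue_rows = df[mask]
print(cue_rows)
```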

Basic form

DataFrame[DataFrame['column'] == x]
Write the comparison against the column inside the outer brackets.

fmri_cue = fmri[fmri['event'] == "cue"]
fmri_cue.head()

	subject	timepoint	event	region	signal
532	s3	4	cue	parietal	0.058219
533	s6	5	cue	parietal	0.038145
534	s7	5	cue	parietal	-0.008158
535	s8	5	cue	parietal	0.047136
536	s9	5	cue	parietal	0.055847

Rows where event is "cue" have been extracted.
Let's confirm with value_counts().

fmri_cue.value_counts('event')

event
cue    532
dtype: int64

Only "cue" rows remain, so the filter worked.

Filtering on multiple conditions

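To filter on multiple conditions, combine them with &, |, and ~,
and wrap each individual condition in parentheses. The parentheses are needed because & and | bind more tightly than comparison operators such as == and !=.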

fmri_cue2 = fmri[(fmri['event'] == "cue") & (fmri['region'] != 'parietal')]
fmri_cue2.head()

	subject	timepoint	event	region	signal
566	s4	14	cue	frontal	-0.026796
579	s5	14	cue	frontal	-0.017213
596	s9	0	cue	frontal	-0.008117
597	s9	6	cue	frontal	0.026864
598	s3	14	cue	frontal	-0.030614

This extracted the rows where event is "cue" and region is not "parietal".

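Note: and, or, and not cannot be used here; use &, |, and ~ instead. Python's keywords try to reduce each whole Series to a single True/False, which raises a ValueError.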
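The sketch below shows the failure and the fix side by side on a tiny hypothetical DataFrame (not the fmri data): it tries `and` first and catches the error, then applies the same filter with `&`. The exact error wording may vary between pandas versions, so only the word "ambiguous" is relied on here.

```python
import pandas as pd

df = pd.DataFrame({
    "event": ["stim", "cue", "cue"],
    "region": ["parietal", "parietal", "frontal"],
})

error_message = ""
try:
    # `and` calls bool() on the first Series, which pandas refuses to do
    df[(df["event"] == "cue") and (df["region"] != "parietal")]
except ValueError as err:
    error_message = str(err)
    print(error_message)

# `&` combines the two boolean Series element by element instead
both = df[(df["event"] == "cue") & (df["region"] != "parietal")]
print(both)
```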

fmri_subject = fmri[(fmri['subject'] == "s4") | (fmri['subject'] == "s5")]
fmri_subject.head()

	subject	timepoint	event	region	signal
1	s5	14	stim	parietal	-0.080883
9	s5	18	stim	parietal	-0.040557
10	s4	18	stim	parietal	-0.048812
23	s5	17	stim	parietal	-0.056682
24	s4	17	stim	parietal	-0.044582

This extracted the rows where subject is s4 or s5.
Writing it out this way is tedious, though. The isin() method described next makes it much shorter.

Membership tests with isin()

Pass the values to match as a list to the isin() method.

fmri_subject2 = fmri[fmri['subject'].isin(["s4","s5"])]
fmri_subject2.head()

	subject	timepoint	event	region	signal
1	s5	14	stim	parietal	-0.080883
9	s5	18	stim	parietal	-0.040557
10	s4	18	stim	parietal	-0.048812
23	s5	17	stim	parietal	-0.056682
24	s4	17	stim	parietal	-0.044582

This gives the same result as before.
For an even more intuitive syntax, let's try df.query().
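isin() combines with the ~ operator from the previous section to express "not in". A minimal sketch on a hypothetical four-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"subject": ["s4", "s5", "s12", "s13"]})

# isin() returns True for rows whose value appears in the list
in_list = df[df["subject"].isin(["s4", "s5"])]

# ~ flips the mask, giving the "not in" version
not_in_list = df[~df["subject"].isin(["s4", "s5"])]

print(in_list["subject"].tolist())      # ['s4', 's5']
print(not_in_list["subject"].tolist())  # ['s12', 's13']
```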

df.query()

df.query() takes the condition as a string expression that refers to columns by name.

Basic form

fmri_stim = fmri.query('event == "stim"')
fmri_stim.head()

	subject	timepoint	event	region	signal
0	s13	18	stim	parietal	-0.017552
1	s5	14	stim	parietal	-0.080883
2	s12	18	stim	parietal	-0.081033
3	s11	18	stim	parietal	-0.046134
4	s10	18	stim	parietal	-0.037970

The pattern is df.query('column == "str"').
Time-series data works the same way, for example df.query("column > '2020-12-31'").
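A minimal sketch of the datetime case, assuming a hypothetical DataFrame with a datetime64 column named `date`. pandas converts the string literal to a Timestamp for the comparison, and because `>` is strict, the row for 2020-12-31 (midnight) is excluded.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2020-12-30", "2020-12-31", "2021-01-01", "2021-01-02"]),
    "value": [1, 2, 3, 4],
})

# The string on the right-hand side is compared against the datetime64 column
after_2020 = df.query("date > '2020-12-31'")
print(after_2020)
```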

Filtering on multiple conditions

Inside the query string, multiple conditions can be combined with either and, or, not or &, |, ~.

fmri_stim2 = fmri.query('event == "stim" & region != "parietal"')
fmri_stim2.head()

	subject	timepoint	event	region	signal
67	s0	0	stim	frontal	-0.021452
170	s2	6	stim	frontal	0.101050
267	s10	4	stim	frontal	0.030044
268	s11	4	stim	frontal	0.075957
269	s3	0	stim	frontal	0.011056
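To check that the keyword form and the operator form really behave the same inside query(), this sketch runs both on a small hypothetical DataFrame and compares the results.

```python
import pandas as pd

df = pd.DataFrame({
    "event": ["stim", "stim", "cue"],
    "region": ["parietal", "frontal", "frontal"],
})

# and / & are interchangeable inside the query string
with_keyword = df.query('event == "stim" and region != "parietal"')
with_operator = df.query('event == "stim" & region != "parietal"')

# The same holds for or / |
either_keyword = df.query('region == "parietal" or event == "cue"')
either_operator = df.query('region == "parietal" | event == "cue"')

print(with_keyword.equals(with_operator))      # True
print(either_keyword.equals(either_operator))  # True
```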

Membership tests with in / not in

Use in and not in inside the query string.

fmri_subject3 = fmri.query('subject not in ["s13", "s12"]')
fmri_subject3.head()

	subject	timepoint	event	region	signal
1	s5	14	stim	parietal	-0.080883
3	s11	18	stim	parietal	-0.046134
4	s10	18	stim	parietal	-0.037970
5	s9	18	stim	parietal	-0.103513
6	s8	18	stim	parietal	-0.064408

Filtering out missing values

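To keep only the rows where a column is not missing, use df.query('column == column'). This works because NaN is never equal to itself, so the comparison is False exactly on the missing rows.
First, create a DataFrame that contains missing values (NaN).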

from pandas import DataFrame
import numpy as np
df = DataFrame({
    'A':["alpha","beta", np.nan],
    'B':["ABC",np.nan,"GHI"]})
df

	A	B
0	alpha	ABC
1	beta	NaN
2	NaN	GHI

Both A and B contain a missing value; here we keep only the rows where column B is not missing.

df_new = df.query('B == B')
df_new

	A	B
0	alpha	ABC
2	NaN	GHI
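The `B == B` trick relies on NaN comparing unequal to itself. This sketch rebuilds the same DataFrame and checks that the query gives the same rows as the more explicit `notna()` mask with boolean indexing, which some readers may find easier to read.

```python
import numpy as np
import pandas as pd

# NaN is the one value that is not equal to itself
print(np.nan == np.nan)  # False

df = pd.DataFrame({
    "A": ["alpha", "beta", np.nan],
    "B": ["ABC", np.nan, "GHI"],
})

# 'B == B' is False exactly where B is missing
via_query = df.query("B == B")

# notna() states the same intent explicitly
via_notna = df[df["B"].notna()]

print(via_query.equals(via_notna))  # True
```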

Referencing variables

Inside df.query(), you can reference a Python variable from the expression string by prefixing its name with @.

subject_list = ["s13", "s12", "s11"]
event_value = "stim"
fmri_subject4 = fmri.query('subject in @subject_list and event != @event_value ')
fmri_subject4.head()

	subject	timepoint	event	region	signal
540	s12	5	cue	parietal	0.047577
551	s13	4	cue	parietal	0.053692
552	s12	4	cue	parietal	0.058198
553	s11	4	cue	parietal	0.008013
561	s11	2	cue	parietal	-0.054846

This is very handy!
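@ works for plain numbers too, which avoids formatting values into the query string by hand. A minimal sketch; `threshold` and the data are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"subject": ["s1", "s2", "s3"], "signal": [-0.05, 0.02, 0.07]})

# A local Python variable, referenced with @ inside the expression string
threshold = 0.0
positive = df.query("signal > @threshold")
print(positive["subject"].tolist())  # ['s2', 's3']
```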

Filtering by substring

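For partial matches rather than exact matches, use the following string methods, accessed through .str on the column.

Method	Description
str.startswith( )	Starts with the given string
str.endswith( )	Ends with the given string
str.contains( )	Contains the string (treated as a regular expression by default)
str.match( )	Matches a regular expression at the start of the string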

Rows where subject starts with "s1"

fmri_start = fmri.query('subject.str.startswith("s1")', engine='python')
fmri_start.head(3)

	subject	timepoint	event	region	signal
0	s13	18	stim	parietal	-0.017552
2	s12	18	stim	parietal	-0.081033
3	s11	18	stim	parietal	-0.046134

Rows where subject ends with "0"

fmri_end = fmri.query('subject.str.endswith("0")', engine='python')
fmri_end.head(3)

	subject	timepoint	event	region	signal
4	s10	18	stim	parietal	-0.037970
14	s0	18	stim	parietal	-0.075570
18	s10	17	stim	parietal	-0.016847

Rows where subject contains "2"

fmri_ct = fmri.query('subject.str.contains("2")', engine='python')
fmri_ct.head(3)

	subject	timepoint	event	region	signal
2	s12	18	stim	parietal	-0.081033
12	s2	18	stim	parietal	-0.086623
16	s12	17	stim	parietal	-0.088512
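The same .str methods also work directly with df[ ] boolean indexing, without passing engine='python'. This sketch uses a few hypothetical subject IDs and also shows str.match(), which the table lists but the examples above do not use.

```python
import pandas as pd

df = pd.DataFrame({"subject": ["s13", "s5", "s10", "s2", "s0"]})

starts_s1 = df[df["subject"].str.startswith("s1")]
ends_0 = df[df["subject"].str.endswith("0")]

# contains() treats its pattern as a regular expression by default
has_2 = df[df["subject"].str.contains("2")]

# match() anchors the regex at the start; $ here requires exactly one digit after "s"
one_digit = df[df["subject"].str.match(r"s\d$")]

print(starts_s1["subject"].tolist())  # ['s13', 's10']
print(ends_0["subject"].tolist())     # ['s10', 's0']
print(has_2["subject"].tolist())      # ['s2']
print(one_digit["subject"].tolist())  # ['s5', 's2', 's0']
```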

Overwriting the original object

To filter the DataFrame in place instead of assigning the result to a new variable, pass inplace=True.

fmri.query('event == "stim"', inplace=True)
fmri.head()

	subject	timepoint	event	region	signal
0	s13	18	stim	parietal	-0.017552
1	s5	14	stim	parietal	-0.080883
2	s12	18	stim	parietal	-0.081033
3	s11	18	stim	parietal	-0.046134
4	s10	18	stim	parietal	-0.037970
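Keep in mind that after the in-place call above, fmri holds only the "stim" rows, so re-running an earlier "cue" filter on it would return an empty result. This sketch on a hypothetical DataFrame contrasts the two styles: without inplace, query() returns a new DataFrame; with inplace=True, it modifies the original and returns None.

```python
import pandas as pd

df = pd.DataFrame({"event": ["stim", "cue", "stim"]})

# Without inplace, query() returns a new DataFrame and df is unchanged
stim_only = df.query('event == "stim"')
print(len(df), len(stim_only))  # 3 2

# With inplace=True, df itself is filtered and the call returns None
result = df.query('event == "stim"', inplace=True)
print(result, len(df))  # None 2
```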

