How to find specific data in many columns?

Question

I have a DataFrame named df_y. I want to find rows that contain certain data ('K', 'M') in several columns, so I wrote this code:

df_y[df_y[['column1', 'column2', 'column3']].str.contains('K|M')]

Then I got the error "'DataFrame' object has no attribute 'str'".

I think the problem is that the code selects several columns at once.

I don't know how to write this correctly.

Provide some sample input/output data. See: https://stackoverflow.com/q/20109391/17769815 — BrokenBenchmark, Jun 26 '22 at 15:28

Odys · Answer 1 · 2022-06-26T15:17:29.033

0

For indexing and selecting data, build one boolean mask per column and combine the masks with the '&' and '|' operators. See https://pandas.pydata.org/docs/user_guide/indexing.html

df_y[df_y['column1'].str.contains('K|M') & df_y['column2'].str.contains('K|M') & df_y['column3'].str.contains('K|M')]

edited Jun 26 '22 at 15:17

answered Jun 26 '22 at 15:17

Odys


Is there no way to write the code in one go? What if the columns are given by position, e.g. [-1:-3]? – On J Jun 26 '22 at 15:26
Unfortunately no. This is the only way to do it, to avoid any ambiguity. Try df_y['column1'].str.contains('K|M'): this will give you a pd.Series of boolean values. Pandas doesn't know if you're looking for the 'K|M' pattern in 'column1' AND 'column2' AND 'column3', or in 'column1' OR 'column2' OR 'column3', or even a mix like ('column1' AND 'column2') OR 'column3'. Therefore you are constrained to use the pandas '&' and '|' logical operators to be explicit on your index selection. – Odys Jun 27 '22 at 20:10
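The AND/OR choice described in the comment above can also be made explicit in a single expression, by applying str.contains to each column and then reducing across columns with all(axis=1) or any(axis=1). A minimal sketch, assuming made-up sample data (the values below are hypothetical; only the column names come from the question):

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df_y = pd.DataFrame({
    "column1": ["10K", "5M", "300", "7K"],
    "column2": ["2M", "40", "15", "1K"],
    "column3": ["3K", "9M", "22", "80"],
})

cols = ["column1", "column2", "column3"]

# One boolean column per searched column: True where the cell contains K or M.
hits = df_y[cols].apply(lambda s: s.str.contains("K|M", na=False))

all_match = df_y[hits.all(axis=1)]  # AND: every listed column matches
any_match = df_y[hits.any(axis=1)]  # OR: at least one listed column matches

print(all_match)
print(any_match)
```

Mixed conditions such as ('column1' AND 'column2') OR 'column3' still need the individual columns of hits combined with '&' and '|'.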

J_H · Answer 2 · 2022-06-26T16:34:15.257

You have an inconveniently large number of columns and want to find where a rare string appears in any of those columns. Ok. A pair of text processing solutions come to mind.

1. CSV file

Serialize out to the filesystem with df.to_csv('y.csv') and then

$ grep -En --color 'K|M' y.csv

2. str()

Perhaps you prefer to use that approach while remaining entirely within Python.

There are good reasons why people criticize the slow speed of non-vectorized operations like .iterrows(). But if you want a quick'n'dirty solution, this should suffice:

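for i in range(len(df)):
    # Render the whole row as text, then search that text
    row = str(df.iloc[i])
    # 'K|M' is a regex alternation, so test for either letter
    if 'K' in row or 'M' in row:
        print(i)
        print(df.iloc[i])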
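The same text-processing idea can be tried without touching the filesystem, since to_csv() returns the CSV text when no path is given. This is only a sketch with hypothetical sample data; it assumes no cell contains an embedded newline, which would break the line-to-row correspondence:

```python
import re
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "name": ["alpha", "beta", "gamma"],
    "size": ["12K", "7", "3M"],
})

pattern = re.compile("K|M")

# Drop the header and the index so that only cell values are searched
# and line i of the text corresponds to row position i.
lines = df.to_csv(index=False, header=False).splitlines()
matches = [i for i, line in enumerate(lines) if pattern.search(line)]

print(df.iloc[matches])
```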

score 0 · Accepted Answer · answered Jun 26 '22 at 16:38

0

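You can try df.apply to run str.contains on each column, then reduce the result to one boolean per row with any(axis=1):

df_y[df_y[['column1', 'column2', 'column3']].apply(lambda x: x.str.contains('K|M')).any(axis=1)]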

answered Jun 26 '22 at 16:38

Just in case, if you want to perform the filter throughout the dataframe, it will work too. – Jun 26 '22 at 16:40
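One caveat when filtering across the whole DataFrame: the .str accessor raises an error on non-string columns. A possible workaround, sketched here with hypothetical sample data, is to convert every column to text first with astype(str):

```python
import pandas as pd

# Hypothetical sample data; column2 is numeric on purpose.
df_y = pd.DataFrame({
    "column1": ["10K", "20", "30"],
    "column2": [1, 2, 3],
    "column3": ["x", "5M", "y"],
})

# Convert every cell to text so that .str works on all columns.
as_text = df_y.astype(str)

# True for rows where at least one cell, in any column, contains K or M.
mask = as_text.apply(lambda s: s.str.contains("K|M")).any(axis=1)
result = df_y[mask]

print(result)
```

Note that astype(str) also turns missing values into text, so a pattern could in principle match that text; this sketch does not handle that case.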

score 0 · Answer 4 · answered Jun 26 '22 at 16:52

You can try the following code:

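temp = df.copy()
num_of_columns = 2  # number of columns being searched
# Replace the searched columns with booleans: True where the cell contains K or M
temp.iloc[:, 1:3] = temp.iloc[:, 1:3].apply(lambda x: x.str.contains('K|M'))
# Collect the index labels of rows where every searched column matched
index = temp[temp.iloc[:, 1:3].eq([True] * num_of_columns).all(1)].index.to_numpy()
df.loc[index]  # index holds labels, so select with .loc rather than .iloc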

Replace num_of_columns with the number of columns you want to search, and replace 1:3 inside temp.iloc with the positions of those columns.
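If the copy and the eq() comparison are not needed, the positional selection can be shortened. A minimal sketch, assuming hypothetical sample data in which positions 1 and 2 hold the string columns to search:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "a": ["1K", "2", "3M"],
    "b": ["4M", "5K", "6"],
})

# Columns at positions 1 and 2; df.iloc[:, -2:] would select the last two instead.
block = df.iloc[:, 1:3]

# True for rows where every selected column contains K or M.
mask = block.apply(lambda s: s.str.contains("K|M")).all(axis=1)
result = df[mask]

print(result)
```

Because the mask is a boolean Series aligned on the index, it selects rows correctly even when the index is not 0, 1, 2, ...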
