1

I have a DataFrame named df_y. I want to find rows containing certain values ('K', 'M') across several columns, so I wrote this code:

df_y[df_y[['column1', 'column2', 'column3']].str.contains('K|M')]

Then I got the error "'DataFrame' object has no attribute 'str'".

I think the problem is that the code selects several columns at once.

I don't know how to write this correctly.

petezurich
On J

4 Answers

0

For indexing and selecting data, combine one boolean mask per column with the '&' and '|' operators. See https://pandas.pydata.org/docs/user_guide/indexing.html

df_y[df_y['column1'].str.contains('K|M') & df_y['column2'].str.contains('K|M') & df_y['column3'].str.contains('K|M')]
Odys
  • Is there no way to write this in one go? For example, if the columns are the ones at positions [-1:-3]? – On J Jun 26 '22 at 15:26
  • Unfortunately no. This is the only way to do it, to avoid any ambiguity. Try df_y['column1'].str.contains('K|M'): this will give you a pd.Series of boolean values. Pandas doesn't know if you're looking for the 'K|M' pattern in 'column1' AND 'column2' AND 'column3', or in 'column1' OR 'column2' OR 'column3', or even a mix like ('column1' AND 'column2') OR 'column3'. Therefore you are constrained to use the pandas '&' and '|' logical operators to be explicit on your index selection. – Odys Jun 27 '22 at 20:10
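For what it's worth, the AND/OR choice described in this comment can still be spelled out without typing every column by hand. The following is only a sketch, assuming the columns hold strings; the sample data and the names cols, masks, rows_all and rows_any are made up for illustration and are not from the question.

```python
from functools import reduce
import operator

import pandas as pd

# Hypothetical sample data standing in for df_y.
df_y = pd.DataFrame({
    "column1": ["10K", "5M", "7", "3K"],
    "column2": ["2K", "9", "8", "1M"],
    "column3": ["4M", "6K", "1", "2"],
})
cols = ["column1", "column2", "column3"]

# One boolean Series per column; na=False treats missing values as "no match".
masks = [df_y[c].str.contains("K|M", na=False) for c in cols]

# AND: the pattern must appear in every listed column.
rows_all = df_y[reduce(operator.and_, masks)]

# OR: the pattern must appear in at least one listed column.
rows_any = df_y[reduce(operator.or_, masks)]
```

The explicit choice between operator.and_ and operator.or_ keeps the intent as visible as the hand-written '&' chain, while the column list can be as long as needed.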
0

You have an inconveniently large number of columns and want to find where a rare string appears in any of those columns. Ok. A pair of text processing solutions come to mind.

1. CSV file

Serialize out to the filesystem with df.to_csv('y.csv') and then

$ grep -En --color 'K|M' y.csv

2. str()

Perhaps you prefer to use that approach while remaining entirely within python.

There are good reasons for people criticizing the slow speed of non-vectorized operations like .iterrows(). But if you want a quick'n'dirty solution, this should suffice:

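import re

for i in range(len(df)):
    # str() of a row includes its index labels as well as its values
    row = str(df.iloc[i])
    # 'K|M' is a regex alternation, so search for it rather than using `in`
    if re.search('K|M', row):
        print(i)
        print(df.iloc[i])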
J_H
0

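You can try df.apply, then reduce the result to one boolean per row with any(axis=1) (use all(axis=1) if every column must match):

df_y[df_y[['column1', 'column2', 'column3']].apply(lambda x: x.str.contains('K|M')).any(axis=1)]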
  • Just in case, if you want to perform the filter throughout the dataframe, it will work too. –  Jun 26 '22 at 16:40
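A small sketch of how this plays out on sample data may help. Indexing a DataFrame with a boolean DataFrame masks individual cells instead of dropping rows, so the per-cell result of apply is collapsed to one boolean per row before filtering. The data and the names cell_hits, subset_rows, all_hits and whole_rows are hypothetical, and the astype(str) step for the whole-frame case is an assumption about how to cope with non-string columns, not something stated in the answer.

```python
import pandas as pd

# Hypothetical sample data; "qty" is numeric on purpose.
df_y = pd.DataFrame({
    "column1": ["10K", "7", "3"],
    "column2": ["2", "8", "1M"],
    "qty": [1, 2, 3],
})

# Boolean DataFrame: one True/False per cell of the selected columns.
cell_hits = df_y[["column1", "column2"]].apply(
    lambda s: s.str.contains("K|M", na=False)
)

# Reduce across columns to one boolean per row, then filter.
subset_rows = df_y[cell_hits.any(axis=1)]

# Whole-frame variant: cast every column to str so .str is available everywhere.
all_hits = df_y.astype(str).apply(lambda s: s.str.contains("K|M"))
whole_rows = df_y[all_hits.any(axis=1)]
```

Without the cast, calling .str on the numeric "qty" column would raise an AttributeError, so filtering the whole frame only works directly when every column holds strings.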
0

You can try the following code:

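temp = df.copy()
num_of_columns = 2
# Replace the selected columns with True/False flags for the pattern
temp.iloc[:, 1:3] = temp.iloc[:, 1:3].apply(lambda x: x.str.contains('K|M'))
# Keep the labels of rows where every selected column matched
index = temp[temp.iloc[:, 1:3].eq([True] * num_of_columns).all(1)].index.to_numpy()
# index holds labels, so select with .loc rather than .iloc
df.loc[index]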
  • Replace num_of_columns with the number of columns you want to perform the operation on.
  • Replace 1:3 inside temp.iloc with the positions of the columns you want to work on.
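The same positional idea can probably be written more compactly by building a row mask directly. This is a sketch under the assumption that the selected columns hold strings; the sample data and the names picked, mask and result are illustrative only. A slice such as df.iloc[:, -3:] would pick the last three columns instead.

```python
import pandas as pd

# Hypothetical sample data; the index is deliberately not 0..n-1.
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "a": ["5K", "x", "2M"],
        "b": ["1M", "3K", "y"],
    },
    index=[10, 20, 30],
)

# Select the columns by position (here the 2nd and 3rd), as with 1:3 above.
picked = df.iloc[:, 1:3]

# True only where every picked column matches the pattern.
mask = picked.apply(lambda s: s.str.contains("K|M", na=False)).all(axis=1)

# Boolean indexing aligns on the index, so non-default labels are fine.
result = df[mask]
```

Because the mask is a boolean Series aligned with df, there is no need to copy the frame or to count the columns separately.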