
I'm using this to conditionally select rows based on a column's value:

X.loc[data['column'] == 1]

But I want to expand this condition to several columns. These columns have something in common: they share the same string in their names. So I actually have column1, column2, ..., column100, and the condition should apply to all of them. Essentially something like this (wildcard):

X.loc[data['column*'] == 1]

These conditions should be linked with OR. Any chance to do this easily?

ScientiaEtVeritas
  • Have a look at `numpy.any()`. Here are two questions I asked that are very similar to yours: http://stackoverflow.com/questions/43677505/np-where-multiple-logical-statements-pandas and http://stackoverflow.com/questions/42647710/compare-boolean-row-values-across-multiple-columns-in-pandas-using-np-where – Chuck May 17 '17 at 08:30
  • For a list `col_list` of the columns you want to test, try `X.loc[(X[col_list] == 1).any(axis=1)]` – Chuck May 17 '17 at 08:35
  • @chuckm the `.loc` is not needed – Maarten Fabré May 17 '17 at 08:42
  • @MaartenFabré Thanks for the info, updated and credited :) – Chuck May 17 '17 at 08:46

4 Answers


For some DataFrame X:

   p A  p B  p C
0    0    0    0
1    0    0    0
2    0    0    1
3    0    0    0
4    0    0    0
5    0    0    0
6    1    0    0

If you set up the names of the columns you want to test in `col_list`:

col_list = X.columns

You can then use `.any()` to combine the tests with an OR across the columns:

X.loc[(X[col_list] == 1).any(axis=1)]

Which gives you:

   p A  p B  p C
2    0    0    1
6    1    0    0

As pointed out in the comments, you don't need `loc` and will still get the same answer; credit to @MaartenFabré for the info:

X[(X[col_list] == 1).any(axis=1)]

   p A  p B  p C
2    0    0    1
6    1    0    0
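
As a self-contained sketch of the above (the 'p A'/'p B'/'p C' column names are just the illustrative ones from this answer):

import pandas as pd

# Illustrative frame matching the example above
X = pd.DataFrame({'p A': [0, 0, 0, 0, 0, 0, 1],
                  'p B': [0, 0, 0, 0, 0, 0, 0],
                  'p C': [0, 0, 1, 0, 0, 0, 0]})

col_list = X.columns  # or any subset of column names you want to test

# Row-wise OR: keep rows where any of the selected columns equals 1
print(X[(X[col_list] == 1).any(axis=1)])
#    p A  p B  p C
# 2    0    0    1
# 6    1    0    0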
Chuck

Test DataFrame:

    col0 col1 col2
0   1    1    2
1   1    1    1
2   2    2    2

Concatenate the per-column tests into a new DataFrame and reduce it to a boolean Series:

result_s = pd.concat((df['col%i' % i] == 1 for i in range(3)), axis=1).all(axis=1)

results in

0    False
1     True
2    False
dtype: bool

If you do `df[result_s]` you get:

    col0 col1 col2
1   1    1    1

This selects the rows where all columns are == 1. If it's enough for one of them to be 1 (the OR from the question), change `.all()` to `.any()`, which gives:

    col0 col1 col2
0   1    1    2
1   1    1    1
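
A runnable sketch of this approach, assuming columns named col0, col1, col2 as in the example:

import pandas as pd

# Illustrative frame matching the test DataFrame above
df = pd.DataFrame({'col0': [1, 1, 2],
                   'col1': [1, 1, 2],
                   'col2': [2, 1, 2]})

# Concatenate the per-column tests, then reduce across columns
result_s = pd.concat((df['col%i' % i] == 1 for i in range(3)), axis=1).all(axis=1)
print(df[result_s])   # rows where ALL columns == 1 (row 1 only)

# Swap .all() for .any() to get the OR the question asks for
any_s = pd.concat((df['col%i' % i] == 1 for i in range(3)), axis=1).any(axis=1)
print(df[any_s])      # rows where ANY column == 1 (rows 0 and 1)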
Maarten Fabré
  • Looks nice :) But this doesn't work in general, e.g. if the columns are named `columnOne`, `columnTwo`, etc. – ScientiaEtVeritas May 17 '17 at 08:39
  • if you have a list of the columns, `df[(df[selected_columns]==1).any(axis=1)]` can work too and be more efficient – Maarten Fabré May 17 '17 at 08:39
  • @scientiaetveritas `df['col%i'%i] == 1 for i in range(3)` is just a way to select the columns; depending on how they are named this might need modification. You will need to provide more information in your question to make this clear – Maarten Fabré May 17 '17 at 08:40

Put each comparison in brackets and combine them with logical operators:

pd.DataFrame(X).loc[(data['col1']==23) & (data['col2']==42)] # and
pd.DataFrame(X).loc[(data['col1']==23) | (data['col2']==42)] # or
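
A small runnable sketch (the column names and the values 23/42 are illustrative, and `data` is assumed to be the frame being indexed):

import pandas as pd

# Illustrative data; 23 and 42 are just the example values from the answer
data = pd.DataFrame({'col1': [23, 23, 5], 'col2': [42, 7, 42]})

print(data.loc[(data['col1'] == 23) & (data['col2'] == 42)])  # AND: row 0 only
print(data.loc[(data['col1'] == 23) | (data['col2'] == 42)])  # OR: rows 0, 1 and 2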
rhombuzz

Here's another way to consider:

df
   col0  col1  col2
0     1     1     2
1     1     1     1
2     2     2     2

df.loc[df['col0'] == 1, [x for x in df.columns if x == 'col0']]
   col0
0     1
1     1

You can use a list comprehension to find the columns you're looking for.
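
To connect this back to the wildcard in the question, a hedged sketch might look like this (the `column` prefix is an assumption about the naming scheme):

import pandas as pd

# Assumed naming scheme: the wanted columns all start with 'column'
df = pd.DataFrame({'column1': [1, 0, 0],
                   'column2': [0, 0, 1],
                   'other':   [9, 9, 9]})

wanted = [x for x in df.columns if x.startswith('column')]
print(df[(df[wanted] == 1).any(axis=1)])   # rows where any column* equals 1
#    column1  column2  other
# 0        1        0      9
# 2        0        1      9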

Andrew L