List Comprehensions in dataframe with multiple conditions

Question

Let us assume, I have data in dataframe which contains 3 variables as follows

        t                   a             b        
2012-03-01 00:00:00        100             2
2012-03-01 00:00:10        200            5.6
2012-03-01 00:00:20        500            6.2
2012-03-01 00:00:30        600            7.0
2012-03-01 00:00:40         50            3.2
2012-03-01 00:00:50         70           12.12
2012-03-01 01:00:00       1000           15.14
2012-03-01 00:00:10       2100            4.9

What I would like to do is filter the datasets if 100 < a < 2000 and b > 5 and save those values to new dataframe. So at the end hoping to get new dataframe table like this:

       t                    a              b               
2012-03-01 00:00:10        200            5.6
2012-03-01 00:00:20        500            6.2
2012-03-01 00:00:30        600            7.0
2012-03-01 01:00:00       1000           15.14

Can somebody help me on this problem where more conditional statement can be applied for problem such as mine in list comprehension?

I think you can use the answers to this questions: https://stackoverflow.com/questions/22591174/pandas-multiple-conditions-while-indexing-data-frame-unexpected-behavior — ThomaS, Jul 15 '21 at 08:26
Now `a` and `b` are clear and the conditions. But showing what you tried and where you got stuck is nevertheless useful. — Wolf, Jul 15 '21 at 08:35
I thought this makes reader confused hence removed it. would be glad if you can help me. :) — Ankita, Jul 15 '21 at 08:37
another point: are you maybe talking of [*list comprehension*](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions) instead of *list compression*? — Wolf, Jul 15 '21 at 08:37
Basically, I am looking a way how this can be solved by list comprehension. — Ankita, Jul 15 '21 at 08:50

Syed Hussain · Answer 1 · 2021-07-15T10:07:53.490

3

you mean something like this :

newdf = df[(100 < df.a) & (df.a < 2000) & (df.b > 5)]

you can check out this link for more ways to do this.

By using List Comprehension I guess you mean something like this:

newdf = pd.DataFrame((row[0],row[1],row[2]) for row in df.itertuples() if (100<int(row[1])<2000 and int(row[2])>5 ))

Changes suggested by Wolf

edited Jul 15 '21 at 10:07

answered Jul 15 '21 at 08:31

Syed Hussain

178
8

I further clarified my question, please do have a look at it. – Ankita Jul 15 '21 at 08:33
@AnkitaDebnath I have updated my answer. Please let me know if this is what you are looking for – Syed Hussain Jul 15 '21 at 08:37
Why are you using the bitwise and operator here? – Wolf Jul 15 '21 at 08:43
2

I would add .copy() to the end of this to ensure that any changes made to the new dataframe don't change the original, and avoid the SettingWithCopyWarning – Emi OB Jul 15 '21 at 08:43
This codes is not working when I tested with my datasets. Getting error 'The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().' – Ankita Jul 15 '21 at 08:44
1

@Wolf check out MSeifert answer to understand this https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o – Syed Hussain Jul 15 '21 at 08:48
@AnkitaDebnath try using newdf = df.loc[(100 5)] – Syed Hussain Jul 15 '21 at 08:50
@SyedHussain thanks for clarification, seems asker turned `&` into `and`? – Wolf Jul 15 '21 at 08:52
2

@AnkitaDebnath just guessing: maybe `(100 < df.a) & (df.a < 2000) & (df.b > 5)` could work? – Wolf Jul 15 '21 at 08:55
@Wolf No, the asker turned `or` into `|` and `and` into `&` – Syed Hussain Jul 15 '21 at 08:57
1

@Wolf yes this work. Thanks. But can we do same using List Comprehensions. – Ankita Jul 15 '21 at 08:58
1

@AnkitaDebnath from my limited knowledge, List Comprehension is used to get lists as output, in your use-case you want a Dataframe as a final output so I don't think list comprehension would be feasible solution. – Syed Hussain Jul 15 '21 at 09:05
@AnkitaDebnath I have updated my answer to use list comprehension, please check it and tell if that is what you are looking for – Syed Hussain Jul 15 '21 at 09:38

Wolf · Accepted Answer · 2021-07-15T10:48:05.620

After seeing the initial approach in of Syed Hussain (rev #2 of answer), your feedback and the enlightening answer to another question by Nipun, I'm sure that a filtered dataframe has to be retrieved like this:

filtered_df = df[(100 < df.a) & (df.a < 2000) & (df.b > 5)].copy()

That is, enclose every single of your original filter conditions on fields a and b into parentheses and combine them with the binary and operator (&). This way 100 < df.a < 2000 has to be split into 100 < df.a and df.a < 2000 (which is, BTW, exactly the way Python interprets chained comparisons).

I think that list comprehension would not be the right tool here, because (as can be read in the first revision of your question) you are interested in a result in form of another dataframe. So filtering a given dataframe would be the most efficient way to do it.

The copy() at the end is done for preserving your original data, please see why should I make a copy of a data frame in pandas for further details.

score 0 · Answer 3 · answered Jul 15 '21 at 08:38

0

df[(100 < df.a < 2000) & (df.b > 5)]

answered Jul 15 '21 at 08:38

Oyono

377
1
8

List Comprehensions in dataframe with multiple conditions

3 Answers3