0

Let us assume, I have data in dataframe which contains 3 variables as follows

        t                   a             b        
2012-03-01 00:00:00        100             2
2012-03-01 00:00:10        200            5.6
2012-03-01 00:00:20        500            6.2
2012-03-01 00:00:30        600            7.0
2012-03-01 00:00:40         50            3.2
2012-03-01 00:00:50         70           12.12
2012-03-01 01:00:00       1000           15.14
2012-03-01 00:00:10       2100            4.9 

What I would like to do is filter the datasets if 100 < a < 2000 and b > 5 and save those values to new dataframe. So at the end hoping to get new dataframe table like this:

       t                    a              b               
2012-03-01 00:00:10        200            5.6
2012-03-01 00:00:20        500            6.2
2012-03-01 00:00:30        600            7.0
2012-03-01 01:00:00       1000           15.14

Can somebody help me on this problem where more conditional statement can be applied for problem such as mine in list comprehension?

Wolf
  • 9,679
  • 7
  • 62
  • 108
Ankita
  • 485
  • 5
  • 18
  • I think you can use the answers to this questions: https://stackoverflow.com/questions/22591174/pandas-multiple-conditions-while-indexing-data-frame-unexpected-behavior – ThomaS Jul 15 '21 at 08:26
  • Now `a` and `b` are clear and the conditions. But showing what you tried and where you got stuck is nevertheless useful. – Wolf Jul 15 '21 at 08:35
  • I thought this makes reader confused hence removed it. would be glad if you can help me. :) – Ankita Jul 15 '21 at 08:37
  • 1
    another point: are you maybe talking of [*list comprehension*](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions) instead of *list compression*? – Wolf Jul 15 '21 at 08:37
  • Yes you are right. – Ankita Jul 15 '21 at 08:46
  • Basically, I am looking a way how this can be solved by list comprehension. – Ankita Jul 15 '21 at 08:50

3 Answers3

3

you mean something like this :

newdf = df[(100 < df.a) & (df.a < 2000) & (df.b > 5)]

you can check out this link for more ways to do this.

By using List Comprehension I guess you mean something like this:

newdf = pd.DataFrame((row[0],row[1],row[2]) for row in df.itertuples() if (100<int(row[1])<2000 and int(row[2])>5 ))

Changes suggested by Wolf

Syed Hussain
  • 178
  • 8
  • I further clarified my question, please do have a look at it. – Ankita Jul 15 '21 at 08:33
  • @AnkitaDebnath I have updated my answer. Please let me know if this is what you are looking for – Syed Hussain Jul 15 '21 at 08:37
  • Why are you using the bitwise and operator here? – Wolf Jul 15 '21 at 08:43
  • 2
    I would add .copy() to the end of this to ensure that any changes made to the new dataframe don't change the original, and avoid the SettingWithCopyWarning – Emi OB Jul 15 '21 at 08:43
  • This codes is not working when I tested with my datasets. Getting error 'The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().' – Ankita Jul 15 '21 at 08:44
  • 1
    @Wolf check out MSeifert answer to understand this https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o – Syed Hussain Jul 15 '21 at 08:48
  • @AnkitaDebnath try using newdf = df.loc[(100 5)] – Syed Hussain Jul 15 '21 at 08:50
  • @SyedHussain thanks for clarification, seems asker turned `&` into `and`? – Wolf Jul 15 '21 at 08:52
  • 2
    @AnkitaDebnath just guessing: maybe `(100 < df.a) & (df.a < 2000) & (df.b > 5)` could work? – Wolf Jul 15 '21 at 08:55
  • @Wolf No, the asker turned `or` into `|` and `and` into `&` – Syed Hussain Jul 15 '21 at 08:57
  • 1
    @Wolf yes this work. Thanks. But can we do same using List Comprehensions. – Ankita Jul 15 '21 at 08:58
  • 1
    @AnkitaDebnath from my limited knowledge, List Comprehension is used to get lists as output, in your use-case you want a Dataframe as a final output so I don't think list comprehension would be feasible solution. – Syed Hussain Jul 15 '21 at 09:05
  • @AnkitaDebnath I have updated my answer to use list comprehension, please check it and tell if that is what you are looking for – Syed Hussain Jul 15 '21 at 09:38
2

After seeing the initial approach in of Syed Hussain (rev #2 of answer), your feedback and the enlightening answer to another question by Nipun, I'm sure that a filtered dataframe has to be retrieved like this:

filtered_df = df[(100 < df.a) & (df.a < 2000) & (df.b > 5)].copy()

That is, enclose every single of your original filter conditions on fields a and b into parentheses and combine them with the binary and operator (&). This way 100 < df.a < 2000 has to be split into 100 < df.a and df.a < 2000 (which is, BTW, exactly the way Python interprets chained comparisons).

I think that list comprehension would not be the right tool here, because (as can be read in the first revision of your question) you are interested in a result in form of another dataframe. So filtering a given dataframe would be the most efficient way to do it.

The copy() at the end is done for preserving your original data, please see why should I make a copy of a data frame in pandas for further details.

Wolf
  • 9,679
  • 7
  • 62
  • 108
0
df[(100 < df.a < 2000) & (df.b > 5)]
Oyono
  • 377
  • 1
  • 8