For and if on dataframe column value

Question

I am trying to iterate over my data and check whether one column (rain) is greater than 0. My code:

import pandas as pd

data = pd.read_csv('weather_forecast2.csv')
# print (data)
data_rain = pd.DataFrame()
for index, row in data.iterrows():
    if row['rain'] > 0:
        data_rain.append(row)

print (data_rain)

Example of the DataFrame data:

                  time  ghi  dni  ...  barometric_pressure  rain  sensor_cleaning
0     01/07/2018 07:14   34    0  ...                981.8   0.1                0
1     01/07/2018 07:15   34    0  ...                981.9   0.0                0
2     01/07/2018 07:16   35    0  ...                981.9   0.0                0
3     01/07/2018 07:17   36    0  ...                981.9   0.0                0
4     01/07/2018 07:18   37    0  ...                981.9   0.1                0
5     01/07/2018 07:19   38    0  ...                982.0   0.0                0
6     01/07/2018 07:20   39    0  ...                982.0   0.0                0
7     01/07/2018 07:21   40    0  ...                982.0   0.0                0
8     01/07/2018 07:22   42    0  ...                982.0   0.0                0
9     01/07/2018 07:23   43    0  ...                982.0   0.0                0
10    01/07/2018 07:24   44    0  ...                982.0   0.0                0
11    01/07/2018 07:25   45    0  ...                982.0   0.1                0
12    01/07/2018 07:26   46    0  ...                982.1   0.0                0

When I run my code, it prints:

Empty DataFrame
Columns: []
Index: []

As you can see, though, the rain column has values that are different from 0. What is my mistake?

Use `data_rain = data[data['rain']>0]` . See: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing — ansev, Jan 23 '20 at 15:41
You definitely don't want to filter your dataframe using `iterrows` and appending to a new dataframe. Have a look at the [docs](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html) on indexing. You could use `df.rain[df.rain.gt(0)]` — yatu, Jan 23 '20 at 15:41

score 0 · Answer 1 · edited Jan 23 '20 at 15:44

data_rain = data.loc[data['rain'] > 0]

This should do the trick. For future reference, iterating over a pandas DataFrame to select rows or columns is considered bad practice, because there is almost always a more efficient, vectorized way of going about it.
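A minimal sketch of the boolean-mask approach, for illustration only. The small frame below is a made-up stand-in for `weather_forecast2.csv` (the values are assumptions, not the asker's data), and the intermediate name `mask` is introduced here just to show the two steps separately.

```python
import pandas as pd

# Hypothetical stand-in for weather_forecast2.csv; the values are made up.
data = pd.DataFrame({
    "time": ["01/07/2018 07:14", "01/07/2018 07:15",
             "01/07/2018 07:16", "01/07/2018 07:18"],
    "ghi": [34, 34, 35, 37],
    "rain": [0.1, 0.0, 0.0, 0.1],
})

# Comparing a column with a scalar yields a boolean Series (the mask).
mask = data["rain"] > 0

# .loc with a boolean mask keeps only the rows where the mask is True.
data_rain = data.loc[mask]

print(data_rain)
```

As for the loop in the question: `DataFrame.append` returned a new DataFrame and left `data_rain` unchanged, so each result was discarded and the frame stayed empty. The method was later removed in pandas 2.0, which is one more reason to prefer the mask.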

answered Jan 23 '20 at 15:42 by Julian L · edited Jan 23 '20 at 15:44 by Celius Stingher

This is a duplicate question. – ansev Jan 23 '20 at 15:47

score 0 · Answer 2 · edited Jan 23 '20 at 15:52

This seems to be a better way of filtering memory-wise (and I think it's prettier, too):

data_rain = data.query('rain > 0')
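A short sketch of `query` in use, again on a made-up frame standing in for the asker's CSV (the values and the `threshold` variable are assumptions for illustration). It also shows that a local Python variable can be referenced inside the expression with the `@` prefix.

```python
import pandas as pd

# Hypothetical stand-in for weather_forecast2.csv; the values are made up.
data = pd.DataFrame({
    "ghi": [34, 34, 35, 37],
    "rain": [0.1, 0.0, 0.0, 0.1],
})

# Column names can be used directly inside the query expression.
data_rain = data.query("rain > 0")

# Local variables are referenced with an @ prefix.
threshold = 0
data_rain_var = data.query("rain > @threshold")

print(data_rain)
```

Both calls select the same rows as `data[data['rain'] > 0]`; which form reads better is largely a matter of taste.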

answered Jan 23 '20 at 15:46 by Oleg O · edited Jan 23 '20 at 15:52 by ansev

This is a duplicate question. – ansev Jan 23 '20 at 15:47
