I am trying to iterate over my data and check whether one column (rain) is greater than 0. My code:

import pandas as pd

data = pd.read_csv('weather_forecast2.csv')
# print (data)
data_rain = pd.DataFrame()
for index, row in data.iterrows():
    if row['rain'] > 0:
        data_rain.append(row)

print (data_rain)

Example of the DataFrame data:

                  time  ghi  dni  ...  barometric_pressure  rain  sensor_cleaning
0     01/07/2018 07:14   34    0  ...                981.8   0.1                0
1     01/07/2018 07:15   34    0  ...                981.9   0.0                0
2     01/07/2018 07:16   35    0  ...                981.9   0.0                0
3     01/07/2018 07:17   36    0  ...                981.9   0.0                0
4     01/07/2018 07:18   37    0  ...                981.9   0.1                0
5     01/07/2018 07:19   38    0  ...                982.0   0.0                0
6     01/07/2018 07:20   39    0  ...                982.0   0.0                0
7     01/07/2018 07:21   40    0  ...                982.0   0.0                0
8     01/07/2018 07:22   42    0  ...                982.0   0.0                0
9     01/07/2018 07:23   43    0  ...                982.0   0.0                0
10    01/07/2018 07:24   44    0  ...                982.0   0.0                0
11    01/07/2018 07:25   45    0  ...                982.0   0.1                0
12    01/07/2018 07:26   46    0  ...                982.1   0.0                0

When I try to run my code it shows:

Empty DataFrame
Columns: []
Index: []

But as you can see, the rain column has values that are different from 0. What is my mistake?

  • 3
    Use `data_rain = data[data['rain']>0]` . See: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing – ansev Jan 23 '20 at 15:41
  • 1
    You definitely don't want to filter your dataframe using `iterrows` and appending to a new dataframe. Have a look at the [docs](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html) on indexing. You could use `df.rain[df.rain.gt(0)]` – yatu Jan 23 '20 at 15:41

2 Answers

data_rain = data.loc[data['rain'] > 0]

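Should do the trick. Your loop prints an empty DataFrame because `DataFrame.append` does not work in place: it returns a new DataFrame, and that result is discarded on every iteration, so `data_rain` never changes. Also, for future reference, iterating over a pandas DataFrame to select rows or columns is considered bad practice, because there is almost always a more efficient, vectorized way of going about it.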
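As a minimal sketch of the boolean-mask approach, the snippet below rebuilds the first five rows of the question's sample in memory instead of reading `weather_forecast2.csv`. Only three of the columns are included (the elided ones are left out), and the name `mask` is just an illustrative choice.

```python
import pandas as pd

# Stand-in for pd.read_csv('weather_forecast2.csv'): the first five sample rows
# from the question, restricted to three columns (an assumption for brevity).
data = pd.DataFrame({
    "time": ["01/07/2018 07:14", "01/07/2018 07:15", "01/07/2018 07:16",
             "01/07/2018 07:17", "01/07/2018 07:18"],
    "ghi": [34, 34, 35, 36, 37],
    "rain": [0.1, 0.0, 0.0, 0.0, 0.1],
})

# Comparing a column to a scalar yields a boolean Series, one value per row.
mask = data["rain"] > 0

# .loc with a boolean Series keeps the rows where the mask is True.
data_rain = data.loc[mask]

print(data_rain)
```

The original row labels are preserved in `data_rain`; calling `data_rain.reset_index(drop=True)` would renumber them from 0 if that is preferred.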

Celius Stingher
Julian L

This seems to be a better way of filtering memory-wise, at least on large DataFrames (and I think it's prettier, too):

data_rain = data.query('rain > 0')
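A small sketch of `query` on an in-memory stand-in for the question's data follows. The variable `threshold` is a hypothetical name introduced here to show that a local Python variable can be referenced inside the expression with the `@` prefix. The final line checks that the result matches plain boolean indexing.

```python
import pandas as pd

# Stand-in for the CSV: the first five sample rows from the question,
# restricted to three columns (an assumption for brevity).
data = pd.DataFrame({
    "time": ["01/07/2018 07:14", "01/07/2018 07:15", "01/07/2018 07:16",
             "01/07/2018 07:17", "01/07/2018 07:18"],
    "ghi": [34, 34, 35, 36, 37],
    "rain": [0.1, 0.0, 0.0, 0.0, 0.1],
})

# Column names are used directly inside the query string.
data_rain = data.query("rain > 0")

# A local variable can be referenced with '@' (threshold is a hypothetical name).
threshold = 0
data_rain_var = data.query("rain > @threshold")

# Both forms select the same rows as boolean indexing.
same_as_mask = data_rain.equals(data[data["rain"] > 0])
print(data_rain)
```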

ansev
Oleg O