0

I am calculating a value for a new column and I want to save this data for each time step. My data is hourly time series data with a column of 0 or 1 indicating if precipitation is not rain or rain accordingly. I then want to create a column that tells me how many hours (each row of data is 1 hour) have passed since it last rained (sees a 1 in the rain column).

hours_since_rainfall = 0
df['hours_since_rainfall'] = []

for rain in df['rain']:
if rain == 0 : 
    hours_since_rainfall += 1
else : 
    hours_since_rainfall = 0
print(hours_since_rainfall)

The print function gives me accurate data readout. How do I save this to a new column in my dataframe? I have tried all sorts of append, concat, and I just cannot get the data frame to populate. Here were some of my attempts:

df = pd.DataFrame(hours_since_rainfall)    
df['hours_since_rainfall'] = df.append(hours_since_rainfall)    
df['hours_since_rainfall'] = hours_since_rainfall

The data I am trying to achieve:

date                  rain  hours_since_rain
2005-10-15 16:00:00   1      0
2005-10-15 17:00:00   1      0
2005-10-15 18:00:00   0      1
2005-10-15 19:00:00   0      2
2005-10-15 20:00:00   0      3
2005-10-15 21:00:00   0      4
2005-10-15 22:00:00   1      0
Ahegg
  • 19
  • 4
  • 1
    Append `hour_since_rainfall` in a list and assign that list to `df['hours_since_rainfall']`. – Amit Vikram Singh Apr 20 '21 at 00:16
  • How do I do this? – Ahegg Apr 20 '21 at 00:29
  • Do you know how to [build a list](https://realpython.com/python-append/)? – CrazyChucky Apr 20 '21 at 01:15
  • Not from a for loop. All I have been able to do is print out the list. I have been googling and reading forums on this for hours. Any help would be appreciated. It seems like it should be so simple but I can't get it to work no matter what I have tried. – Ahegg Apr 20 '21 at 01:34
  • Can you include a sample of your source dataframe (`df.head(10).to_dict()` or so) in your question? Can you also mock up a few rows of what you want your resulting dataframe to look like? – Henry Ecker Apr 20 '21 at 01:58
  • I updated my entry to show the data – Ahegg Apr 20 '21 at 03:00

1 Answers1

0

So it seems to me that you're really trying to create a Pandas: conditional rolling count.

Except when the rain value is 1 you want zeros and when the rain value is 0 you want the rolling count.

To do this, you can get the rolling total for all groups in rain, then go back through the DataFrame and set the hours_since_rainfall values to 0 where rain is 1.

import pandas as pd

df = pd.DataFrame({
    'date': {
        0: '2005-10-15 16:00:00', 1: '2005-10-15 17:00:00',
        2: '2005-10-15 18:00:00', 3: '2005-10-15 19:00:00',
        4: '2005-10-15 20:00:00', 5: '2005-10-15 21:00:00',
        6: '2005-10-15 22:00:00', 7: '2005-10-15 23:00:00',
        8: '2005-10-16 00:00:00', 9: '2005-10-16 01:00:00'
    },
    'rain': {
        0: 1, 1: 1, 2: 0, 3: 0, 4: 0, 5: 0, 6: 1, 7: 1, 8: 0, 9: 0
    }
})

df['hours_since_rainfall'] = df.groupby([(df['rain'] != df['rain'].shift()).cumsum()]) \
                                 .cumcount() + 1
df.loc[df['rain'] == 1, 'hours_since_rainfall'] = 0
print(df)

Alternatively, you can conditionally transform based on the rain value.

If the first value in the group is a 0 (all of the values in the group are 0), you can replace the series with the range from 1 to len + 1.

If the first value in the group is a 1 (all the values in the group are 1), you can replace it with [0] * len or, basically, an equal length series of zeros.

import pandas as pd

df = pd.DataFrame({
    'date': {
        0: '2005-10-15 16:00:00', 1: '2005-10-15 17:00:00',
        2: '2005-10-15 18:00:00', 3: '2005-10-15 19:00:00',
        4: '2005-10-15 20:00:00', 5: '2005-10-15 21:00:00',
        6: '2005-10-15 22:00:00', 7: '2005-10-15 23:00:00',
        8: '2005-10-16 00:00:00', 9: '2005-10-16 01:00:00'
    },
    'rain': {
        0: 1, 1: 1, 2: 0, 3: 0, 4: 0, 5: 0, 6: 1, 7: 1, 8: 0, 9: 0
    }
})

df['hours_since_rainfall'] = df \
    .groupby([(df['rain'] != df['rain'].shift()).cumsum()])['rain'] \
    .transform(lambda x: range(1, len(x) + 1) if x.iloc[0] == 0 else [0] * len(x))
print(df)

Both Produce:

                  date  rain  hours_since_rainfall
0  2005-10-15 16:00:00     1                     0
1  2005-10-15 17:00:00     1                     0
2  2005-10-15 18:00:00     0                     1
3  2005-10-15 19:00:00     0                     2
4  2005-10-15 20:00:00     0                     3
5  2005-10-15 21:00:00     0                     4
6  2005-10-15 22:00:00     1                     0
7  2005-10-15 23:00:00     1                     0
8  2005-10-16 00:00:00     0                     1
9  2005-10-16 01:00:00     0                     2

As Dict:

{'date': {0: '2005-10-15 16:00:00', 1: '2005-10-15 17:00:00',
          2: '2005-10-15 18:00:00', 3: '2005-10-15 19:00:00',
          4: '2005-10-15 20:00:00', 5: '2005-10-15 21:00:00',
          6: '2005-10-15 22:00:00', 7: '2005-10-15 23:00:00',
          8: '2005-10-16 00:00:00', 9: '2005-10-16 01:00:00'},
 'rain': {0: 1, 1: 1, 2: 0, 3: 0, 4: 0,
          5: 0, 6: 1, 7: 1, 8: 0, 9: 0},
 'hours_since_rainfall': {0: 0, 1: 0, 2: 1, 3: 2, 4: 3,
                          5: 4, 6: 0, 7: 0, 8: 1, 9: 2}}
Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
  • When the count starts again it continues at 5 instead of resetting at 1. Sorry if I did not make that clear. How do I reset the count to 1? – Ahegg Apr 20 '21 at 17:25
  • You were clear. I knew the hours since rain should reset to 0 in each group of 1s, but your provided data didn't include the case where this happened, so I missed it my testing. It should work as expected now. – Henry Ecker Apr 20 '21 at 18:40