Ok, I have a (big) dataframe, something like this:
date time value
0 20100201 0 1
1 20100201 6 2
2 20100201 12 3
3 20100201 18 4
4 20100202 0 5
5 20100202 6 6
6 20100202 12 7
7 20100202 18 8
8 20100203 0 9
9 20100203 18 11
10 20100204 6 12
...
8845 20160101 18 8846
As you can see, the dataframe has a column date
, a column time
with four hours for each day (00, 06, 12, 18) and a column value
.
The problem is that there are missing dates in the dataframe, in the example above there should be two extra rows between rows 8 and 9, corresponding to the hours 6
and 12
of the day 20100203
, and also an extra row between rows 9 and 10 corresponding to the hour 0
of the day 20100204
.
What would I need? I would like to iterate the date
column of the dataframe, checking that every day exists and no one is missing, and also that for every day there are the four hours (00, 06, 12, 18). In case that something is missing during the iteration there should be added in exactly that place, with the missing date
and time
and NaN
as a value. In order to not copy all the dataframe again, let me put the relevant aspects that there should appear in a final version:
...
7 20100202 18 8
8 20100203 0 9
9 20100203 6 NaN
10 20100203 12 NaN
11 20100203 18 11
12 20100204 0 NaN
13 20100204 6 12
...
In case you are interested, an easier version of this problem was asked here Modular arithmetic in python to iterate a pandas dataframe and kindly answered by users @Alexander and @piRSquared. The version asked here is a more difficult one, involving (I suppose) the use of datetime and timedelta and iterating more columns.
Sorry for the long post and thank you very much.