I work with dataframes with this kind of structure:
[timestamp][id][A][B][C][D]
due to a number of things, not all data is retrieved and i end up with something like
2015-05-05 15:00:00 2 NaN NaN NaN 2680
2015-05-05 15:00:00 3 0989 0020 -0011 2680
2015-05-05 15:00:00 4 1022 0060 -0076 2600
2015-05-05 15:00:00 5 NaN NaN NaN 2623
in this case, using fillna(method="ffill")
will use the 'A'
value from id=4
, to fill the 'A'
value of id=5
.
What i want to accomplish however, is to fill the 'A'
value of id=5
from its previously known value somewhere along the (700000) csv entries. (and fill all NaN values from all columns this way)
2014-07-24 17:49:00 5 1046.0 -3.0 -239.0 2800.0
...
2015-05-05 15:00:00 2 NaN NaN NaN 2680
2015-05-05 15:00:00 3 0989 0020 -0011 2680
2015-05-05 15:00:00 4 1022 0060 -0076 2600
2015-05-05 15:00:00 5 NaN NaN NaN 2623
becomes
2015-05-05 15:00:00 2 NaN NaN NaN 2680
2015-05-05 15:00:00 3 0989 0020 -0011 2680
2015-05-05 15:00:00 4 1022 0060 -0076 2600
2015-05-05 15:00:00 5 1046 -3.0 -239.0 2623
with values coming from the last known data of id=5 which is 2014-07-24 17:49:00
instead of
2015-05-05 15:00:00 2 NaN NaN NaN 2680
2015-05-05 15:00:00 3 0989 0020 -0011 2680
2015-05-05 15:00:00 4 1022 0060 -0076 2600
2015-05-05 15:00:00 5 1022 0060 -0076 2623
which is what happens when i do the fillna
my second solution was to do a groupby
'id' and then do a fillna()
. but that was way over my skill as a pandas user. so I was just wondering if there are any better solutions out there.