1

I got a big dataset in pandas with a datetime index. The dataframe start at 2010-04-09 and ends present time. When I created this dataset I had for a few columns which had only data starting from 2011-06-01. The values above that where NaN values. Now I managed to get the data for a few of those columns between 2010-04-09 and 2011-06-01. Those data are in a different dataframe with the same datetime index. Now I want to fill the old columns in the original dataset with the values of the new one but I seem not to be able to do it.

My original dataframe looks like this:

>>> data.head()
                       bc_conc  stability  wind_speed  Qnet  Visibility  \
2010-04-09 10:00:00  609.542000        NaN         NaN   NaN         NaN   
2010-04-09 11:00:00  663.500000        NaN         NaN   NaN         NaN   
2010-04-09 12:00:00  524.661667        NaN         NaN   NaN         NaN   
2010-04-09 13:00:00  228.706667        NaN         NaN   NaN         NaN   
2010-04-09 14:00:00  279.721667        NaN         NaN   NaN         NaN   

                      wind_direction  Rain seizoen clouds  
2010-04-09 10:00:00             NaN   NaN   lente      1  
2010-04-09 11:00:00             NaN   NaN   lente      6  
2010-04-09 12:00:00             NaN   NaN   lente      8  
2010-04-09 13:00:00             NaN   NaN   lente      4  
2010-04-09 14:00:00             NaN   NaN   lente      7  

The dataframe I want to add looks like this:

>>> df.loc['20100409']
                  Qnet  Rain  Windspeed  Winddirection

2010-04-09 10:00:00  326.3   0.0        2.4            288
2010-04-09 11:00:00  331.8   0.0        3.6            308
2010-04-09 12:00:00  212.7   0.0        3.8            349
2010-04-09 13:00:00  246.6   0.0        4.1            354
2010-04-09 14:00:00  422.7   0.0        4.5            343
2010-04-09 15:00:00  210.9   0.0        4.6            356
2010-04-09 16:00:00  120.6   0.0        4.5              3
2010-04-09 17:00:00   83.3   0.0        4.5              4
2010-04-09 18:00:00  -23.8   0.0        3.3              7
2010-04-09 19:00:00  -54.0   0.0        3.0             15
2010-04-09 20:00:00  -44.3   0.0        2.7              3
2010-04-09 21:00:00  -41.9   0.0        2.6              3
2010-04-09 22:00:00  -42.1   0.0        2.2              1
2010-04-09 23:00:00  -47.4   0.0        2.2              2

So I want to add the values of df['Qnet'] to data['Qnet'], etc

I tried a lot of things with merge and join but nothing seems to really work. There are no overlapping data in the frames. The 'df' dataframe stops at 2011-05-31 and the 'data' dataframe has NaN values until that date in the columns I want to change. The original columns in data do have values from 2011-06-01 and I want to keep those!

I do know how to merge the two dataset but then I get a Qnet_x and a Qnet_y column.

So the questions is how do I combine/merge two columns in 2 or the same dataset.

I hope the question is clear

Thanks in advance for the help

joanne
  • 79
  • 5

1 Answers1

0

UPDATE2:

this version should also work with duplicates in the index:

data = data.join(df['Qnet'], rsuffix='_new')
data['Qnet'] = data['Qnet'].combine_first(data['Qnet_new'])
data.drop(['Qnet_new'], axis=1, inplace=True)

UPDATE:

data.ix[pd.isnull(data.Qnet), 'Qnet'] = df['Qnet']

In [114]: data.loc[data.index[-1], 'Qnet'] = 9999

In [115]: data
Out[115]:
                        bc_conc  stability  wind_speed    Qnet  Visibility  \
2010-04-09 10:00:00  609.542000        NaN         NaN     NaN         NaN
2010-04-09 11:00:00  663.500000        NaN         NaN     NaN         NaN
2010-04-09 12:00:00  524.661667        NaN         NaN     NaN         NaN
2010-04-09 13:00:00  228.706667        NaN         NaN     NaN         NaN
2010-04-09 14:00:00  279.721667        NaN         NaN  9999.0         NaN

                     wind_direction  Rain seizoen  clouds
2010-04-09 10:00:00             NaN   NaN   lente       1
2010-04-09 11:00:00             NaN   NaN   lente       6
2010-04-09 12:00:00             NaN   NaN   lente       8
2010-04-09 13:00:00             NaN   NaN   lente       4
2010-04-09 14:00:00             NaN   NaN   lente       7

In [116]: data.ix[pd.isnull(data.Qnet), 'Qnet'] = df['Qnet']

In [117]: data
Out[117]:
                        bc_conc  stability  wind_speed    Qnet  Visibility  \
2010-04-09 10:00:00  609.542000        NaN         NaN   326.3         NaN
2010-04-09 11:00:00  663.500000        NaN         NaN   331.8         NaN
2010-04-09 12:00:00  524.661667        NaN         NaN   212.7         NaN
2010-04-09 13:00:00  228.706667        NaN         NaN   246.6         NaN
2010-04-09 14:00:00  279.721667        NaN         NaN  9999.0         NaN

                     wind_direction  Rain seizoen  clouds
2010-04-09 10:00:00             NaN   NaN   lente       1
2010-04-09 11:00:00             NaN   NaN   lente       6
2010-04-09 12:00:00             NaN   NaN   lente       8
2010-04-09 13:00:00             NaN   NaN   lente       4
2010-04-09 14:00:00             NaN   NaN   lente       7

OLD answer:

you can do it this way:

In [97]: data.drop(['Qnet'], axis=1).join(df['Qnet'])
Out[97]:
                        bc_conc  stability  wind_speed  Visibility  \
2010-04-09 10:00:00  609.542000        NaN         NaN         NaN
2010-04-09 11:00:00  663.500000        NaN         NaN         NaN
2010-04-09 12:00:00  524.661667        NaN         NaN         NaN
2010-04-09 13:00:00  228.706667        NaN         NaN         NaN
2010-04-09 14:00:00  279.721667        NaN         NaN         NaN

                     wind_direction  Rain seizoen  clouds   Qnet
2010-04-09 10:00:00             NaN   NaN   lente       1  326.3
2010-04-09 11:00:00             NaN   NaN   lente       6  331.8
2010-04-09 12:00:00             NaN   NaN   lente       8  212.7
2010-04-09 13:00:00             NaN   NaN   lente       4  246.6
2010-04-09 14:00:00             NaN   NaN   lente       7  422.7
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419