8

I have a panda DataFrame with date_time/voltage data like this (df1):

        Date_Time  Chan
0   20130401 9:00   AAT
1  20130401 10:00   AAT
2  20130401 11:00   AAT
3  20130401 12:00   AAT
4  20130401 13:00   AAT
5  20130401 14:00   AAT
6  20130401 15:00   AAT

I am using this as a prototype to load in data from a much bigger data file and create one DataFrame . The other DataFrame looks like this (df2):

Chan          date_time  Sens1  Sens2 
 AAC  01-Apr-2013 09:00   5.17   1281
 AAC  01-Apr-2013 10:00   5.01    500
 AAC  01-Apr-2013 12:00   5.17    100
 AAC  01-Apr-2013 13:00   5.19  41997
 AAC  01-Apr-2013 16:00   5.21   2123
 AAT  01-Apr-2013 09:00  28.82    300
 AAT  01-Apr-2013 10:00  28.35   4900
 AAT  01-Apr-2013 12:00  28.04    250
 AAE  01-Apr-2013 11:00   3.36    400
 AAE  01-Apr-2013 12:00   3.41    200
 AAE  01-Apr-2013 13:00   3.40   2388
 AAE  01-Apr-2013 14:00   3.37    300
 AAE  01-Apr-2013 15:00   3.35    500
 AXN  01-Apr-2013 09:00  23.96   6643
 AXN  01-Apr-2013 10:00  24.03   1000
 AXW  01-Apr-2013 11:00  46.44   2343

So what I want to do is search df2 for all instances of a match from both columns of df1 (noting the different data formats) and insert the data from df2 into df1. Like this (df1)

         Date_Time  Chan  Sens1  Sens2 
 0   20130401 9:00   AAT  28.82    300
 1  20130401 10:00   AAT  28.35   4900
 2  20130401 11:00   AAT    NaN    NaN
 3  20130401 12:00   AAT  28.04    250
 4  20130401 13:00   AAT    NaN    NaN
 5  20130401 14:00   AAT    NaN    NaN
 6  20130401 15:00   AAT    NaN    NaN

Could you give me some suggestions for the python/pandas code to match this psuedocode:

if (df1['date_time'] = df2['date_time']) & (df1['Chan'] = df2['Chan'])): 
    df1['Sens1'] = df2['Sens1']
    df1['Sens2'] = df2['Sens2']

If it effects the answer, it is my intention to bfill and ffill the NaNs and then add this DataFrame to a Panel and then repeat with another channel name in place of AAT.

Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
knight2270
  • 91
  • 1
  • 2

1 Answers1

9

You can use a plain ol' merge to do this. But first, you should do a little cleanup of you DataFrames, to make sure your datetime columns are actually datetimes rather than strings (Note: it may be better to do this when reading as csv or whatever):

df1['Date_Time'] = pd.to_datetime(df1['Date_Time'], format='%Y%m%d %H:%M')
df2['date_time'] = pd.to_datetime(df2['date_time'])

Let's also rename the Datetime columns with the same name:

df1.rename(columns={'Date_Time': 'Datetime'}, inplace=True)
df2.rename(columns={'date_time': 'Datetime'}, inplace=True)

Now a simple merge will give you what you're after:

In [11]: df1.merge(df2)
Out[11]: 
             Datetime Chan  Sens1  Sens2
0 2013-04-01 09:00:00  AAT  28.82    300
1 2013-04-01 10:00:00  AAT  28.35   4900
2 2013-04-01 12:00:00  AAT  28.04    250

In [12]: df1.merge(df2, how='left')
Out[12]: 
             Datetime Chan  Sens1  Sens2
0 2013-04-01 09:00:00  AAT  28.82    300
1 2013-04-01 10:00:00  AAT  28.35   4900
2 2013-04-01 11:00:00  AAT    NaN    NaN
3 2013-04-01 12:00:00  AAT  28.04    250
4 2013-04-01 13:00:00  AAT    NaN    NaN
5 2013-04-01 14:00:00  AAT    NaN    NaN
6 2013-04-01 15:00:00  AAT    NaN    NaN
Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
  • I can see how that would work, but I am getting an error: "TypeError: to_datetime() got an unexpected keyword argument 'format'" in the pd.to_datetime line. Is this a function that added format in a later version of pandas. I have the version of pandas that was bundled with Pyzo (which I thought was 0.12). – knight2270 Sep 27 '13 at 02:53
  • According to this "pd.__version__" I have version 0.10.1. Let me fix that and return to this issue, as I have read something about format changes between 10 and 12. – knight2270 Sep 27 '13 at 03:21
  • OK So I have serached high and low. I believe the Pyzo 2013b package bundles pandas 0.10.0. The latest is 0.12.0 I cannot get the format method ', format='%Y%m%d %H:%M'' – knight2270 Sep 30 '13 at 06:44
  • @knight2270 the alternative is to use df1['Date_Time'].apply(lambda t: pd.Timestamp(t, format=...)) – Andy Hayden Oct 01 '13 at 00:11