2

struggling to get to grips with update/merge/concat etc to achieve what I thought was a simple merge or update procedure. I have df (recorded values) and df2 (times series with 0s), and would like to update/replace the values in df2 with the corresponding recorded values of df, matching on date_time.

Example: df=

Date_Time   Perc
03/03/2010 00:05    1.0
03/03/2010 00:15    2.0

df2 =

  Date_Time Perc
03/03/2010 00:00    0.0
03/03/2010 00:05    0.0
03/03/2010 00:10    0.0
03/03/2010 00:15    0.0
03/03/2010 00:20    0.0

Where the result would return:

Date_Time   Perc
03/03/2010 00:00    0.0
03/03/2010 00:05    1.0
03/03/2010 00:10    0.0
03/03/2010 00:15    2.0
03/03/2010 00:20    0.0

I'm finding this frustrating as there is good info and examples on http://pandas.pydata.org/pandas-docs/stable/merging.html and a great SO question with multiple solutions on How to update values in a specific row in a Python Pandas DataFrame? but having tried several approaches, none work.

Tried so far: Re-indexing to Date_time and df2.update(df) with variations, multiple merging/join/concat variations, an adapted definition (below) with apply...and now wondering if I need to use iterrows (see below?). Any suggestions on pointers in the right direction would be appreciated...maybe I'm missing something fundamental in my approach...

def update_vals(row, data=data):
    if row.Date_Time == df.Date_Time:
        row.Perc = df.Perc
    return row

for index, row in df2['Date_Time'].iterrows():
    x = df2['Date_Time']
    for index, row in df['Date_Time'].iterrows():
        x2 = df['Date_Time']
        if x2 ==x:
            df2['Perc'] = df['Perc']

Thought this would work as well, but it results in (ValueError: cannot reindex from a duplicate axis)

df.set_index('Date_Time', inplace=True)
df2.set_index('Date_Time', inplace=True)
df2.update(df[['Perc']])
Community
  • 1
  • 1
Clint
  • 347
  • 1
  • 4
  • 10
  • `df2.update(df)` works for me. – hellpanderr Nov 10 '15 at 07:38
  • @hellpanderrr that's strange, when I run it my output looses the full df2 (see below 00:00 time stamp is gone) I'm tying to preserve the complete df2, and only change the matching times values of 'perc' : Date_Time Perc 0 03/03/2010 00:05 1.0 1 03/03/2010 00:15 2.0 2 03/03/2010 00:10 0.0 3 03/03/2010 00:15 0.0 – Clint Nov 10 '15 at 08:14

1 Answers1

2

df2

                  Perc
Date_Time   
03/03/2010 00:00    0
03/03/2010 00:05    0
03/03/2010 00:10    0
03/03/2010 00:15    0
03/03/2010 00:20    0

df

                 Perc
Date_Time   
03/03/2010 00:05    1
03/03/2010 00:15    2

df2.update(df)

                 Perc
Date_Time   
03/03/2010 00:00    0
03/03/2010 00:05    1
03/03/2010 00:10    0
03/03/2010 00:15    2
03/03/2010 00:20    0
hellpanderr
  • 5,581
  • 3
  • 33
  • 43
  • Many thanks for the help...have to admit, found you were right, and that in my actual data set that is over 500k lines, there is a duplicate Date_time, that threw out the update process...but, learnt more, and maybe this will help someone else...many thanks.. – Clint Nov 10 '15 at 10:22