0

I have two dataframes df1 and df2. Df1 has columns A,B,C,D,E,F and df2 A,B,J,D,E,K. I want to update the second dataframe with the rows of the first but only when two first columns have the same value in both dataframes. For each row that the following two conditions are true:

  1. df1.A = df2.A
  2. df1.B = df2.B

then update accordingly:

df2.D = df1.D  
df2.E = df1.E

My dataframes have different number of rows.

When I tried this code I get a TypeError :cannot do positional indexing with these indexers of type 'str'.

for a in df1:
    for t in df2:
        if df1.iloc[a]['A'] == df2.iloc[t]['A'] and df1.iloc[a]['B'] == df2.iloc[t]['B']:
            df2.iloc[t]['D'] = df1.iloc[a]['D']
            df2.iloc[t]['E'] = df1.iloc[a]['E']
whateveros
  • 61
  • 4
  • Please include sample input data and expected output. See: https://stackoverflow.com/a/20159305/3339965 – root Jun 08 '17 at 17:44
  • If your dataframes have a different number of rows, how do you expect to compare one column to another? – A.Kot Jun 08 '17 at 18:11

1 Answers1

0

The Question:

You'd be better served merging the dataframes than doing nested iteration.

df2 = df2.merge(df1[['A', 'B', 'D', 'E']], on=['A', 'B'], how='left', suffixes=['_old', ''])
df2['D'] = df2['D'].fillna(df2['D_old'])
df2['E'] = df2['E'].fillna(df2['E_old'])
del df2['D_old']
del df2['E_old']

The first row attaches columns to df2 with values for columns D and E from corresponding rows of df1, and renames the old columns. The next two lines fill in the rows for which df1 had no matching row, and the next two delete the initial, now outdated versions of the columns.

The Error:

Your TypeError happened because for a in df1: iterates over the columns of a dataframe, which are strings here, while .iloc only takes integers. Additionally, though you didn't get to this point, to set a value you'd need both index and column contained within the brackets.

So if you did need to set values by row, you'd want something more like

for a in df1.iterrows():
    for t in df2.iterrows():
        if df1.loc[a, 'A'] == ...

Though I'd strongly caution against doing that. If you find yourself thinking about it, there's probably either a much faster, less painful way to do it in pandas, or you're better off using another tool less focused on tabular data.

EFT
  • 2,359
  • 1
  • 10
  • 11
  • Thanks for your answer. Your code is perfectly clear. The problem is that the columns C and D are still filled only with NaN values after merging the dataframes even for the corresponding rows where df1.A=df2.A and df1.B=df2.B. So after executing the next rows to fill the nulls I end up with the same dataframe that I had before merging. – whateveros Jun 09 '17 at 09:41
  • @whateveros Could you edit to include sample data, as requested by @root? This works for me. As a guess, are you sure the values from df1 & df2 are equal? If there's no match at all on the join key, you'd get the result you describe. Note that this can be due to precision or type differences. – EFT Jun 09 '17 at 13:59
  • I can't do the edit right now but I tested your solution to a different set of dataframes that I created and it's working fine. So I just checked the dtypes of my columns on df1 and df2 and it is matching perfectly. I also have many matching values for the set of A and B columns, so I'm trying to figure out why it gives me only nulls. – whateveros Jun 09 '17 at 14:25
  • What happens if you take some row that ought to match from df1 and filter df2 to those values with `df2[(df2['A'] == df1.loc[row_index, 'A']) & (df2['B'] == df1.loc[row_index, 'B'])]`? Do you get any rows back, or just an empty dataframe? – EFT Jun 09 '17 at 15:08
  • I actually solved my problem by converting the type of the columns I was doing the left join on(A and B). It was 'O' and I converted them to string. Your code is working perfect now. Thank you @EFT! – whateveros Jun 09 '17 at 15:17