11

I am merging two dataframes using merge(..., how='left') since I want to retain only entries that match up with the "left" dataframe. The problem is that the merge operation seems to drop the index of my leftmost dataframe, shown here:

import pandas
df1 = pandas.DataFrame([{"id": 1,
                         "name": "bob"},
                        {"id": 10,
                         "name": "sally"}])
df1 = df1.set_index("id")
df2 = pandas.DataFrame([{"name": "bob",
                         "age": 10},
                        {"name": "sally",
                         "age": 11}])

print "df1 premerge: "
print df1
df1 = df1.merge(df2, on=["name"],
                how="left")
print "merged: "
print df1
# This is not "id"
print df1.index
# And there's no "id" field
assert ("id" in df1.columns) == False

Before the merge, df1 was indexed by id. After the merge operation, there's just the default numeric index for the merged dataframe and the id field was dropped. How can I do this kind of merge operation but retain the index of the leftmost dataframe?

To clarify: I want all the columns of df2 to be added to every entry in df1 that has the matching id value. If an entry in df2 has an id value not in df1, then that shouldn't be merged in (hence the how='left').

edit: I could as a hack do: df1.reset_index() but merging and then set the index again, but I prefer not to if possible, it seems like merge shouldn't have to drop the index. thanks.

  • 1
    You could try indexing `df1` and `df2` by name instead of id, and then use `join` instead of `merge` like this: `df1.join(df2)` which will preserve the index. – bdiamante Mar 27 '13 at 14:45
  • It's not possible since `name` is not unique in `df1` –  Mar 27 '13 at 14:50
  • Why don't you set_index after the merge? – user1827356 Mar 27 '13 at 15:00
  • 3
    I would but it drops the column that I want to be the index after the merge, namely `id` –  Mar 27 '13 at 15:02
  • If you didn't call set_index before merge, it should still be around in df1 – user1827356 Mar 27 '13 at 15:05
  • 2
    I found [this question](http://stackoverflow.com/questions/11976503/how-to-keep-index-when-using-pandas-merge), which appears to be the same thing. The answer there is basically the hack you talked about. – bdiamante Mar 27 '13 at 15:05
  • Preserve index after merge is a little bit strange while the method is "outer". So I think it's better not to preserve it. – waitingkuo Mar 27 '13 at 15:10
  • But I see how='left'. IMO, if how='outer' there is no way to preserve index as defined by column 'id' as it would be undefined for rows unique to df2 – user1827356 Mar 27 '13 at 15:33

1 Answers1

5

You've already pointed out doing a reset_index before the merge and a set_index afterwards, which works. The only way I know of to preserve indices across a merge is for the merge to involve an index on at least one of the data frames being merged. So here, you could do:

In [403]: df2 = df2.set_index('name')

In [404]: df1.merge(df2, left_on='name', right_index=True)
Out[404]: 
     name  age
id            
1     bob   10
10  sally   11

to merge df2's index, which we've taken from its 'name' column, against the 'name' column on df1.

This makes some sense, because otherwise the index of the resulting dataframe is ambiguous as it could come from either dataframe.

Snakes McGee
  • 970
  • 1
  • 8
  • 8