I am merging two dataframes using merge(..., how='left')
since I want to retain only entries that match up with the "left" dataframe. The problem is that the merge operation seems to drop the index of my leftmost dataframe, shown here:
import pandas
df1 = pandas.DataFrame([{"id": 1,
"name": "bob"},
{"id": 10,
"name": "sally"}])
df1 = df1.set_index("id")
df2 = pandas.DataFrame([{"name": "bob",
"age": 10},
{"name": "sally",
"age": 11}])
print "df1 premerge: "
print df1
df1 = df1.merge(df2, on=["name"],
how="left")
print "merged: "
print df1
# This is not "id"
print df1.index
# And there's no "id" field
assert ("id" in df1.columns) == False
Before the merge, df1
was indexed by id
. After the merge operation, there's just the default numeric index for the merged dataframe and the id
field was dropped. How can I do this kind of merge operation but retain the index of the leftmost dataframe?
To clarify: I want all the columns of df2
to be added to every entry in df1
that has the matching id
value. If an entry in df2
has an id
value not in df1
, then that shouldn't be merged in (hence the how='left'
).
edit: I could as a hack do: df1.reset_index()
but merging and then set the index again, but I prefer not to if possible, it seems like merge shouldn't have to drop the index. thanks.