11

There are two DataFrames that I want to merge:

DataFrame A columns: index, userid, locale  (2000 rows)  
DataFrame B columns: index, userid, age     (300 rows)

When I perform the following:

pd.merge(A, B, on='userid', how='outer')

I got a DataFrame with the following columns:

index, Unnamed:0, userid, locale, age

The index column and the Unnamed:0 column are identical. I guess the Unnamed:0 column is the index column of DataFrame B.

My question is: is there a way to avoid this Unnamed column when merging two DFs?

I can drop the Unnamed column afterwards, but just wondering if there is a better way to do it.

Guillaume Jacquenot
  • 11,217
  • 6
  • 43
  • 49
Cheng
  • 16,824
  • 23
  • 74
  • 104

2 Answers2

13

In summary, what you're doing is saving the index to file and when you're reading back from the file, the column previously saved as index is loaded as a regular column.

There are a few ways to deal with this:

Method 1

When saving a pandas.DataFrame to disk, use index=False like this:

df.to_csv(path, index=False)

Method 2

When reading from file, you can define the column that is to be used as index, like this:

df = pd.read_csv(path, index_col='index')

Method 3

If method #2 does not suit you for some reason, you can always set the column to be used as index later on, like this:

df.set_index('index', inplace=True)

After this point, your datafame should look like this:

        userid    locale    age
index
    0    A1092     EN-US     31
    1    B9032     SV-SE     23

I hope this helps.

Thanos
  • 2,472
  • 1
  • 16
  • 33
5

Either don't write index when saving DataFrame to CSV file (df.to_csv('...', index=False)) or if you have to deal with CSV files, which you can't change/edit, use usecols parameter:

A = pd.read_csv('/path/to/fileA.csv', usecols=['userid','locale'])

in order to get rid of the Unnamed:0 column ...

MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419