Pandas merge how to avoid unnamed column

Question

There are two DataFrames that I want to merge:

DataFrame A columns: index, userid, locale  (2000 rows)  
DataFrame B columns: index, userid, age     (300 rows)

When I perform the following:

pd.merge(A, B, on='userid', how='outer')

I get a DataFrame with the following columns:

index, Unnamed: 0, userid, locale, age

The index column and the Unnamed: 0 column are identical. I guess the Unnamed: 0 column is the index column of DataFrame B.

My question is: is there a way to avoid this Unnamed: 0 column when merging two DataFrames?

I can drop the Unnamed: 0 column afterwards, but I'm wondering if there is a better way to do it.

Have you tried setting index = False? There is a good discussion on this here: http://stackoverflow.com/questions/36519086/pandas-how-to-get-rid-of-unnamed-column-in-a-dataframe — datawrestler, Dec 11 '16 at 15:11
@datawrestler merge does not have a index argument. (to_csv has it but not merge) — Cheng, Dec 11 '16 at 15:21
Right, but if you set the flag when reading in each DF and then merge that might do it — datawrestler, Dec 11 '16 at 15:40
@datawrestler the index was automatically set by the DF not by reading from a file. — Cheng, Dec 11 '16 at 15:45
can you share a sample of the data sets and the code used to arrive at the merge? I am sure we can find a solution to this! — datawrestler, Dec 11 '16 at 15:46
@datawrestler sorry I cannot share the data but you can try to replicate the column structure. (I don't think the number of rows matters) — Cheng, Dec 11 '16 at 15:51
@Cheng, can you post an output of the following command: `print(A.columns.tolist()); print(B.columns.tolist())` __before__ merging? — MaxU - stand with Ukraine, Dec 11 '16 at 16:16
@MaxU A's list ['Unnamed: 0', 'userid', 'locale'], B's list ['userid', 'age']. — Cheng, Dec 11 '16 at 16:41
@MaxU to_csv does not have `index` as a parameter. I did read his reply. I solved it by `read_csv('file.csv', index_col=0)` — Cheng, Dec 11 '16 at 16:47
@Cheng, i could check only back to Pandas version 0.7.0 - it's already had `index` parameter - http://pandas.pydata.org/pandas-docs/version/0.7.0/generated/pandas.DataFrame.to_csv.html — MaxU - stand with Ukraine, Dec 11 '16 at 16:51
@datawrestler thank you, I was focusing too much on reading from csv and never thought it was the write operation giving me trouble — Cheng, Dec 11 '16 at 17:07

score 13 · Accepted Answer · answered Dec 11 '16 at 17:01

In summary, the index is being written to the file when you save the DataFrame. When you read the file back, that saved index is loaded as a regular column, and because it was written without a header, read_csv names it Unnamed: 0.
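To see the round trip described above, here is a minimal sketch. The sample rows and the in-memory buffer (used instead of a real file) are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical stand-in for DataFrame A from the question.
A = pd.DataFrame({"userid": ["A1092", "B9032"], "locale": ["EN-US", "SV-SE"]})

# to_csv writes the index by default, as a first column with an empty header.
buf = io.StringIO()
A.to_csv(buf)
buf.seek(0)

# read_csv gives that headerless column a placeholder name.
A_reloaded = pd.read_csv(buf)
print(A_reloaded.columns.tolist())
```

The reloaded frame has an extra first column named 'Unnamed: 0', which matches the column list reported in the comments.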

There are a few ways to deal with this:

Method 1

When saving a pandas.DataFrame to disk, use index=False like this:

df.to_csv(path, index=False)
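As a rough illustration of Method 1, the sketch below saves both frames without the index, reloads them, and merges. The sample data and the helper name roundtrip are hypothetical, and an in-memory buffer stands in for the file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data with the column layout from the question.
A = pd.DataFrame(
    {"userid": ["A1092", "B9032", "C5511"], "locale": ["EN-US", "SV-SE", "DE-DE"]}
)
B = pd.DataFrame({"userid": ["A1092", "B9032"], "age": [31, 23]})


def roundtrip(df):
    """Write df to CSV without its index, then read it back."""
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    buf.seek(0)
    return pd.read_csv(buf)


merged = pd.merge(roundtrip(A), roundtrip(B), on="userid", how="outer")
print(merged.columns.tolist())
```

Because the index never reaches the file, no extra column appears after the merge.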

Method 2

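When reading from file, you can define the column that is to be used as index. If the saved index column has a header named index, pass that name; if it was written without a header (the usual cause of Unnamed: 0), pass its position with index_col=0 instead:

df = pd.read_csv(path, index_col='index')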
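For a file whose index was saved without a header, a sketch of Method 2 using the positional form could look like this. The CSV text is made up to mimic such a file.

```python
import io

import pandas as pd

# Hypothetical CSV content: the first header field is empty because the
# index was written without a label.
csv_text = ",userid,locale\n0,A1092,EN-US\n1,B9032,SV-SE\n"

# index_col=0 tells read_csv to use the first column as the index
# instead of loading it as a regular column.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df.columns.tolist())
```

Only userid and locale remain as columns; the saved values become the index again.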

Method 3

If Method 2 does not suit you for some reason, you can set the column to be used as index later on, like this (substitute 'Unnamed: 0' for 'index' if the column was saved without a header):

df.set_index('index', inplace=True)

After this point, your DataFrame should look like this:

        userid    locale    age
index
    0    A1092     EN-US     31
    1    B9032     SV-SE     23
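Method 3 applied to an already-loaded frame can be sketched as follows, assuming the stray column came in under the name 'Unnamed: 0'. The rename_axis call is an optional extra that labels the index, and the CSV text is hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content with a headerless index column.
csv_text = ",userid,locale\n0,A1092,EN-US\n1,B9032,SV-SE\n"

# Loaded without index_col, so the saved index arrives as 'Unnamed: 0'.
df = pd.read_csv(io.StringIO(csv_text))

# Promote that column to the index and give the index a readable name.
df = df.set_index("Unnamed: 0").rename_axis("index")
print(df)
```

Reassigning the result avoids inplace=True, which is a style choice here rather than a requirement.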

I hope this helps.

score 5 · Answer 2 · answered Dec 11 '16 at 16:47

Either don't write the index when saving the DataFrame to a CSV file (df.to_csv('...', index=False)), or, if you have to deal with CSV files that you can't change, use the usecols parameter to get rid of the Unnamed: 0 column:

A = pd.read_csv('/path/to/fileA.csv', usecols=['userid','locale'])
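A small sketch of the usecols approach, with made-up CSV text standing in for a file that cannot be changed:

```python
import io

import pandas as pd

# Hypothetical CSV content that already contains a headerless index column.
csv_text = ",userid,locale\n0,A1092,EN-US\n1,B9032,SV-SE\n"

# usecols keeps only the named columns, so the headerless one is never loaded.
A = pd.read_csv(io.StringIO(csv_text), usecols=["userid", "locale"])
print(A.columns.tolist())
```

The resulting frame gets a fresh default index, so the merge on userid produces no extra column.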
