300

I have a situation where sometimes, when I read a CSV into a DataFrame, I get an unwanted index-like column named `Unnamed: 0`.

file.csv

,A,B,C
0,1,2,3
1,4,5,6
2,7,8,9

The CSV is read with this:

pd.read_csv('file.csv')

   Unnamed: 0  A  B  C
0           0  1  2  3
1           1  4  5  6
2           2  7  8  9

This is very annoying! Does anyone have an idea on how to get rid of this?

smci
Collective Action
  • 4
    Don't assume the CSV file was necessarily written out either in pandas or by the OP. (Often, the CSV came from some other user/tool/script, so they can't control its format.) Hence the `read_csv(..., index_col=[0])` workaround is the main thing, rather than "Do `pd.to_csv(..., index=False)`". – smci Dec 15 '20 at 04:10

12 Answers

377

It's the index column. Pass `index=False` to `to_csv(...)` so the unnamed index column isn't written out in the first place; see the `to_csv()` docs.

Example:

In [37]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
pd.read_csv(io.StringIO(df.to_csv()))

Out[37]:
   Unnamed: 0         a         b         c
0           0  0.109066 -1.112704 -0.545209
1           1  0.447114  1.525341  0.317252
2           2  0.507495  0.137863  0.886283
3           3  1.452867  1.888363  1.168101
4           4  0.901371 -0.704805  0.088335

compare with:

In [38]:
pd.read_csv(io.StringIO(df.to_csv(index=False)))

Out[38]:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335

You could also optionally tell read_csv that the first column is the index column by passing index_col=0:

In [40]:
pd.read_csv(io.StringIO(df.to_csv()), index_col=0)

Out[40]:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335
smci
EdChum
  • 4
    A lot of times the datasets you get from elsewhere already contain this column so it doesn't really help knowing how to produce the "right" dataset using the right parameters. Is there a way to eliminate this column when you load it when it's already there? – Calvin Ku Mar 24 '18 at 09:07
  • 5
@CalvinKu unfortunately there is no `skipcols` arg for `read_csv`; after reading in the csv you could just do `df = df.drop(columns=df.columns[0])`, or you could read the columns in first and then pass everything except the first column, something like `cols = pd.read_csv( ....., nrows=1).columns`, and then re-read with `df = pd.read_csv(....., usecols=cols[1:])`; this avoids the overhead of reading a superfluous column and then dropping it afterwards – EdChum Mar 24 '18 at 16:39
  • 3
    It's better practice to include the square brackets in `read_csv(..., index_col=[0])`. Also avoids [tickling the whole pre-0.16.1 deprecated `index_col = False` bug](https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.16.1.html). – smci Dec 15 '20 at 04:08
  • @smci I saw similar solution posted by @cs95. Just wondering whey `index_col=[0]` is better than `index_col=0`? – Mr.Robot Jan 27 '21 at 15:09
  • 3
@MrRobot: I posted the link above which tells you why: [`index_col = False` was deprecated way back in 0.16.1](https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.16.1.html), and Python can't really tell the difference between `bool False` and `int 0`, so `index_col = 0` is deprecated too. Put list brackets around the zero: `[0]`. – smci Jan 28 '21 at 00:27
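The two-pass trick EdChum describes in the comment above can be sketched as follows. This is a minimal example using an in-memory CSV as a stand-in for a real file; the data is made up for illustration:

```python
import io
import pandas as pd

csv_data = ",A,B,C\n0,1,2,3\n1,4,5,6\n"

# First pass: read only the header row to discover the column names.
cols = pd.read_csv(io.StringIO(csv_data), nrows=1).columns

# Second pass: load everything except the first (unnamed) column.
df = pd.read_csv(io.StringIO(csv_data), usecols=cols[1:])
print(list(df.columns))  # ['A', 'B', 'C']
```

The unnamed column is never materialized, so there is nothing to drop afterwards.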
115

This is usually caused by your CSV having been saved along with an (unnamed) index (RangeIndex).

(The fix would actually need to be done when saving the DataFrame, but this isn't always an option.)

Workaround: read_csv with index_col=[0] argument

IMO, the simplest solution would be to read the unnamed column as the index. Specify an index_col=[0] argument to pd.read_csv, this reads in the first column as the index. (Note the square brackets).

df = pd.DataFrame('x', index=range(5), columns=list('abc'))
df

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

# Save DataFrame to CSV.
df.to_csv('file.csv')
pd.read_csv('file.csv')

   Unnamed: 0  a  b  c
0           0  x  x  x
1           1  x  x  x
2           2  x  x  x
3           3  x  x  x
4           4  x  x  x

# Now try this again, with the extra argument.
pd.read_csv('file.csv', index_col=[0])

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

Note
You could have avoided this in the first place by passing `index=False` when the output CSV was created in pandas, if your DataFrame has no meaningful index to begin with:

df.to_csv('file.csv', index=False)

But as mentioned above, this isn't always an option.


Stopgap Solution: Filtering with str.match

If you cannot modify the code to read/write the CSV file, you can just remove the column by filtering with str.match:

df 

   Unnamed: 0  a  b  c
0           0  x  x  x
1           1  x  x  x
2           2  x  x  x
3           3  x  x  x
4           4  x  x  x

df.columns
# Index(['Unnamed: 0', 'a', 'b', 'c'], dtype='object')

df.columns.str.match('Unnamed')
# array([ True, False, False, False])

df.loc[:, ~df.columns.str.match('Unnamed')]
 
   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x
cs95
  • 4
    Thanks a lot! That `index_col=[0]` fix easily solved this annoying problem of 'unnamed:0' and spares code from verbose reinventing the wheel. – user48115 Jul 15 '19 at 05:58
  • 4
To get rid of Unnamed columns, you can also use regex such as `df.drop(df.filter(regex="Unname"),axis=1, inplace=True)` – Sarah Sep 29 '19 at 19:34
  • 1
    The stopgap solution was relevant to my issue. Great answer! – Mohseen Mulla Jun 20 '22 at 09:20
  • I think the stopgap solution is the right one. If I have no assurance that the Unnamed column exists, I need to deal with both cases, and removing the first column every time is not an option. – mic Dec 16 '22 at 20:10
35

To get rid of all `Unnamed` columns, you can also use a regex, such as `df.drop(df.filter(regex="Unname"), axis=1, inplace=True)`

Sarah
  • 2
    This worked really well for me as `str.match` above created `nans` for my column names that were date formatted. – RK1 Nov 28 '20 at 12:19
  • I had the same problem with my data column names that had integers, this code solved the problem. – ah bon Nov 01 '22 at 23:53
11

Another case where this might happen is if your data was improperly written to the CSV so that each row ends with a trailing comma. You will be left with an unnamed column `Unnamed: x` at the end of your data when you try to read it into a DataFrame.
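A minimal reproduction of this case, with made-up data in an in-memory buffer:

```python
import io
import pandas as pd

# Every row, including the header, ends with a trailing comma, so pandas
# sees an extra empty field and invents an "Unnamed: 2" column of NaN.
csv_data = "color,item,\nbrown,piano,\nblack,guitar,\n"
df = pd.read_csv(io.StringIO(csv_data))
print(list(df.columns))  # ['color', 'item', 'Unnamed: 2']
```

Here the fix is to clean the source data (or restrict the read, e.g. with `usecols`), since the column genuinely exists in the file.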

Brendan
  • 3
    I used `usecols=range(0,10)` to cut off the unnamed column – Nash Dec 04 '18 at 12:21
An example of this is `musical_instruments_store.csv` containing `color,item\nbrown,piano,`. Either you have a missing column on the first line (e.g. `price`, which is null in this case, so the header should have been `color,item,price`), or there is a trailing comma on the second line and the data needs to be cleaned (it should be `brown,piano` instead). – Mark Apr 05 '21 at 20:13
11

You can do either of the following with 'Unnamed' Columns:

  1. Delete unnamed columns
  2. Rename them (if you want to use them)

Method 1: Delete Unnamed Columns

# delete one by one; e.g. the column is named 'Unnamed: 0', so use its name
df.drop('Unnamed: 0', axis=1, inplace=True)

# delete all Unnamed columns in a single line using a regex
df.drop(df.filter(regex="Unnamed"), axis=1, inplace=True)

Method 2: Rename Unnamed Columns

df.rename(columns = {'Unnamed: 0':'Name'}, inplace = True)

If you want to write the file back out with a blank header, as in the input file, just choose '' instead of 'Name' above.

where the OP's input data 'file.csv' was:

,A,B,C
0,1,2,3
1,4,5,6
2,7,8,9

# read file
df = pd.read_csv('file.csv')
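For instance, renaming to a blank string and writing back reproduces the original blank-header layout. This is a sketch using an in-memory buffer in place of 'file.csv':

```python
import io
import pandas as pd

csv_data = ",A,B,C\n0,1,2,3\n1,4,5,6\n2,7,8,9\n"
df = pd.read_csv(io.StringIO(csv_data))

# Rename the unnamed column back to a blank header before writing out.
df = df.rename(columns={'Unnamed: 0': ''})
out = df.to_csv(index=False)
print(out.splitlines()[0])  # ',A,B,C'
```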

smci
Jatin Kaushik
4

Simply delete that column using: del df['column_name']

ssareen
  • This is probably the easiest way – Anshuman Kumar Oct 20 '20 at 17:08
  • @AnshumanKumar: no it's not, the easiest way is simply to use `read_csv(..., index_col=[0])`. Generally with pandas, trying to hack around an issue instead of taking 5 min to read the documentation then experimenting till you solve it only causes/defers further problems. Admittedly the pandas doc is unclear, incomplete, and lagging the code (welcome to open source!). That's what github is for. – smci Dec 15 '20 at 05:25
4

Simply do this:

df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
HITESH GUPTA
4

Alternatively:

df = df.drop(columns=['Unnamed: 0'])
3
from IPython.display import display
import pandas as pd
import io

# Read the file, treating the first (unnamed) column as the index.
df = pd.read_csv('file.csv', index_col=[0])
# Round-trip through an in-memory buffer to drop the index entirely.
df = pd.read_csv(io.StringIO(df.to_csv(index=False)))
display(df.head(5))
3

In my experience, there are many reasons you might not want to set that column as `index_col=[0]`, as many people suggest above. For example, it might contain jumbled index values, because the data were saved to CSV after being indexed or sorted without `df.reset_index(drop=True)`, leading to instant confusion.

So if you know the file has this column and you don't want it, as per the original question, the simplest 1-line solutions are:

df = pd.read_csv('file.csv').drop(columns=['Unnamed: 0'])

or

df = pd.read_csv('file.csv',index_col=[0]).reset_index(drop=True)

0

A solution that is agnostic to whether the index was written or not when using `df.to_csv()` is shown below:

df = pd.read_csv(file_name)
if 'Unnamed: 0' in df.columns:
    df.drop('Unnamed: 0', axis=1, inplace=True)

If an index was not written, then `index_col=[0]` will use the first data column as the index, which is behavior one would not want.

gsandhu
0

My issue was slightly different. The first header was blank, which caused all my data afterwards to shift left by one, making it misaligned and leaving the last column all NaN.

I resolved it using the following:

# Grab the real column names from the header row, dropping the first.
df = pd.read_csv('filename.csv', nrows=1)
columns = list(df.columns)[1:]
# Re-read the data, skipping the misaligned header and supplying those names.
df = pd.read_csv('filename.csv', skiprows=1, header=None, names=columns)
DarkHark