130

I have a data file with columns A-G, as shown below, but when I read it with pd.read_csv('data.csv') the resulting DataFrame has an extra unnamed column at the end for no apparent reason.

colA    ColB    colC    colD    colE    colF    colG    Unnamed: 7
44      45      26      26      40      26      46        NaN
47      16      38      47      48      22      37        NaN
19      28      36      18      40      18      46        NaN
50      14      12      33      12      44      23        NaN
39      47      16      42      33      48      38        NaN

I have checked my data file several times, and there is no extra data in any other column. How can I remove this extra column while reading the file? Thanks

muazfaiz
  • 1
    Your first column is probably the index col see related: http://stackoverflow.com/questions/36519086/pandas-how-to-get-rid-of-unnamed-column-in-a-dataframe – EdChum May 15 '17 at 15:43
  • 2
    I just had the same issue. I examined my data file.. and found that there was an extra separator at the end of the header row (row 0). – UnadulteratedImagination Sep 02 '21 at 17:56
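
The trailing-separator cause mentioned in the comment above can be reproduced with a small in-memory file. This is only a sketch: the sample text `raw` and the list `wanted` are made up for illustration, and it assumes a comma-separated file in which every line ends with an extra separator. Passing the wanted column names to `usecols` is one way to skip the extra column while reading.

```python
import io

import pandas as pd

# Hypothetical file contents: every line ends with an extra comma.
raw = "colA,ColB,colC,\n44,45,26,\n47,16,38,\n"

# Read as-is: the empty trailing field becomes an extra, all-NaN column.
df_all = pd.read_csv(io.StringIO(raw))
print(df_all.columns.tolist())

# Read only the columns we actually want, so the extra one is never loaded.
wanted = ["colA", "ColB", "colC"]
df = pd.read_csv(io.StringIO(raw), usecols=wanted)
print(df.columns.tolist())
```

If the file can be edited, removing the stray separator from the file itself avoids the problem at the source.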

4 Answers

322
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]

In [162]: df
Out[162]:
   colA  ColB  colC  colD  colE  colF  colG
0    44    45    26    26    40    26    46
1    47    16    38    47    48    22    37
2    19    28    36    18    40    18    46
3    50    14    12    33    12    44    23
4    39    47    16    42    33    48    38

NOTE: very often there is only one unnamed column Unnamed: 0, which is the first column in the CSV file. This is the result of the following steps:

  1. a DataFrame is saved into a CSV file using parameter index=True, which is the default behaviour
  2. we read this CSV file into a DataFrame using pd.read_csv() without explicitly specifying index_col=0 (default: index_col=None)

The easiest way to get rid of this column is to specify the parameter pd.read_csv(..., index_col=0):

df = pd.read_csv('data.csv', index_col=0)
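
As a rough illustration of the two steps in the note above, here is a minimal round trip through an in-memory buffer. The sample frame and the names `plain`, `restored` and `filtered` are made up for this sketch.

```python
import io

import pandas as pd

df = pd.DataFrame({"colA": [44, 47], "ColB": [45, 16]})

# Step 1: save with the default index=True; the index is written
# as a first column with an empty header.
buf = io.StringIO()
df.to_csv(buf)
csv_text = buf.getvalue()

# Step 2: read it back without index_col; the nameless first
# column shows up as "Unnamed: 0".
plain = pd.read_csv(io.StringIO(csv_text))
print(plain.columns.tolist())

# Fix while reading: treat the first column as the index.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(restored.columns.tolist())

# Fix after reading: filter out every column whose name starts with "Unnamed".
filtered = plain.loc[:, ~plain.columns.str.contains('^Unnamed')]
print(filtered.columns.tolist())
```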
MaxU - stand with Ukraine
  • For some reason, the above response was unsuccessful in my case, but this [link](https://www.datasciencelearner.com/drop-unnamed-column-pandas/) resolved it for me. Using **match** instead of **contains** `df2.loc[:,~df2.columns.str.match("Unnamed")]` – vcnr_1234 Jun 08 '22 at 10:54
  • 1
    The `index_col=0` comment is very useful - most occurrences of Unnamed columns are for "Unnamed: 0", which is likely to be an index. Although not the case in this question. – Dave Jul 26 '22 at 11:44
37

First, find the columns whose names contain 'unnamed', then drop those columns. Note: pass inplace=True to .drop as well, so that df itself is modified.

df.drop(df.columns[df.columns.str.contains('unnamed', case=False)], axis=1, inplace=True)
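
If you would rather not modify the DataFrame in place, the same selection can be passed to `drop` and the result assigned to a new name. A small sketch, with a made-up sample frame and the hypothetical names `unnamed` and `cleaned`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one stray column.
df = pd.DataFrame({"colA": [44, 47], "Unnamed: 7": [np.nan, np.nan]})

# Column labels containing "unnamed", ignoring case.
unnamed = df.columns[df.columns.str.contains("unnamed", case=False)]

# drop() returns a new DataFrame; df itself is left unchanged.
cleaned = df.drop(columns=unnamed)
print(cleaned.columns.tolist())
```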
Laughing Vergil
Adil Warsi
15

The pandas.DataFrame.dropna function removes missing values (e.g. NaN, NaT).

For example, the following code removes every column of your DataFrame in which all of the elements are missing. Because dropna returns a new DataFrame rather than modifying the original, assign the result back:

df = df.dropna(how='all', axis='columns')
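
One thing to keep in mind with this approach: it looks at the values, not the column names, so a properly named column that happens to be entirely empty is removed as well. A quick sketch with a made-up frame (the column names here are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "colA": [44, 47],
    "colB": [np.nan, 16.0],               # partly missing: kept
    "empty_but_named": [np.nan, np.nan],  # all missing: dropped too
    "Unnamed: 3": [np.nan, np.nan],       # all missing: dropped
})

result = df.dropna(how="all", axis="columns")
print(result.columns.tolist())
```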
Gaurang Tandon
Susan
  • 6
    From Review: Welcome to Stack Overflow! Try to provide a nice description about how your solution works. See: [How do I write a good answer?](https://stackoverflow.com/help/how-to-answer). Thanks – sɐunıɔןɐqɐp Oct 08 '18 at 07:04
8

The accepted solution didn't work in my case, so this is what I did instead:

    # The column name in the example case is "Unnamed: 7",
    # but this works with any other name ("Unnamed: 0", for example).
    df.rename({"Unnamed: 7": "a"}, axis="columns", inplace=True)

    # Then drop the column as usual.
    df.drop(["a"], axis=1, inplace=True)

Hope it helps others.

Ezarate11