This is usually caused by your CSV having been saved along with an (unnamed) index (RangeIndex
).
(The fix would actually need to be done when saving the DataFrame, but this isn't always an option.)
Workaround: read_csv
with index_col=[0]
argument
IMO, the simplest solution would be to read the unnamed column as the index. Specify an index_col=[0]
argument to pd.read_csv
, this reads in the first column as the index. (Note the square brackets).
df = pd.DataFrame('x', index=range(5), columns=list('abc'))
df
a b c
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
# Save DataFrame to CSV.
df.to_csv('file.csv')
pd.read_csv('file.csv')
Unnamed: 0 a b c
0 0 x x x
1 1 x x x
2 2 x x x
3 3 x x x
4 4 x x x
# Now try this again, with the extra argument.
pd.read_csv('file.csv', index_col=[0])
a b c
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
Note
You could have avoided this in the first place by
using index=False
if the output CSV was created in pandas, if your DataFrame does not have an index to begin with:
df.to_csv('file.csv', index=False)
But as mentioned above, this isn't always an option.
Stopgap Solution: Filtering with str.match
If you cannot modify the code to read/write the CSV file, you can just remove the column by filtering with str.match
:
df
Unnamed: 0 a b c
0 0 x x x
1 1 x x x
2 2 x x x
3 3 x x x
4 4 x x x
df.columns
# Index(['Unnamed: 0', 'a', 'b', 'c'], dtype='object')
df.columns.str.match('Unnamed')
# array([ True, False, False, False])
df.loc[:, ~df.columns.str.match('Unnamed')]
a b c
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x