reading csv with pandas and specifying columns names

Question

I'm trying to import data from a csv file using pandas:

data=pd.read_csv("data.csv")

This seems to work fine. Next, I would like to specify the column names with

data.columns = ['X', 'Y']

so that I can plot it later. This is where the problem occurs:

File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'X'

csv file has the following format

Anyone know what I'm doing wrong?

Thanks!

hmmm, what is `print (data.columns)` before `data.columns = ['X', 'Y']` ? — jezrael, May 07 '18 at 16:50
I think there is separator whitespace, so need `data=pd.read_csv("data.csv", delim_whitespace=True)` and omit `data.columns = ['X', 'Y']` — jezrael, May 07 '18 at 16:53
Great this works now! Thank a lot! I tried to use sep=' ', and delim=' ', but those didn't work. — spectrum, May 07 '18 at 16:59

gcharbon · Answer 1 · 2018-05-31T06:23:08.887

You're trying to import a whitespace-separated file. As @jezrael said in the comments, you should use:

data=pd.read_csv("data.csv", delim_whitespace=True)

From the official doc:

delim_whitespace : boolean, default False

Equivalent to setting sep='\s+'. If this option is set to True, nothing should be passed in for the delimiter parameter.
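
To see why the separator matters, here is a minimal sketch. The sample text is a made-up stand-in for data.csv (the real file isn't shown), and it uses sep=r"\s+" rather than delim_whitespace=True because, as far as I know, pandas 2.2 and later deprecate delim_whitespace in favor of the equivalent sep:

```python
import io
import pandas as pd

# Hypothetical sample standing in for data.csv:
# two columns separated by spaces or tabs, no header row.
raw = "1.0 2.5\n2.0   3.5\n3.0\t4.5\n"

# The default separator is a comma, so each whole line lands in one column.
as_comma = pd.read_csv(io.StringIO(raw), header=None)

# One or more whitespace characters as the separator splits the two fields.
as_space = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)

print(as_comma.shape)
print(as_space.shape)
```

With the default separator the frame has a single column, so there is no second column to rename or plot; with the whitespace separator it has two.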

Moreover, if you want to specify column names (which is your question), the official documentation shows that you can either:

use the names argument to specify the names that the columns should take, or
use the header argument to tell pandas that the first line (index 0) should be parsed as the column names.

To summarize, you should be able to use one of these three statements, depending on whether the file already contains a header line:

data = pd.read_csv("data.csv", delim_whitespace=True, names=["X","Y"])

data = pd.read_csv("data.csv", delim_whitespace=True, header=0)

data = pd.read_csv("data.csv", sep=r"\s+")
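
As a rough illustration of how names and header differ, the sketch below reads two made-up inputs (hypothetical stand-ins for data.csv), one without a header line and one with it:

```python
import io
import pandas as pd

# Hypothetical inputs: the same data without and with a header line.
no_header = "1 2\n3 4\n"
with_header = "X Y\n1 2\n3 4\n"

# No header line in the file: supply the column names yourself.
a = pd.read_csv(io.StringIO(no_header), sep=r"\s+", names=["X", "Y"])

# Header line present: let pandas take the names from line 0.
b = pd.read_csv(io.StringIO(with_header), sep=r"\s+", header=0)

print(list(a.columns), list(b.columns))
print(a.values.tolist() == b.values.tolist())
```

Both calls should give the same two-column frame. Passing names alone to a file that does have a header line would keep that line as a data row, so in that case combine names with header=0.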

Concerning header parameter:

Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file.

Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
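
A small sketch of that last note, using a made-up input that starts with a blank line and a comment line (the comment="#" setting is an assumption for this example, not something the question's file is known to need):

```python
import io
import pandas as pd

# Hypothetical input: a blank line and a comment line precede the header.
raw = "\n# measured values\nX Y\n1 2\n3 4\n"

# skip_blank_lines defaults to True and fully commented lines are skipped,
# so header=0 refers to the "X Y" line, not the first physical line.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", comment="#", header=0)

print(list(df.columns))
print(df.shape)
```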
