
I have a CSV file whose first rows look like this:

NewDateTime ResourceName    
9/18/12 1:00    ANACACHO_ANA    
9/18/12 2:00    ANACACHO_ANA    

When I read it into a pandas DataFrame with:

import pandas as pd
df1 = pd.read_csv(r'MyFile.csv')

I get

df1.columns
Index([u'NewDateTime', u'ResourceName'], dtype='object')

However, when I try

df1['NewDateTime']

I get this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 5: ordinal not in range(128)

Also, in my PyCharm interpreter the expression df1['NewDateTime'] shows a little dash, as in df1['-NewDateTime'], but the dash doesn't show up when I paste it here.

MaxU - stand with Ukraine
Zanam
According to the [docs](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html), read_csv by default only handles comma-separated files. I also suspect you have an encoding issue. Recreate this small CSV file by hand: does the problem still occur? What happens when you simply read the file contents and print them in Python? Does anything look odd? – Michael Hoff Jul 19 '16 at 22:29
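One way to follow the suggestion of reading the raw file contents is sketched below. It uses only the standard library; the helper name `starts_with_utf8_bom` and the temporary stand-in file are assumptions made for illustration, not part of the original question. In practice you would point the helper at your own file, such as MyFile.csv.

```python
import codecs
import os
import tempfile


def starts_with_utf8_bom(path):
    """Return True if the file at `path` begins with the UTF-8 byte order mark."""
    # Hypothetical helper: compares the first three bytes with EF BB BF.
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


# Stand-in for MyFile.csv (an assumption): a comma-separated file saved with a BOM.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(codecs.BOM_UTF8 + b"NewDateTime,ResourceName\n9/18/12 1:00,ANACACHO_ANA\n")
    sample_path = tmp.name

# Read the first line as raw bytes so that no decoding hides the BOM.
with open(sample_path, "rb") as f:
    first_line = f.readline()

print(repr(first_line))  # the BOM bytes, if present, appear before the header text
bom_found = starts_with_utf8_bom(sample_path)
print(bom_found)

os.remove(sample_path)
```

If the printed bytes start with `\xef\xbb\xbf`, the file carries a UTF-8 BOM, which would explain the stray character in front of the first column name.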

1 Answer


It looks like your CSV file starts with a BOM (Byte Order Mark) signature, so try parsing it with 'utf-8-sig', 'utf-16', or another BOM-aware encoding:

df = pd.read_csv(r'MyFile.csv', encoding='utf-8-sig')

Here is a small demo:

In [18]: pd.read_csv(fn).columns
Out[18]: Index([u'?NewDateTime', u'ResourceName'], dtype='object')

In [19]: pd.read_csv(fn, encoding='utf-8-sig').columns
Out[19]: Index([u'NewDateTime', u'ResourceName'], dtype='object')

In my IPython terminal the BOM signature is shown as ? in u'?NewDateTime'; in your case it appears as a dash: df1['-NewDateTime']
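To see why the encoding choice matters, here is a minimal sketch that does not depend on pandas. It decodes the same bytes with 'utf-8' and with 'utf-8-sig' and compares the parsed header fields. The sample bytes and the helper `header_fields` are assumptions made for illustration. Note that newer pandas versions may strip a UTF-8 BOM on their own, so the demo above may not reproduce exactly on every installation.

```python
import codecs
import csv
import io

# Assumed sample data: a comma-separated header and one row, prefixed with a UTF-8 BOM.
raw = codecs.BOM_UTF8 + b"NewDateTime,ResourceName\r\n9/18/12 1:00,ANACACHO_ANA\r\n"


def header_fields(data, encoding):
    """Decode `data` with `encoding` and return the first CSV row as a list."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text, newline="")))


plain = header_fields(raw, "utf-8")         # the BOM survives as U+FEFF in the first name
stripped = header_fields(raw, "utf-8-sig")  # the codec drops the leading BOM

print(plain)
print(stripped)
```

With 'utf-8' the first header keeps an invisible U+FEFF prefix, so a lookup by the plain name 'NewDateTime' does not match it. With 'utf-8-sig' the prefix is removed during decoding.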

MaxU - stand with Ukraine