8

My csv is as follows (MQM Q.csv):

Date-Time,Value,Grade,Approval,Interpolation Code 
31/08/2012 12:15:00,,41,1,1 
31/08/2012 12:30:00,,41,1,1 
31/08/2012 12:45:00,,41,1,1 
31/08/2012 13:00:00,,41,1,1 
31/08/2012 13:15:00,,41,1,1 
31/08/2012 13:30:00,,41,1,1 
31/08/2012 13:45:00,,41,1,1 
31/08/2012 14:00:00,,41,1,1 
31/08/2012 14:15:00,,41,1,1

The first few rows have no "Value" entries; the values start further down the file.

Here is my code:

import pandas as pd 
from StringIO import StringIO
Q = pd.read_csv(StringIO("""/cygdrive/c/temp/MQM Q.csv"""), header=0, usecols=["Date-Time", "Value"], parse_dates=True, dayfirst=True, index_col=0)

I get the following error:

Traceback (most recent call last):
  File "daily.py", line 4, in <module>
    Q = pd.read_csv(StringIO("""/cygdrive/c/temp/MQM Q.csv"""), header=0, usecols=["Date-Time", "Value"], parse_dates=True, dayfirst=True, index_col=0)
  File "/usr/lib/python2.7/site-packages/pandas-0.14.0-py2.7-cygwin-1.7.30-x86_64.egg/pandas/io/parsers.py", line 443, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python2.7/site-packages/pandas-0.14.0-py2.7-cygwin-1.7.30-x86_64.egg/pandas/io/parsers.py", line 228, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/lib/python2.7/site-packages/pandas-0.14.0-py2.7-cygwin-1.7.30-x86_64.egg/pandas/io/parsers.py", line 533, in __init__
    self._make_engine(self.engine)
  File "/usr/lib/python2.7/site-packages/pandas-0.14.0-py2.7-cygwin-1.7.30-x86_64.egg/pandas/io/parsers.py", line 670, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/lib/python2.7/site-packages/pandas-0.14.0-py2.7-cygwin-1.7.30-x86_64.egg/pandas/io/parsers.py", line 1067, in __init__
    col_indices.append(self.names.index(u))
ValueError: 'Value' is not in list
Sid Kwakkel
  • Can you format your data or provide a link to it as I cannot reproduce your error and it's unclear where the formatting is failing – EdChum Jun 18 '14 at 19:51
  • Erm, why are you calling StringIO on the filename? – DSM Jun 18 '14 at 19:52
  • The following worked for me: `pd.read_csv(io.StringIO(temp),header=0, usecols=["Date-Time", "Value"], parse_dates=True, dayfirst=True, index_col=0)` so is the problem that you are using StringIO when it is unnecessary? – EdChum Jun 18 '14 at 20:01
  • Could you clarify what io and temp are set to? – Sid Kwakkel Jun 18 '14 at 20:09
  • @EdChum I tried removing the StringIO but then I get an error saying "ValueError: 'Date-Time' is not in list" – Sid Kwakkel Jun 18 '14 at 20:22
  • Can you post a link to the data as your pasted data worked, also try this `pd.read_csv('/cygdrive/c/temp/MQM Q.csv',header=0, usecols=["Date-Time", "Value"], parse_dates=True, dayfirst=True, index_col=0)` – EdChum Jun 18 '14 at 20:24
  • [link to file](https://drive.google.com/file/d/0B9q0tbxskwK2aEYzQlkwLUp0QVk/edit?usp=sharing) – Sid Kwakkel Jun 18 '14 at 20:45
  • Trying your code resulted in an error saying "ValueError: 'Date-Time' is not in list" – Sid Kwakkel Jun 18 '14 at 20:50
  • @user1571934 no, ed's code is correct (works on 0.14.0). What version of pandas are you using? The other option is not to use usecols (and just select the columns you want after reading). – Andy Hayden Jun 18 '14 at 21:03
  • There is something strange here, I can reproduce your error using your params but if I just plain load it, it works ok: `df=pd.read_csv('MQM Q.csv')`, if I pass 'Date-Time' to use for `usecols` param then I get the same error – EdChum Jun 18 '14 at 21:14
  • I'm using 0.14.0 on cygwin. @EdChum Code works if i remove all options as well. – Sid Kwakkel Jun 18 '14 at 21:19
  • I think this is a bug but I can't explain it. Your csv is encoded using `utf-8`. I tried changing the encoding to `ANSI` and it loaded without error, then I tried `utf-8 without BOM` and it worked, then I tried `utf-8` again and it failed (I used notepad++ to do the conversions) – EdChum Jun 18 '14 at 21:19

3 Answers

5

This appears to be a bug in the CSV parser. First, this works:

df = pd.read_csv('MQM Q.csv')

This also works:

df = pd.read_csv('MQM Q.csv', usecols=['Value'])

But if I ask for the Date-Time column by name, it fails with the same error message as yours.

I noticed the file was UTF-8 encoded, so I converted it to ANSI with Notepad++ and it worked. I then tried UTF-8 without BOM and it also worked.

I then converted it back to UTF-8 (presumably now with a BOM) and it failed with the same error as before, so I don't think you are imagining this; it looks like a bug.

I am using Python 3.3, pandas 0.14 and NumPy 1.8.1.
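One plausible reading of these observations is that the BOM ends up glued to the first column name, so a lookup for "Date-Time" fails while "Value" is still found. The sketch below illustrates that effect with the standard library only (Python 3); it does not reproduce the pandas 0.14 parser itself, and the two-line sample and the `header_of` helper are hypothetical stand-ins, not part of the original file or of pandas.

```python
import codecs
import csv
import io

# Hypothetical two-line sample: the UTF-8 BOM followed by a header and one row.
raw = codecs.BOM_UTF8 + b"Date-Time,Value\n31/08/2012 12:15:00,\n"


def header_of(data, encoding):
    """Decode the bytes with the given codec and return the first CSV row."""
    text = io.StringIO(data.decode(encoding), newline="")
    return next(csv.reader(text))


with_bom = header_of(raw, "utf-8")         # the BOM survives as U+FEFF
without_bom = header_of(raw, "utf-8-sig")  # the BOM is stripped while decoding

print(repr(with_bom[0]))        # first name carries the BOM character
print(repr(without_bom[0]))     # first name is clean
print("Date-Time" in with_bom)  # lookup by name fails
print("Value" in with_bom)      # second column is unaffected
```

Only the first column name is affected, which would be consistent with `usecols=['Value']` working while `usecols=['Date-Time', ...]` fails.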

To work around it, select the columns by position instead of by name:

df = pd.read_csv('MQM Q.csv', usecols=[0,1], parse_dates=True, dayfirst=True, index_col=0)

This sets the index to the Date-Time column, which is correctly converted to a DatetimeIndex.

In [40]:

df.index
Out[40]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-08-31 12:15:00, ..., 2013-11-28 10:45:00]
Length: 43577, Freq: None, Timezone: None
EdChum
  • Man, a big +1 for mentioning the BOM issue. I don't know how many times I've struggled with this 'error'. Just re-saved a file as UTF-8 without BOM and it worked immediately. – djnz0feh Aug 05 '16 at 09:21
  • @djnz0feh I think in actuality it should just work if you pass `encoding='utf-8'` now: `df = pd.read_csv('MQM Q.csv', usecols=[0,1], parse_dates=True, dayfirst=True, index_col=0, encoding='utf-8')` – EdChum Aug 05 '16 at 09:24
  • Should put that in the answer – Tunn Jan 12 '18 at 04:09
0

Your code should read (no need for StringIO on the filename!):

import pandas as pd 
Q = pd.read_csv("/cygdrive/c/temp/MQM Q.csv", header=0, usecols=["Date-Time", "Value"], parse_dates=True, dayfirst=True, index_col=0)

As written, pandas tries to parse the path string itself as CSV data:

In [11]: pd.read_csv(StringIO("""/cygdrive/c/temp/MQM Q.csv"""))
Out[11]:
Empty DataFrame
Columns: [/cygdrive/c/temp/MQM Q.csv]
Index: []

which isn't what you want (hence the "'Value' is not in list" exception).

Andy Hayden
0

The following works for me (I have the CSV file in the same directory as the script, but that should not matter). I am running the following script on my Mac, not Cygwin, but it should work the same way:

import pandas as pd 
Q = pd.read_csv("MQM Q.csv",
        header=0,
        parse_dates=True, 
        dayfirst=True,
        index_col=0,
        usecols=["Date-Time", "Value"])
print Q

Discussion

  • StringIO will not work, unless you create a new StringIO object with the contents of the file, not the name of the file.
  • I don't have any problem with the "Date-Time" column. In fact, there is no error at all when running the previous code.
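The first point can be sketched as follows. This is a minimal illustration in Python 3 that uses the standard-library `csv` module in place of pandas so it runs without extra dependencies; the temporary file and its two-line contents are hypothetical stand-ins for the real "MQM Q.csv".

```python
import csv
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical stand-in for the real file, written to a temporary directory.
    path = os.path.join(tmp_dir, "MQM Q.csv")
    with open(path, "w", newline="") as f:
        f.write("Date-Time,Value\n31/08/2012 12:15:00,\n")

    # Wrong: the buffer holds the text of the path, so the path is parsed
    # as if it were the header row.
    header_from_name = next(csv.reader(io.StringIO(path)))

    # Right: read the file first, then wrap its contents in StringIO.
    with open(path, newline="") as f:
        header_from_contents = next(csv.reader(io.StringIO(f.read())))

print(header_from_name)      # a single "column" named after the path
print(header_from_contents)  # the real column names
```

In practice there is rarely a reason to go through StringIO at all when the data is already in a file; passing the path directly, as in the script above, is simpler.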
Hai Vu