I am working through chapter 1 of the pandas cookbook (the bikes.csv example). When I set parse_dates=['Date'], dayfirst=True and index_col='Date', as in cell In [6] of that chapter:

fixed_df = pd.read_csv('../data/bikes.csv', sep=';', encoding='latin1', parse_dates=['Date'], dayfirst=True, index_col='Date')

I get this: ValueError: 'Date' is not in list. Before posting here, I tried these solutions:

1st: UTF-8 BOM problem

As I understand it, a BOM at the start of a UTF-8 file can cause this error. In addition, the "Date" header may be read by pandas as a tuple? (Sorry if my wording is wrong; this is what I remember, and I am not a Python pro.) I tried to convert the encoding with this suggestion, since the "utf-8-sig" codec gives a unicode string without the BOM:

# Python 3: let open() strip the BOM while decoding
with open("file.txt", encoding="utf-8-sig") as fp:
    u = fp.read()

Although I did not get any error, it did not work.

2nd: iconv and Vim. I tried this to change the encoding:

iconv -f UTF-8 -t ISO-8859-1 infile.txt > outfile.txt

and this,

vim +"set nobomb | set fenc=utf8 | x" filename.txt

Neither of them worked.

3rd: I tried to change the file encoding after opening the file in Vim.

With set fileencoding=utf-8-sig (and other possible encodings such as ANSI, ASCII, etc.) I get this error:

E213: Cannot convert (add ! to write without conversion)

Could you please help me see what I am missing? Many thanks in advance.

Z.Grey
  • `parse_dates=True` doesn't work? – cs95 Jul 23 '17 at 21:09
  • Try `pd.read_csv('../data/bikes.csv', sep=';', encoding='utf-8-sig', parse_dates=['Date'], dayfirst=True, index_col='Date')` or `pd.read_csv('../data/bikes.csv', sep=';', encoding='utf-16', parse_dates=['Date'], dayfirst=True, index_col='Date')` – MaxU - stand with Ukraine Jul 23 '17 at 21:10
  • `pd.read_csv('https://raw.githubusercontent.com/jvns/pandas-cookbook/master/data/bikes.csv', sep=';', encoding='latin1', parse_dates=['Date'], dayfirst=True, index_col='Date')` works fine with Python 3.5 pandas 0.20.3. – ayhan Jul 23 '17 at 21:16
  • @COLDSPEED yes, parse_dates=True doesn't work. – Z.Grey Jul 23 '17 at 21:17
  • @MaxU I tried the first one, utf-8-sig, and it throws 'ValueError: 'Date' is not in list'. Then I tried your second suggestion, utf-16, and it gives the error: 'UnicodeError: UTF-16 stream does not start with BOM' – Z.Grey Jul 23 '17 at 21:17
  • @ayhan, i just checked it under Pandas 0.20.1 - it works fine as well... – MaxU - stand with Ukraine Jul 23 '17 at 21:18
  • @Z.Grey, about using `parse_dates=True` - i'd also add `index_col=0`... – MaxU - stand with Ukraine Jul 23 '17 at 21:19
  • @MaxU I also use pandas 0.20.1 but it does not work; I get this "'Date' is not in list" error – Z.Grey Jul 23 '17 at 21:21
  • @Z.Grey, are you using the same CSV file as from ayhan's comment? – MaxU - stand with Ukraine Jul 23 '17 at 21:22
  • @MaxU No, my file is comptagevelo2012.csv; here is the link to the file: http://donnees.ville.montreal.qc.ca/dataset/f170fecc-18db-44bc-b4fe-5b0b6d2c7297/resource/d54cec49-349e-47af-b152-7740056d7311/download/comptagevelo2012.csv – Z.Grey Jul 23 '17 at 21:30
  • @MaxU By the way, I also tried 'index_col=0' within this code '(...sep=";", encoding="latin1", parse_dates=True, dayfirst=True, index_col=0 usecols=["Date-Time", "Value"])' and it gives "ValueError: Usecols do not match names." I also used it without 'usecols', and at least it did not give any error, but I could not modify the date – Z.Grey Jul 23 '17 at 21:34
  • @ayhan I got an HTTP error, not found. I tried this example from Spyder, not Jupyter; maybe that's why I get this error, I don't know. That is why I downloaded the file before trying this example. – Z.Grey Jul 23 '17 at 21:55
  • @Z.Grey the http error is because of the formatting in the comments so I posted an answer instead. Try the code in the answer. – ayhan Jul 23 '17 at 21:56
  • @ayhan, ooh that's right, I tried your answer and it works. Thank you also for the explanation in the answer, it clarified things in my head. Many thanks – Z.Grey Jul 23 '17 at 22:00
  • @Z.Grey You are welcome. Glad that it worked. :) – ayhan Jul 23 '17 at 22:01

1 Answer

With the URL you provided

url = 'http://donnees.ville.montreal.qc.ca/dataset/f170fecc-18db-44bc-b4fe-5b0b6d2c7297/resource/d54cec49-349e-47af-b152-7740056d7311/download/comptagevelo2012.csv'

df = pd.read_csv(url, sep=',', parse_dates={'datetime':[0, 1]}, index_col='datetime')

df.head()

gives

            Rachel / Papineau  Berri1  Maisonneuve_2  Maisonneuve_1  Brébeuf  \
datetime                                                                       
2012-01-01                 16      35             51             38      5.0   
2012-02-01                 43      83            153             68     11.0   
2012-03-01                 58     135            248            104      2.0   
2012-04-01                 61     144            318            116      2.0   
2012-05-01                 95     197            330            124      6.0   
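
A note on the index above: it steps through the first day of successive months, which would be what you get if day/month/year strings are read month-first. If the dates in the file are day-first (an assumption on my part, but the cookbook passes dayfirst=True), then adding dayfirst=True should give daily dates instead. Here is a minimal sketch with made-up rows, not the real file:

```python
import io
import pandas as pd

# Made-up rows, assumed to be in day/month/year order like the cookbook data.
sample = (
    "Date,Berri1\n"
    "01/01/2012,35\n"
    "02/01/2012,83\n"
    "03/01/2012,135\n"
)

# Default parsing treats ambiguous dates as month/day/year.
month_first = pd.read_csv(io.StringIO(sample), parse_dates=['Date'],
                          index_col='Date')

# dayfirst=True treats them as day/month/year.
day_first = pd.read_csv(io.StringIO(sample), parse_dates=['Date'],
                        dayfirst=True, index_col='Date')

print(month_first.index[1])  # second row read as 1 February 2012
print(day_first.index[1])    # second row read as 2 January 2012
```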

I changed both the sep and encoding arguments because the separator in that file is a comma and the encoding is UTF-8 (the default for read_csv). With sep=';' the whole header line is read as a single column name, so no column called 'Date' exists, hence the error. There is also an unnamed column for the time, which you can include in the parsing. In this example I think the times are all zero, but this might be useful in other cases.
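
To check the separator yourself, you can look at the column names pandas actually sees. The sketch below uses a small made-up sample that imitates the layout described here (comma-separated, a Date column followed by an unnamed time column); the values are not from the real file. It also combines the two columns with pd.to_datetime after reading, which is an alternative to the dict form of parse_dates; as far as I know, recent pandas releases have deprecated that dict form.

```python
import io
import pandas as pd

# Made-up sample: comma-separated, "Date" column, unnamed time column, counters.
sample = (
    "Date,,Berri1,Maisonneuve_1\n"
    "01/01/2012,00:00,35,38\n"
    "02/01/2012,00:00,83,68\n"
    "03/01/2012,00:00,135,104\n"
)

# Wrong separator: the whole header becomes one column name,
# so asking for a 'Date' column fails.
wrong = pd.read_csv(io.StringIO(sample), sep=';')
print(list(wrong.columns))

# Right separator: the columns are split; the empty header is auto-named.
right = pd.read_csv(io.StringIO(sample), sep=',')
print(list(right.columns))

# Combine the date and time columns into one datetime index
# (dayfirst=True assumes day/month/year order).
stamps = pd.to_datetime(right['Date'] + ' ' + right['Unnamed: 1'], dayfirst=True)
df = right.drop(columns=['Date', 'Unnamed: 1']).set_index(stamps.rename('datetime'))
print(df.head())
```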

ayhan