
I am trying to parse 6 datetime columns into one within a .dat file. I have tried this or this, but I keep getting errors. My file looks like this:

YYYY MM DD HH MM SS Julday a b c d
Lat                        35...
Lon                        -120...
Elev                       680
2015 10 09 00 00 30 235.34 3 4 2 6
...

I have about 400 rows. I want the 6 datetime columns combined into a single column in the usual format, like 2013-10-09 00:00:30.

I have tried this, for example:

dat-file = pd.read_csv('C:/.../myfile.dat', header=4, sep='\s+\s', parse_dates=[[0]])

But I get the error message:

ValueError: New date column already in dict 2013 10 20 00 47 30

I am new to Python and have not used pandas before. Is there perhaps a problem with reading a .dat file using pd.read_csv? Thanks!

beginner123
  • Hi. You have two errors here: 1. when the `header` argument in `read_csv` is an integer, say `n`, it takes the `n`th row as the header. So in your case, it would take the fourth row as the header, which is not the case. 2. The correct value for the `parse_dates` argument is `[range(6)]`, which makes pandas combine the first six columns into a single datetime column. – Jaidev Deshpande Nov 24 '15 at 13:09
  • Do the non-aligned rows repeat after the first three? – WoodChopper Nov 24 '15 at 14:55
  • Thank you very much, @Jaidev Deshpande. I have now tried `import pandas as pd` followed by `df = pd.read_csv('C:/.../myfile.dat', header=1, sep='\s+\s', parse_dates=([range(6)])`, but I don't get any response; nothing happens. The very first row is my header row. I do not need the second, third and fourth rows, so I need to skip those. – beginner123 Nov 24 '15 at 15:16
  • @WoodChopper: Sorry, what do you mean? I edited my question a little bit. I do not need the second, third and fourth rows. – beginner123 Nov 24 '15 at 15:18
  • The header is `YYYY MM DD HH MM SS Julday a b c d` and the aligned row is `2015 10 09 00 00 30 235.34 3 4 2 6`, right? – WoodChopper Nov 24 '15 at 15:28
  • @WoodChopper: Yes, the header is `YYYY...` and the first row needed is `2015 ...` – beginner123 Nov 24 '15 at 15:37
  • If `header=4`, your column index in the dataframe will become `2015 10 09 00 00 30 235.34 3 4 2 6`. – WoodChopper Nov 24 '15 at 15:41
  • Look into this http://stackoverflow.com/questions/20193835/parse-dates-when-year-month-day-and-hour-are-in-separate-columns-using-pandas-in?lq=1 – WoodChopper Nov 24 '15 at 15:43
  • @WoodChopper: Yes, this is exactly what the column index is when using that code. I have tried it with `header=1`, because the first row actually is the needed header, but I do not get anything: no error message, and nothing happens. I also do not know what the separator `\s+\s` means; maybe there is something wrong there as well? – beginner123 Nov 24 '15 at 16:41
  • Okay, I have simply deleted the 3 rows that are not needed. When I run `import pandas as pd` followed by `df = pd.read_csv('C:/.../myfile.dat', header=1, sep='\s+\s', parse_dates=([range(6)])` as suggested, I now get the error message `ValueError: [0, 1, 2, 3, 4, 5] is not in list` – beginner123 Nov 24 '15 at 17:09
  • @WoodChopper: I have already tried the suggested answer with: `df = pd.read_csv(file, header=None, index_col='datetime', parse_dates={'datetime': [0,1,2,3,4,5]}, date_parser=lambda x: pd.datetime.strptime(x, '%Y %m %d %H %M %S')) ` which does not work. I get the error: `IOError: Expected file path name or file-like object, got type ` – beginner123 Nov 25 '15 at 17:32
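
For readers following the thread, here is a minimal sketch of one way the layout described in the question might be read. It is not taken from the question or the comments. It rests on several assumptions: the file has exactly one header line followed by the three Lat/Lon/Elev lines, the fields are separated by single spaces, and the installed pandas can assemble datetimes from columns named `year`, `month`, `day`, `hour`, `minute` and `second` via `pd.to_datetime`. The names `SAMPLE`, `NAMES`, `DATE_PARTS` and `read_dat` are hypothetical, and the Lat/Lon values in the sample are placeholders. Note that the regex `\s+\s` only matches runs of two or more whitespace characters, so it would not split single-space-separated fields; the sketch uses `\s+` instead.

```python
import io

import pandas as pd

# Made-up sample mirroring the layout shown in the question.
# The Lat/Lon values are placeholders, not the real ones.
SAMPLE = """\
YYYY MM DD HH MM SS Julday a b c d
Lat                        0.0
Lon                        0.0
Elev                       680
2015 10 09 00 00 30 235.34 3 4 2 6
2015 10 09 00 01 00 235.35 1 2 3 4
"""

# Explicit column names sidestep the duplicated "MM" (month and minute)
# in the original header line.
NAMES = ["year", "month", "day", "hour", "minute", "second",
         "Julday", "a", "b", "c", "d"]
DATE_PARTS = NAMES[:6]


def read_dat(source):
    """Read the file and merge the six date/time columns into one index."""
    df = pd.read_csv(
        source,
        sep=r"\s+",    # one or more whitespace characters between fields
        skiprows=4,    # skip the header line and the Lat/Lon/Elev lines
        header=None,   # column names are supplied through `names`
        names=NAMES,
    )
    # Build one datetime column from the six component columns.
    df["datetime"] = pd.to_datetime(df[DATE_PARTS])
    return df.drop(columns=DATE_PARTS).set_index("datetime")


df = read_dat(io.StringIO(SAMPLE))
print(df)
```

For a real file, a path would be passed to `read_dat` in place of the `io.StringIO` object. Combining the columns after reading, rather than through `parse_dates`, keeps the parsing step separate from the date handling, which may make errors easier to localize.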
