1

I am reading a csv file (SimResults_Daily.csv) into pandas, that is structured as follows:

#, Job_ID, Date/Time, value1, value2,
0, ID1,  05/01  24:00:00, 5, 6 
1, ID2,  05/02  24:00:00, 6, 15 
2, ID3,  05/03  24:00:00, 20, 21 

etc. As the datetime format cannot be read by pandas parse_dates, I have figured out I can use the command: str.replace('24:','00:').

My code currently is:

dateparse = lambda x: pd.datetime.strptime(x, '%m-%d  %H:%M:%S')

df = pd.read_csv('SimResults_Daily.csv',
    skipinitialspace=True,
    date_parser=dateparse,
    parse_dates=['Date/Time'],
    index_col=['Date/Time'],
    usecols=['Job_ID',
    'Date/Time',
    'value1',
    'value2',
    header=0)

Where in the code should I implement the str.replace command?

Andreuccio
  • 1,053
  • 2
  • 18
  • 32

1 Answers1

2

You can use:

import pandas as pd
import io

temp=u"""#,Job_ID,Date/Time,value1,value2,
0,ID1,05/01 24:00:00,5,6
1,ID2,05/02 24:00:00,6,15
2,ID3,05/03 24:00:00,20,21"""

dateparse = lambda x: pd.datetime.strptime(x.replace('24:','00:'), '%m/%d  %H:%M:%S')

#after testing replace io.StringIO(temp) to filename
df = pd.read_csv(io.StringIO(temp),
    skipinitialspace=True,
    date_parser=dateparse,
    parse_dates=['Date/Time'],
    index_col=['Date/Time'],
    usecols=['Job_ID', 'Date/Time', 'value1', 'value2'],
    header=0)

print (df)
           Job_ID  value1  value2
Date/Time                        
1900-05-01    ID1       5       6
1900-05-02    ID2       6      15
1900-05-03    ID3      20      21

Another solution with double replace - year can be added also:

dateparse = lambda x: x.replace('24:','00:').replace(' ','/1900 ')

df = pd.read_csv(io.StringIO(temp),
    skipinitialspace=True,
    date_parser=dateparse,
    parse_dates=['Date/Time'],
    index_col=['Date/Time'],
    usecols=['Job_ID', 'Date/Time', 'value1', 'value2'],
    header=0)

print (df)
           Job_ID  value1  value2
Date/Time                        
1900-05-01    ID1       5       6
1900-05-02    ID2       6      15
1900-05-03    ID3      20      21

dateparse = lambda x: x.replace('24:','00:').replace(' ','/2016 ')

df = pd.read_csv(io.StringIO(temp),
    skipinitialspace=True,
    date_parser=dateparse,
    parse_dates=['Date/Time'],
    index_col=['Date/Time'],
    usecols=['Job_ID', 'Date/Time', 'value1', 'value2'],
    header=0)

print (df)
           Job_ID  value1  value2
Date/Time                        
2016-05-01    ID1       5       6
2016-05-02    ID2       6      15
2016-05-03    ID3      20      21
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • you are always spot-on! – Andreuccio Oct 10 '16 at 13:59
  • I am confronted with the task of importing similar datasets, with hourly values rather than daily. Hence, rather than replacing `24:` with `00:` I would need to move all hours back 1 unit, that is: `24:` -> `23:`,...,`01:` -> `00:` . How would the code change to do that? – Andreuccio Jun 21 '17 at 16:14
  • I think same way, only substract one hour like `df.index = df.index - pd.Timedelta(1, unit='h')` – jezrael Jun 21 '17 at 18:43
  • I think not, unfortunately. – jezrael Jun 22 '17 at 14:36
  • the problem is that hour `24:00` is not recognised by pandas and replacing that with `00:00` would send the data point back to the start of the day.. – Andreuccio Jun 22 '17 at 14:58
  • I am not sure if understand, is possible create new question with sample dat, desired output and I think code is not necessary, only link to this question (answer). Thanks. – jezrael Jun 22 '17 at 15:01
  • Let me put it this way: how would your formula `dateparse = lambda x: pd.datetime.strptime(x.replace('24:','00:'), '%m/%d %H:%M:%S')` change if I was to impose multiple replacements, i.e. `('01:','00:'), ('02:','00:')...('24:','23:')` instead of one? – Andreuccio Jun 22 '17 at 15:59
  • Hmmm, good idea. I think the best is create `dict` and then replace by dict - uses [this](https://stackoverflow.com/a/2400569/2901002) – jezrael Jun 22 '17 at 16:01