4

I try to parse a CSV file which looks like this:

dd.mm.yyyy   value

01.01.2000   1
02.01.2000   2
01.02.2000   3

I need to add missing dates and fill according values with NaN. I used Series.reindex like in this question:

import pandas as pd

ts=pd.read_csv(file, sep=';', parse_dates='True', index_col=0)

idx = pd.date_range('01.01.2000', '02.01.2000')

ts.index = pd.DatetimeIndex(ts.index)
ts = ts.reindex(idx, fill_value='NaN')

But in result, values for certain dates are swapped due to date format (i.e. mm/dd instead of dd/mm):

01.01.2000   1
02.01.2000   3
03.01.2000   NaN
...
...
31.01.2000   NaN
01.02.2000   2

I tried several ways (i.e. add dayfirst=True to read_csv) to do it right but still can't figure it out. Please, help.

John Smith
  • 53
  • 1
  • 1
  • 5

1 Answers1

5

Set parse_dates to the first column with parse_dates=[0]:

ts = pd.read_csv(file, sep=';', parse_dates=[0], index_col=0, dayfirst=True)

idx = pd.date_range('01.01.2000', '02.01.2000')

ts.index = pd.DatetimeIndex(ts.index)
ts = ts.reindex(idx, fill_value='NaN')
print(ts)

prints:

              value
2000-01-01        1
2000-01-02        2
2000-01-03      NaN
...
2000-01-31      NaN
2000-02-01        3

parse_dates=[0] tells pandas to explicitly parse the first column as dates. From the docs:

parse_dates : boolean, list of ints or names, list of lists, or dict

If True -> try parsing the index.

If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.

If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.

{'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo'

A fast-path exists for iso8601-formatted dates.

Community
  • 1
  • 1
Mike Müller
  • 82,630
  • 20
  • 166
  • 161
  • @MikeMüller, Trying to use this but I have multiple entries per date, it's absence records from factory shifts. So, we could have 2 records of people calling in sick on 8/5/2015 but then no more sick days till 8/9/2015 when we could have 3. My code breaks at the line: ts = ts.reindex(idx, fill_value='NaN') Can you answer that in this question or should I create a new one? Thank you! – Programming_Learner_DK Aug 03 '17 at 17:51
  • Go ahead and create a new question. Please and an example of what you try to achieve. – Mike Müller Aug 03 '17 at 18:22
  • @MikeMüller, question posted here: https://stackoverflow.com/questions/45506038/pandas-reindex-to-fill-missing-dates-or-better-method-to-fill Thank you for any insight you can provide! – Programming_Learner_DK Aug 04 '17 at 11:47