0

So I have used pandas to open up my data which is in .csv format. I'd now like to reformat the dates from its current dd/mm/YYYY hh:mm:ss format to purely YYYY-mm-dd format, e.g., from 19/11/2014 15:26:13 to 2014-11-19. How would I go about doing this within a pandas data array? i.e., converting from this:

          Id  User Id           Start Time             End Time  Climb Time
0      74618    27366  19/11/2014 15:26:13  19/11/2014 15:26:18           5   
1      74632    27366  19/11/2014 15:26:18  19/11/2014 15:42:26         968   
2      74633    27366  19/11/2014 15:42:26  19/11/2014 15:42:48          22

To this:

          Id  User Id  Start Time    End Time  Climb Time
0      74618    27366  2014-11-19  2014-11-19           5   
1      74632    27366  2014-11-19  2014-11-19         968   
2      74633    27366  2014-11-19  2014-11-19          22

I've tried another of pandas ready methods, but none have seemed to work or recognise the initial data. I was wondering if anyone knew of any methods which could achieve this...

GCien
  • 2,221
  • 6
  • 30
  • 56
  • Sorry are you asking how to parse your date strings correctly or you asking about reformatting here, for the former `df['Start Time'] = pd.to_datetime(df['Start Time'])` will work for the latter, after conversion you can do `df['Start Time'].dt.strftime('%Y-%m-%d')` however, this will give you strings and not a date or datetime object which is not as useful as a proper datetime IMO – EdChum Nov 21 '16 at 20:37
  • @reticentroot That example is for `to.csv` not `from.csv`. @EdChum I'd take whatever method that works - what is `df`? I'd like this to be explain more explicitly than just "do this - it works" - I'd like to know what is happening. – GCien Nov 21 '16 at 20:47
  • @EdChum That also gives me: `TypeError: 'type' object has no attribute '__getitem__'` – GCien Nov 21 '16 at 20:49
  • here `df` refers to a dataframe, I'm assuming you created a df using `read_csv` like so: `df = pd.read_csv(your_file_path)`, you can in fact tell pandas to parse those columns: `df = pd.read_csv(your_file, parse_dates=['Start Time', 'End Time'])` also you didn't state clear what your current state of play is, your question asked how to do this within a pandas data array, this would imply it's already been loaded into a dataframe, if not then you should post raw data, your code and attempts to clear up any ambiguity – EdChum Nov 21 '16 at 20:54
  • @EdChum Ok, this is starting to make sense. What is `dt`? – GCien Nov 21 '16 at 20:57
  • @EdChum Also - do I need to use `from pandas import DataFrame as df`? – GCien Nov 21 '16 at 20:58
  • After parsing the columns into `datetime` you can access various `datetime` properties using `.dt`: http://pandas.pydata.org/pandas-docs/stable/api.html#datetimelike-properties – EdChum Nov 21 '16 at 20:58
  • No, `df` is your variable name, not an alias – EdChum Nov 21 '16 at 20:59
  • @EdChum And I also get `AttributeError: 'DatetimeProperties' object has no attribute 'strftime'`... – GCien Nov 21 '16 at 21:00
  • Please update your question with your new updated information, code, errors and also pandas version, it should work if you're using a recent version: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.strftime.html#pandas.Series.dt.strftime – EdChum Nov 21 '16 at 21:01
  • Updated my pandas version to 0.19.1 and it is working fine. Thank you. It hasn't dropped the time suffix, however - but this is a minor flaw. – GCien Nov 21 '16 at 21:06
  • You can do `dt.normalize` to set the times to `00:00:00` and by default pandas won't show the time when it's set to this: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.normalize.html – EdChum Nov 21 '16 at 21:17
  • @EdChum So how would I now convert this date to something useful that I could plot in matplotlib, for example? – GCien Nov 21 '16 at 21:40

2 Answers2

1

Try this:

pd.to_datetime(df['Start Time']).dt.date

or this if you want your column type to be string:

pd.to_datetime(df.StartTime).dt.strftime('%Y-%m-%d')
burhan
  • 924
  • 4
  • 11
  • Top method works - but passing using `df = df.sort('End Time')` first then `df['End Time'] = pandas.to_datetime(df['End Time']).dt.date` the `pyplot.plot(['End Time'],['Calories'])` spits out `ValueError: could not convert string to float: End Time`...so matplotlib does not like this format. I need to be able to sort it and use it through `matplotlib`. – GCien Nov 21 '16 at 21:51
  • This is a different problem, can you update your original question and add this or create a new question? – burhan Nov 21 '16 at 21:55
1

Let me give this a shot and see if it is what you are looking for.

Input CSV is like this (I called it test.csv for this example)

Id,User Id,Start Time,End Time,Climb Time
74618,27366,19/11/2014 15:26:13,19/11/2014 15:26:18,5   
74632,27366,19/11/2014 15:26:18,19/11/2014 15:42:26,968   
74633,27366,19/11/2014 15:42:26,19/11/2014 15:42:48,22

Now some input/output from iPython.

In [1]: import pandas as pd

In [2]: df = pd.read_csv('test.csv')

We can have a look at the conversion to date-time. Note the "type" we get at the end.

In [5]: pd.to_datetime(df['Start Time'])
Out[5]: 
0   2014-11-19 15:26:13
1   2014-11-19 15:26:18
2   2014-11-19 15:42:26
Name: Start Time, dtype: datetime64[ns]

Now you can save this back to a new column. to_datetime is pretty clever in terms of getting the dates and time parts. If it gets your dates wrong, you can specify the format for the input date-time using the format key-word

df['start_dt'] = pd.to_datetime(df['Start Time'])

Finally, you can check the dtypes for the columns in the Data Frame:

In [9]: df.dtypes
Out[9]: 
Id                     int64
User Id                int64
Start Time            object
End Time              object
Climb Time             int64
start_dt      datetime64[ns]
dtype: object

Now this new column, it is a datetime object and you can display or sort it in any way you might like to do that. From your OP, it seems you want to just get the date part. That is easy, you can do this (from this post)

In [10]: df['start_dt'].dt.date
Out[10]: 
0    2014-11-19
1    2014-11-19
2    2014-11-19
Name: start_dt, dtype: object

Now we can put this all together and do the conversion and then put it as a date in one line. It seems you want to recycle the column name, so I am doing that here, but it is not necessary. You can give the 'new' column any name you wish.

In [13]: df['Start Time'] = pd.to_datetime(df['Start Time']).dt.date

In [14]: df['End Time'] = pd.to_datetime(df['End Time']).dt.date

In [15]: df
Out[15]: 
      Id  User Id  Start Time    End Time  Climb Time    start_dt
0  74618    27366  2014-11-19  2014-11-19           5  2014-11-19
1  74632    27366  2014-11-19  2014-11-19         968  2014-11-19
2  74633    27366  2014-11-19  2014-11-19          22  2014-11-19

I think the result is what you are looking for.

Community
  • 1
  • 1