7

I have a large dataset of around 20 GB, which I read using graphlab.SFrame.read_csv(). It has a date column that is read as a string in the format yyyy-dd-mm, but I want the column to hold datetime objects. How can I do that?

I understand that one way is to iterate through each row and convert it with Python code. Is there another way, ideally a faster one?

Dreams

2 Answers

7

There's actually a built-in method for this in graphlab.SArray. As in Greg Whittier's answer, suppose your original date column is called datestring.

import graphlab
sf = graphlab.SFrame.read_csv('input.csv')
sf['datetime'] = sf['datestring'].str_to_datetime('%Y-%d-%m')
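
Before converting a 20 GB column, it may be worth checking the format string on a single value. The sketch below uses only the standard library, and assumes the %Y, %d and %m directives mean the same thing there as in str_to_datetime; the sample value is made up for illustration.

```python
from datetime import datetime

# Year-day-month, matching the yyyy-dd-mm layout described in the question.
FMT = "%Y-%d-%m"

# Hypothetical sample value: 25 March 2015 written as yyyy-dd-mm.
sample = "2015-25-03"

parsed = datetime.strptime(sample, FMT)
print(parsed.date())

# The more common year-month-day order rejects this value,
# because 25 is not a valid month.
try:
    datetime.strptime(sample, "%Y-%m-%d")
    swapped_ok = True
except ValueError:
    swapped_ok = False
print(swapped_ok)
```

Note that a value whose day is 12 or lower parses without error under either order, so a sample with a day above 12 is the more useful check.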
papayawarrior
3
import graphlab
import datetime as dt
sf = graphlab.SFrame.read_csv('input.csv') # dates in datestring column
sf['datetime'] = sf['datestring'].apply(lambda x: dt.datetime.strptime(x, '%Y-%d-%m'))
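
With a row-wise function like this, a single malformed or missing value raises an exception partway through the column. One possible variant, sketched here with a hypothetical helper named parse_ydm, returns None for values that do not match instead of failing. It is plain Python and does not depend on GraphLab.

```python
from datetime import datetime


def parse_ydm(value, fmt="%Y-%d-%m"):
    """Parse a yyyy-dd-mm string, returning None if it cannot be parsed.

    parse_ydm is a hypothetical helper name, not part of GraphLab.
    """
    if not isinstance(value, str):
        # Missing values (for example None) are passed through as None.
        return None
    try:
        return datetime.strptime(value.strip(), fmt)
    except ValueError:
        # The string does not match the expected layout.
        return None


print(parse_ydm("2015-25-03"))
print(parse_ydm("not a date"))
```

Presumably the helper could be passed to apply in place of the lambda, as in sf['datestring'].apply(parse_ydm), though that has not been verified against a real SFrame here.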
Greg Whittier