3

I need help converting into python/pandas date time format. For example, my times are saved like the following line:

2017-01-01 05:30:24.468911+00:00
.....
2017-05-05 01:51:31.351718+00:00

and I want to know the simplest way to convert this into date time format for essentially performing operations with time (like what is the range in days of my dataset to split up my dataset into chunks by time, what's the time difference from one time to another)? I don't mind losing some of the significance for the times if that makes things easier. Thank you so much!

Jane Sully
  • 3,137
  • 10
  • 48
  • 87
  • 1
    Are you saving those as strings? If so, you can use `strptime()` to break out the time indicated into an object and do operations on members of those objects. The members of the objects will be determined by the formatting string you pass into the method. See [this](https://docs.python.org/3.6/library/time.html#time.strptime) documentation for more information. – nerdenator Jun 06 '17 at 18:31
  • Yes, they are saved as strings. Sorry for not specifying. I'll try strptime(). Thanks! – Jane Sully Jun 06 '17 at 19:20

3 Answers3

2

Let's say I have two strings 2017-06-06 and 1944-06-06 and I wanted to get the difference (what Python calls a timedelta) between the two.

First, I'll need to import datetime. Then I'll need to get both of those strings into datetime objects:

>>> a = datetime.datetime.strptime('2017-06-06', '%Y-%m-%d')
>>> b = datetime.datetime.strptime('1944-06-06', '%Y-%m-%d')

That will give us two datetime objects that can be used in arithmetic functions that will return a timedelta object:

>>> c = abs((a-b).days)

This will give us 26663, and days is the largest resolution that timedelta supports: documentation

nerdenator
  • 1,265
  • 2
  • 18
  • 35
2

Timestamp will convert it for you.

>>> pd.Timestamp('2017-01-01 05:30:24.468911+00:00')
Timestamp('2017-01-01 05:30:24.468911+0000', tz='UTC')

Let's say you have a dataframe that includes your timestamp column (let's call it stamp). You can use apply on that column together with Timestamp:

df = pd.DataFrame(
    {'stamp': ['2017-01-01 05:30:24.468911+00:00',
               '2017-05-05 01:51:31.351718+00:00']})
>>> df
    stamp
0   2017-01-01 05:30:24.468911+00:00
1   2017-05-05 01:51:31.351718+00:00

>>> df['stamp'].apply(pd.Timestamp)
0   2017-01-01 05:30:24.468911+00:00
1   2017-05-05 01:51:31.351718+00:00
Name: stamp, dtype: datetime64[ns, UTC]

You could also use Timeseries:

>>> pd.TimeSeries(df.stamp)
0    2017-01-01 05:30:24.468911+00:00
1    2017-05-05 01:51:31.351718+00:00
Name: stamp, dtype: object

Once you have a Timestamp object, it is pretty efficient to manipulate. You can just difference their values, for example.

You may also want to have a look at this SO answer which discusses timezone unaware values to aware.

Alexander
  • 105,104
  • 32
  • 201
  • 196
1

Since the Pandas tag is there:

df = pd.DataFrame(['2017-01-01 05:30:24.468911+00:00'])
df.columns = ['Datetime']
df['Datetime'] = pd.to_datetime(df['Datetime'], format='%Y-%m-%d %H:%M:%S.%f', utc=True)
print(df.dtypes)
misantroop
  • 2,276
  • 1
  • 16
  • 24