I have a data frame df whose column TIMESTAMP holds times in a rather odd format:

2016-10-25T09:34:52.051713+01:00
2016-10-25T09:46:14.051620+01:00
2016-10-25T09:51:16.052435+01:00
....

I need to use this data as time information. As a first step, I would like to drop the last 13 characters so that it looks like this:

2016-10-25T09:34:52
2016-10-25T09:46:14
2016-10-25T09:51:16
....

To do so, I tried

df['TIMESTAMP'] = df.apply(lambda x: x['TIMESTAMP'][:-13], axis = 1)

(taken from "Remove ends of string entries in pandas DataFrame column")

but I receive the error:

TypeError: string indices must be integers

which I do not understand. I'm new to Python, and I don't see what I've done wrong.

gehbiszumeis
Ben
  • When you remove the last part, you might run into trouble when DST comes into play. DST changes the UTC offset, so your data might be inconsistent after the operation. Instead of trimming the timestamp, consider converting it to UTC. – kvantour Jun 12 '19 at 07:57
  • Thanks for the hint! What is DST? – Ben Jun 12 '19 at 08:01
  • DST stands for [_Daylight Saving Time_](https://en.wikipedia.org/wiki/Daylight_saving_time). – kvantour Jun 12 '19 at 08:43

5 Answers

The strict answer to your question is to use the str accessor, which can take a slice, just like a normal string, and which will apply that slice to every value in the Series:

import pandas as pd

data = ['2016-10-25T09:34:52.051713+01:00',
        '2016-10-25T09:46:14.051620+01:00',
        '2016-10-25T09:51:16.052435+01:00']

s = pd.Series(data)

print(s.str[:-13])

Output:

0    2016-10-25T09:34:52
1    2016-10-25T09:46:14
2    2016-10-25T09:51:16
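
Applied to the data frame from the question, the same accessor works on the column directly. This is a minimal sketch, assuming df has a TIMESTAMP column of strings that all end in the same 13-character suffix (the sample data is copied from the question):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "TIMESTAMP": [
        "2016-10-25T09:34:52.051713+01:00",
        "2016-10-25T09:46:14.051620+01:00",
        "2016-10-25T09:51:16.052435+01:00",
    ]
})

# .str[:-13] slices every string in the column, dropping its last 13 characters
# (fractional seconds plus UTC offset in this particular format)
df["TIMESTAMP"] = df["TIMESTAMP"].str[:-13]
print(df)
```

Note that the result is still a column of plain strings, not datetime values.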

However, I think what you want is actually pd.to_datetime, which will (within reason) infer the format of your data and convert it into datetime objects:

print(pd.to_datetime(s))

Output:

0   2016-10-25 09:34:52.051713+01:00
1   2016-10-25 09:46:14.051620+01:00
2   2016-10-25 09:51:16.052435+01:00
dtype: datetime64[ns, pytz.FixedOffset(60)]
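
If the goal is still the trimmed string, one option is to parse first and format afterwards instead of counting characters. The sketch below assumes all values share the same UTC offset, as in the sample data; the names parsed and trimmed are only there for illustration:

```python
import pandas as pd

s = pd.Series([
    "2016-10-25T09:34:52.051713+01:00",
    "2016-10-25T09:46:14.051620+01:00",
    "2016-10-25T09:51:16.052435+01:00",
])

# Parse the ISO 8601 strings into timezone-aware datetimes
parsed = pd.to_datetime(s)

# Format back to strings: local wall-clock time, without fractional seconds or offset
trimmed = parsed.dt.strftime("%Y-%m-%dT%H:%M:%S")
print(trimmed)
```

This way the slice length does not depend on how many fractional digits each timestamp happens to carry.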
gmds

A simple way to do this is to use a list comprehension:

import pandas as pd

df = pd.DataFrame({'TIMESTAMP' : ["2016-10-25T09:34:52.051713+01:00", "2016-10-25T09:46:14.051620+01:00"]})

 TIMESTAMP
0  2016-10-25T09:34:52.051713+01:00
1  2016-10-25T09:46:14.051620+01:00

df['TIMESTAMP'] = [x[:-13] for x in df['TIMESTAMP']]

Output:

TIMESTAMP
0  2016-10-25T09:34:52
1  2016-10-25T09:46:14
vlemaistre

I recommend always using the datetime package when you deal with any kind of dates. As of Python 3.7, the function fromisoformat can parse your ISO format directly, while strftime can convert to any time format you can think of. (For Python 3.6 or lower see this answer).

In your case with timestamps written in df['TIMESTAMP']:

from datetime import datetime as dt
df['TIMESTAMP'] = df['TIMESTAMP'].apply(lambda x: dt.fromisoformat(x).strftime('%Y-%m-%dT%H:%M:%S'))

will do the trick and give your desired output:

             TIMESTAMP
0  2016-10-25T09:34:52
1  2016-10-25T09:46:14
2  2016-10-25T09:51:16
gehbiszumeis

Replace the transformation statement with the code below:

df['TIMESTAMP'] = df['TIMESTAMP'].apply(lambda x: x[:-13])

This applies the slicing function to every value in the TIMESTAMP column and saves the result back under the same column name.

Asnim P Ansari
  • Thank you! The last bracket is too much. But I receive the error: "KeyError: 'TIMESTAMP'". What does that mean? – Ben Jun 12 '19 at 07:28

import pandas as pd

df = pd.DataFrame(
    data={
        'TIMESTAMP': [
            '2016-10-25T09:34:52.051713+01:00',
            '2016-10-25T09:46:14.051620+01:00',
            '2016-10-25T09:51:16.052435+01:00'
        ]
    }
)
df['TIMESTAMP'] = df['TIMESTAMP'].apply(lambda x: x[:19])

BTW, are you sure you don't need to keep the timezone information?
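
If the timezone does matter, a possible alternative to cutting the string is to let pandas parse the offset and normalize everything to UTC. This is a rough sketch only; the column name TIMESTAMP_UTC is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "TIMESTAMP": [
        "2016-10-25T09:34:52.051713+01:00",
        "2016-10-25T09:46:14.051620+01:00",
        "2016-10-25T09:51:16.052435+01:00",
    ]
})

# utc=True reads the +01:00 offset and converts each value to UTC
# (TIMESTAMP_UTC is a hypothetical column name, not from the question)
df["TIMESTAMP_UTC"] = pd.to_datetime(df["TIMESTAMP"], utc=True)
print(df["TIMESTAMP_UTC"])
```

The values stay comparable even if later rows carry a different offset, for example after a DST change.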

Andrew Li