0

I have a dataframe in vaex that I'm having trouble with the timestamp format.

I can't seem to correct format the timestamp column. After researching the problem, I have come to the conclusion that I need to remove the colon in the UTC offset (00:00). But I don't know how to do this.

  • note: I didn't know how to make a sample vaex dataframe, so I made a pandas one and converted it

from datetime import datetime

# Sample pandas dataframe
df = pd.DataFrame({'quantity' : [0,6,4,3,7,8,3,2],
                   'timestamp' : ['2018-05-01 03:05:00+00:00', '2018-05-01 04:15:00+00:00',
                                '2018-06-02 03:15:00+00:00', '2018-06-02 04:25:00+00:00',
                                '2018-07-03 03:25:00+00:00', '2018-07-03 04:35:00+00:00',
                                '2018-08-04 03:35:00+00:00', '2018-08-04 04:45:00+00:00'],
                   'id' : [1,2,3,4,5,6,7,8] 
                   })
# Pandas to vaex
df = vx.from_pandas(df=df, copy_index=True)

# converting string timestamp to datetime
date_format = "%Y-%m-%d %H:%M:%S %z" 

def column_to_datetime(datetime_str):
    return np.datetime64(datetime.strptime(datetime_str, date_format))

df['timestamp_parsed'] = df['timestamp'].apply(column_to_datetime)

but later on, I get the error ValueError: time data '2018-05-01 03:05:00+00:00' does not match format '%Y-%m-%d %H:%M:%S %z'

FObersteiner
  • 22,500
  • 8
  • 42
  • 72
Rebecca James
  • 383
  • 2
  • 12
  • 2
    see: [What is Vaex function to parse string to datetime64, which equivalent to pandas to_datetime, that allow custom format?](https://stackoverflow.com/q/64424547/10197418), but use `"%Y-%m-%d %H:%M:%S%z"` as parsing directive (no space between seconds and UTC offset). – FObersteiner Dec 08 '22 at 15:19
  • 1
    see also: [Python Vaex data type conversion: string to datetime](https://stackoverflow.com/q/62648122/10197418) – FObersteiner Dec 09 '22 at 07:39

2 Answers2

2

you can use:

from datetime import datetime

# Sample pandas dataframe
df = pd.DataFrame({'quantity' : [0,6,4,3,7,8,3,2],
                   'timestamp' : ['2018-05-01 03:05:00+00:00', '2018-05-01 04:15:00+00:00',
                                '2018-06-02 03:15:00+00:00', '2018-06-02 04:25:00+00:00',
                                '2018-07-03 03:25:00+00:00', '2018-07-03 04:35:00+00:00',
                                '2018-08-04 03:35:00+00:00', '2018-08-04 04:45:00+00:00'],
                   'id' : [1,2,3,4,5,6,7,8] 
                   })
df['timestamp']=pd.to_datetime(df['timestamp']).dt.tz_localize(None)

# Pandas to vaex
df = vx.from_pandas(df=df, copy_index=True)

Output:

    timestamp
0   2018-05-01 03:05:00
1   2018-05-01 04:15:00
2   2018-06-02 03:15:00
3   2018-06-02 04:25:00
4   2018-07-03 03:25:00
5   2018-07-03 04:35:00
6   2018-08-04 03:35:00
7   2018-08-04 04:45:00


Bushmaster
  • 4,196
  • 3
  • 8
  • 28
  • 2
    I don't think this will work; OP does not have a pandas dataframe initially. See the question I linked in the comments section on how to do it without pd.to_datetime. – FObersteiner Dec 08 '22 at 15:21
0

Using your code, try this before converting the string to datetime:

# Remove the colon from the UTC offset
df['timestamp'] = df['timestamp'].str.replace(':', '')
Jorge Espinar
  • 146
  • 1
  • 5