
I have a pandas DataFrame with a column that contains arrays. After reading the file, pandas parsed them as strings. How can I convert the column to a list type, or convert a particular cell to an array?

I read about ast.literal_eval, but it does not allow the Timestamp type. I also tried json.loads, but it raised an error: Expecting value: line 1 column 2 (char 1)

Example

df = pd.read_csv('data.csv')

df.head(5)
id     date_time
1      [Timestamp('2023-01-01 13:42:55'), Timestamp('2023-01-01 13:55:19')]
2      [Timestamp('2023-01-01 13:55:23')]
3      [Timestamp('2023-01-01 13:55:33')]
4      [Timestamp('2023-01-01 14:52:33'),Timestamp('2023-01-01 15:01:33')]
5      [Timestamp('2023-01-01 14:52:33'),Timestamp('2023-01-01 15:01:33'), Timestamp('2023-01-01 15:20:33')]

a = df.date_time[1]
print(a)
print(type(a))

[Timestamp('2023-01-01 13:42:55'), Timestamp('2023-01-01 13:55:19')]
<class 'str'>

The goal: I need to go through the lists in this column and check the time between consecutive timestamps to see whether it satisfies a condition, for example that they are no more than 1 minute apart.

1 Answer


Here is an example of what you want to achieve:

from datetime import datetime
import pandas as pd

df = pd.DataFrame({'id':1, 
                  'date_time': ["Timestamp('2023-01-01 13:42:55'), Timestamp('2023-01-01 13:55:19'), Timestamp('2023-01-01 13:55:23'), Timestamp('2023-01-01 13:55:33')"]})

# Remove any surrounding brackets (the question's data has them), then split the string into a list on the commas
df['date_time'] = df['date_time'].str.strip('[]').str.split(',')
# strip the white space and convert to datetime
df['date_time'] = df['date_time'].apply(lambda x: [i.strip() for i in x])
df['date_time'] = df['date_time'].apply(lambda x: [datetime.strptime(i, "Timestamp('%Y-%m-%d %H:%M:%S')") for i in x])
# Calculate the difference between the timestamps
df['date_time'] = df['date_time'].apply(lambda x: [x[i+1] - x[i] for i in range(len(x)-1)])
print(df)
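
For strings shaped exactly like the ones in the question (with the square brackets and with or without a space after the comma), a regular expression may be a bit more forgiving than splitting on commas. This is only a minimal sketch: the rows are copied from the question, and the names TS_PATTERN, parse_timestamps, gaps_within, and the columns parsed and ok are made up for illustration. It assumes the condition you want is "every gap between consecutive timestamps is at most some limit".

```python
import re
import pandas as pd

# Sample rows copied from the question; in practice they would come from pd.read_csv('data.csv').
df = pd.DataFrame({
    'id': [1, 2, 4],
    'date_time': [
        "[Timestamp('2023-01-01 13:42:55'), Timestamp('2023-01-01 13:55:19')]",
        "[Timestamp('2023-01-01 13:55:23')]",
        "[Timestamp('2023-01-01 14:52:33'),Timestamp('2023-01-01 15:01:33')]",
    ],
})

# Captures the quoted text inside each Timestamp('...') in the string.
TS_PATTERN = re.compile(r"Timestamp\('([^']+)'\)")


def parse_timestamps(cell):
    """Turn "[Timestamp('...'), ...]" into a list of pd.Timestamp objects."""
    return [pd.Timestamp(s) for s in TS_PATTERN.findall(cell)]


def gaps_within(stamps, limit):
    """Return True if every gap between consecutive timestamps is at most `limit`.

    A list with fewer than two timestamps has no gaps, so it returns True.
    """
    return all(b - a <= limit for a, b in zip(stamps, stamps[1:]))


one_minute = pd.Timedelta(minutes=1)
df['parsed'] = df['date_time'].apply(parse_timestamps)
df['ok'] = df['parsed'].apply(lambda stamps: gaps_within(stamps, one_minute))
print(df[['id', 'ok']])
```

Rows with a single timestamp pass the check trivially because there is nothing to compare; if that is not what you want, handle short lists separately.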
Pepe N O