1

So, I want to calculate the time differences; and the file looks something like this

id message_id send_date status
0 5f74b996a2b7e 2020-10-01 00:00:07 sent
1 5f74b996a2b7e 2020-10-01 00:00:09 delivered
2 5f74b99e85b3c 2020-10-02 02:00:14 sent
3 5f74b99e85b3c 2020-10-02 02:01:16 delivered
4 5f74b99e85b3c 2020-10-02 08:06:49 read
5 5f74b996a2b7e 2020-10-02 15:16:32 read
6 5f9d97ff1af9e 2020-10-14 13:45:43 sent
7 5f9d97ff1af9e 2020-10-14 13:45:45 delivered
8 5f9d97ff1af9e 2020-10-14 13:50:48 read
9 5f74b9a35b6c5 2020-10-16 19:01:19 sent
10 5f74b9a35b6c5 2020-10-16 19:01:25 deleted

Inside is id which increment, message_id is unique to each message, send_date is the time, status is the message status (it has 5 statuses; sent, delivered, read, failed, and deleted).

I wanted to calculate the time differences when the message was sent then delivered, if delivered then read.

I know something like this can be handy, but I wasn't sure how to assign it uniquely to each of the message_id

from datetime import datetime
s1 = '2020-10-14 13:45:45'
s2 = '2020-10-14 13:50:48' # for example
FMT = '%Y-%m-%d %H:%M:%S'
tdelta = datetime.strptime(s2, FMT) - datetime.strptime(s1, FMT)
print(tdelta)

Ref: https://stackoverflow.com/questions/3096953/how-to-calculate-the-time-interval-between-two-time-strings

The expected output would be,

message_id delivered_diff read_diff deleted_diff
0 5f74b996a2b7e 00:00:02 1 day, 15:16:23
1 5f74b99e85b3c 00:01:02 6:05:33
2 5f9d97ff1af9e 00:00:02 0:05:03
3 5f74b9a35b6c5 0:00:06
sadil
  • 19
  • 5

2 Answers2

0

You can use pandas to do this and datetime. The code is commented to better understand and realized with python 3.8.

import datetime
import pandas as pd

def time_delta(a, b):
    return datetime.datetime.strptime(b, '%Y-%m-%d %H:%M:%S') - datetime.datetime.strptime(a, '%Y-%m-%d %H:%M:%S') # calculate the timedelta

def calculate_diff(val, first_status, second_status):
    if not val['status'].str.contains(first_status).any() or not val['status'].str.contains(second_status).any(): # Check if the status exist
        return ''

    a = val.loc[val['status'] == first_status, 'send_date'].values[0] # Get the first send_date value for the first status value
    b = val.loc[val['status'] == second_status, 'send_date'].values[0] # Get the first send_date value for the second status value

    return time_delta(a, b) # calculate the delta

df = pd.read_csv('test.csv', sep=';') # Load csv file with ; as separator
grouped = df.groupby('message_id') # Group by message ids

final_df = pd.DataFrame(columns=['message_id', 'delivered_diff', 'read_diff', 'deleted_diff']) # create empty result dataframe
for message_id, values in grouped: # calculate the results for each group
    delivered_diff = calculate_diff(values, 'sent', 'delivered') # calculate delivered_diff as delta between sent status and delivered status
    read_diff = calculate_diff(values, 'delivered', 'read') # calculate read_diff as delta between delivered status and read status
    deleted_diff = calculate_diff(values, 'sent', 'deleted') # calculate deleted_diff as delta between sent status and deleted status

    res = {
        'message_id': message_id,
        'delivered_diff': delivered_diff,
        'read_diff': read_diff,
        'deleted_diff': deleted_diff
    }
    # append the results
    final_df = final_df.append(res, ignore_index=True)

# print final result
print(final_df)

The result:

      message_id   delivered_diff        read_diff     deleted_diff
0  5f74b996a2b7e  0 days 00:00:02  1 days 15:16:23
1  5f74b99e85b3c  0 days 00:01:02  0 days 06:05:33
2  5f74b9a35b6c5                                    0 days 00:00:06
3  5f9d97ff1af9e  0 days 00:00:02  0 days 00:05:03
Carlo Zanocco
  • 1,967
  • 4
  • 18
  • 31
  • This works so great! Thanks a bunch!! (Y) – sadil Mar 20 '21 at 10:56
  • Hi @RahmanArmenzaria if this or any answer has solved your question please consider accepting it by clicking the check-mark. This indicates to the wider community that you've found a solution and gives some reputation to both the answerer and yourself. There is no obligation to do this, also take a look to [this](https://meta.stackexchange.com/a/5235/315993) for more details. – Carlo Zanocco Mar 20 '21 at 10:56
0
import pandas as pd
from datetime import datetime, timedelta

final_dict = []
data = pd.read_csv('data.csv', names=['id','unique_id','time','status'])
data['time'] = pd.to_datetime(data['time'])
# data.info()
groupByUniqueId = data.groupby('unique_id') 
for name,group in groupByUniqueId:
  for row in group.iterrows():
        
    if row[1][3] == "sent":
      sent = row[1][2]
    if row[1][3] == "read":
      final_dict.append({row[1][1]: {"read": str(sent - row[1][2])}})
    elif row[1][3] == "delivered":
      final_dict.append({row[1][1]: {"delivered":str(sent - row[1][2])}})
    elif row[1][3] == "deleted":
      final_dict.append({row[1][1]: {"deleted":str(sent - row[1][2])}})

print(final_dict)

Data Sample for CSV

enter image description here