identify if a row of table_1 belongs to a row of table_2 - Python

Question

Dataframe_1: time, travel_time and next_arrival_time is a time interval

0   1142922 32  2022-06-01 07:22:57.070 2022-06-01 07:32:33.149000
1   1142954 32  2022-06-01 07:49:07.285 2022-06-01 07:56:41.369000
2   1142978 32  2022-06-01 08:14:49.478 2022-06-01 08:22:47.571000
3   1143005 32  2022-06-01 08:36:41.684 2022-06-01 08:43:17.741579
4   1143025 32  2022-06-01 09:01:29.883 2022-06-01 09:07:34.852750
5   1143046 32  2022-06-01 09:28:12.099 2022-06-01 09:35:41.990302
6   1143070 32  2022-06-01 09:48:50.258 2022-06-01 10:25:31.174000
7   1143107 32  2022-06-01 10:35:41.243 2022-06-01 10:41:29.296000
8   1143125 32  2022-06-01 10:55:17.428 2022-06-01 11:06:51.553000
9   1143145 32  2022-06-01 11:25:57.681 2022-06-01 11:34:33.754000

Dataframe_2: the time in truckdate is every two seconds truckdate

0   2022-06-01 00:07:00.000000
1   2022-06-01 00:07:02.000000
2   2022-06-01 00:07:04.000000
3   2022-06-01 00:07:06.000000
4   2022-06-01 00:07:08.000000
5   2022-06-01 00:07:10.000000
6   2022-06-01 00:07:12.000000
7   2022-06-01 00:07:14.000000
8   2022-06-01 00:07:16.000000
9   2022-06-01 00:07:18.000000

I want to know if each element of the column "truncdate" (data_2) belongs to the interval "time_traveling" between "next_time_arrival" (data_1) and if it belongs and adds a column in data_2 with the "Id" of data_1.

I tried with this code but it doesn't work.

for i in data_2["truckdate"]:
    for n in data_1.index:
        while (i>=data_1['tiem_viajando'][n]) and (i<=data_1['next_tiem_llegada'][n]):
            data_2["Nuevo"][i]=data_1["id"][n]
            print(data_2)

Welcome to SO, Rafael Alvarado Nuñez! This is an above-average first question, nice job. Please provide your sample data by using `dput(.)`: it's more difficult scraping data from a question when the data has embedded spaces, and we don't *know* what the classes of each column is. For instance, are those timestamps `character` or `POSIXt`? See https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. Thank you! — r2evans, Jul 15 '22 at 14:20
If i understand your question ... do you want to make a join based on column value ? — Paulo Barbosa, Jul 15 '22 at 14:27
So, what I want is: if "truncdate" (data_2 ) belongs to the "time_traveling" interval and "next_time_arrival" (data_1) a column is added in data_2 with the "Id" of data_1 — Rafael Alvarado Nuñez, Jul 19 '22 at 16:15

score 0 · Accepted Answer · answered Jul 22 '22 at 13:24

Step1 - Generate the sample data

First we need to generate the above data to experiment on.

# Generating Data1 - 
# Copied Data1 from above sample in a csv file and read using pandas
>>> df1=pd.read_csv(csv_path)
>>> df1_mod = df1.copy()
>>> df1_mod['timestamp1'] = pd.to_datetime(df1_mod['timestamp1'])
>>> df1_mod['timestamp2'] = pd.to_datetime(df1_mod['timestamp2'])
>>> df1_mod
        id  id_equip          timestamp1          timestamp2
0  1142922        32 2022-01-06 07:22:57 2022-01-06 07:32:33
1  1142954        32 2022-01-06 07:49:07 2022-01-06 07:56:41
2  1142978        32 2022-01-06 08:14:49 2022-01-06 08:22:48
3  1143005        32 2022-01-06 08:36:42 2022-01-06 08:43:18
4  1143025        32 2022-01-06 09:01:30 2022-01-06 09:07:35
5  1143046        32 2022-01-06 09:28:12 2022-01-06 09:35:42
6  1143070        32 2022-01-06 09:48:50 2022-01-06 10:25:31
7  1143107        32 2022-01-06 10:35:41 2022-01-06 10:41:29
8  1143125        32 2022-01-06 10:55:17 2022-01-06 11:06:52
9  1143145        32 2022-01-06 11:25:58 2022-01-06 11:34:34

# Data2 can be generated as a date range using Pandas 
>>> df2 = pd.date_range(start='1/6/2022 11:05:00', periods=200, freq='2s')
>>> df2_mod = pd.DataFrame({'truckdate':df2})
>>> df2_mod
              truckdate
0   2022-01-06 11:05:00
1   2022-01-06 11:05:02
2   2022-01-06 11:05:04
3   2022-01-06 11:05:06
4   2022-01-06 11:05:08
..                  ...
195 2022-01-06 11:11:30
196 2022-01-06 11:11:32
197 2022-01-06 11:11:34
198 2022-01-06 11:11:36
199 2022-01-06 11:11:38

Step 2 - Datetime comparison function

Write a function to compare the truckdate from Data2 and timestamp1 and timestamp2 from Data1

def check_time(truckdate):
    # Compares truckdate with timestamp1 and timestamp2
    # Returns the id if matched
    for i, row in df1_mod.iterrows():
        start_time = row['timestamp1']
        end_time = row['timestamp2']
        id = row['id']
        if start_time < truckdate < end_time:
            return id

Step 3 - Result

Apply the function on each row of Data 2

>>> df2_mod['id']=df2_mod['truckdate'].apply(check_time)
>>> df2_mod
              truckdate         id
0   2022-01-06 11:05:00  1143125.0
1   2022-01-06 11:05:02  1143125.0
2   2022-01-06 11:05:04  1143125.0
3   2022-01-06 11:05:06  1143125.0
4   2022-01-06 11:05:08  1143125.0
..                  ...        ...
195 2022-01-06 11:11:30        NaN
196 2022-01-06 11:11:32        NaN
197 2022-01-06 11:11:34        NaN
198 2022-01-06 11:11:36        NaN
199 2022-01-06 11:11:38        NaN

Hope it helps !!

identify if a row of table_1 belongs to a row of table_2 - Python

1 Answers1

Step1 - Generate the sample data

Step 2 - Datetime comparison function

Step 3 - Result