0

I have two DataFrames with a DatetimeIndex. I am copying a column (or more) from df1 into df2, on matching indexes. See the example below.

Example

df1

                     some_col
datetime                     
2021-01-01 00:10:00  0.906362
2021-01-01 00:11:00  0.361477
2021-01-01 00:12:00  0.007753
2021-01-01 00:13:00  0.639844
2021-01-01 00:14:00  0.885964
...                       ...
2022-11-26 10:45:00  0.539110
2022-11-26 10:46:00  0.177116
2022-11-26 10:47:00  0.580149
2022-11-26 10:48:00  0.600307
2022-11-26 10:49:00  0.449839
[1000000 rows x 1 columns]

df2

                         to_copy
datetime                        
2021-01-01 00:00:00  6383.498664
2021-01-01 00:05:00  6252.822505
2021-01-01 00:10:00  5553.735088
2021-01-01 00:15:00   474.491263
2021-01-01 00:20:00  7758.722618
...                          ...
2022-11-26 18:35:00   542.154115
2022-11-26 18:40:00  2031.866841
2022-11-26 18:45:00  1828.248310
2022-11-26 18:50:00  2748.979463
2022-11-26 18:55:00  4629.085881
[200100 rows x 1 columns]

Final df

                     some_col      to_copy
datetime                                  
2021-01-01 00:10:00  0.906362  5553.735088
2021-01-01 00:11:00  0.361477          NaN
2021-01-01 00:12:00  0.007753          NaN
2021-01-01 00:13:00  0.639844          NaN
2021-01-01 00:14:00  0.885964          NaN
...                       ...          ...
2022-11-26 10:45:00  0.539110   718.218622
2022-11-26 10:46:00  0.177116          NaN
2022-11-26 10:47:00  0.580149          NaN
2022-11-26 10:48:00  0.600307          NaN
2022-11-26 10:49:00  0.449839          NaN
[1000000 rows x 2 columns]

My code

The way I currently copy the column/columns is the following:

df1.loc[np.intersect1d(df2.index.values, df1.index.values), "to_copy"] = df2["to_copy"]
df1["to_copy"] = df1["to_copy"].shift(interval - 1)

The problem

This line of code is currently a performance bottleneck in my program. When profiling the code, I noticed the slowest part is pandas indexing in loc. Is there a faster way of copying columns on matching indexes?


Recreate example

from datetime import datetime, timedelta
import numpy as np
import pandas as pd

# Dfs generation
rows_df1 = 1000000
df2_interval = 5  # min
rows_df2 = 200100

df1 = pd.DataFrame(
    {
        "datetime": [datetime(2021, 1, 1, 0, 10) + timedelta(minutes=i) for i in range(rows_df1)],
        "some_col": np.random.rand(rows_df1),
    }
).set_index("datetime")
df2 = pd.DataFrame(
    {
        "datetime": [datetime(2021, 1, 1, 0, 0) + timedelta(minutes=i * df2_interval) for i in range(rows_df2)],
        "to_copy": np.random.rand(rows_df2),
    }
).set_index("datetime")

# Copying a column on matching indexes
intersection = np.intersect1d(df2.index.values, df1.index.values)
df1.loc[intersection, "to_copy"] = df2.loc[intersection, "to_copy"]

print(df1)
janboro
  • 79
  • 7

0 Answers0