Here is the first dataframe:
df
open high low close volume tpv cum_volume cum_tpv vwap
timestamp
2021-02-04 04:20:00 1.41 1.41 1.40 1.40 1400.0 1967.000 1400.0 1.967000e+03 1.405000
2021-02-04 04:25:00 1.40 1.40 1.40 1.40 5510.0 7714.000 6910.0 9.681000e+03 1.401013
2021-02-04 04:30:00 1.40 2.70 1.40 2.70 8283.0 16980.150 15193.0 2.666115e+04 1.754831
2021-02-04 04:35:00 2.80 3.99 2.79 3.02 53546.0 181520.940 68739.0 2.081821e+05 3.028588
2021-02-04 04:40:00 3.14 3.15 2.37 2.38 51866.0 143150.160 120605.0 3.513322e+05 2.913082
... ... ... ... ... ... ... ... ... ...
2021-02-04 19:35:00 9.25 9.49 9.20 9.36 46380.0 433421.100 315556036.0 2.575511e+09 8.161819
2021-02-04 19:40:00 9.36 9.53 9.32 9.53 32975.0 310789.375 315589011.0 2.575822e+09 8.161951
2021-02-04 19:45:00 9.51 9.82 9.32 9.79 59682.0 571156.740 315648693.0 2.576393e+09 8.162218
2021-02-04 19:50:00 9.79 9.96 9.50 9.96 46546.0 452892.580 315695239.0 2.576846e+09 8.162449
2021-02-04 19:55:00 9.85 10.09 9.76 10.09 89898.0 892237.650 315785137.0 2.577738e+09 8.162951
[188 rows x 9 columns]
Here is the second dataframe:
df2
SMA
timestamp
2021-02-04 04:40:00 2.0635
2021-02-04 04:41:00 2.1355
2021-02-04 04:42:00 2.1970
2021-02-04 04:43:00 2.2590
2021-02-04 04:44:00 2.3080
... ...
2021-02-04 18:56:00 10.2080
2021-02-04 18:57:00 10.1705
2021-02-04 18:58:00 10.1205
2021-02-04 18:59:00 10.0665
2021-02-04 19:00:00 10.0210
[861 rows x 1 columns]
I want to add the SMA column from df2 to df only if the index exist in both df and df2. Otherwise, place the SMA value to nan in df. The indexes from df will never be deleted. How can I do that?
I have tried this line merged_df = pd.merge(df, df2, left_index=True, right_index=True, how='inner')
, but I got this result:
open high low close volume tpv cum_volume cum_tpv vwap SMA
timestamp
2021-02-04 04:40:00 3.14 3.15 2.37 2.38 51866.0 143150.160 120605.0 3.513322e+05 2.913082 2.0635
2021-02-04 04:45:00 2.50 2.70 2.50 2.63 24269.0 63099.400 144874.0 4.144317e+05 2.860635 2.3655
2021-02-04 04:50:00 2.56 2.58 2.42 2.57 11849.0 29622.500 156723.0 4.440542e+05 2.833369 2.6650
2021-02-04 04:55:00 2.57 2.58 2.42 2.50 16916.0 42290.000 173639.0 4.863442e+05 2.800892 2.7280
2021-02-04 05:00:00 2.54 2.60 2.52 2.54 10899.0 27901.440 184538.0 5.142456e+05 2.786665 2.5665
... ... ... ... ... ... ... ... ... ... ...
2021-02-04 18:40:00 10.89 10.97 10.10 10.33 105233.0 1108629.655 314656226.0 2.566818e+09 8.157530 11.4100
2021-02-04 18:45:00 10.30 10.47 9.50 9.74 150192.0 1499667.120 314806418.0 2.568317e+09 8.158402 11.0620
2021-02-04 18:50:00 9.64 9.95 9.44 9.78 158926.0 1540787.570 314965344.0 2.569858e+09 8.159177 10.5835
2021-02-04 18:55:00 9.80 10.12 9.65 9.80 91473.0 904210.605 315056817.0 2.570762e+09 8.159678 10.2565
2021-02-04 19:00:00 9.80 9.86 9.60 9.81 67126.0 653135.980 315123943.0 2.571415e+09 8.160013 10.0210
As you can see it misses at least from the beginning of df. This solution does not work.