0

I have two separate DataFrames, which both contain rainfall amounts and dates corresponding to them.

df1:

                 time     tp
0    2013-01-01 00:00:00  0.0
1    2013-01-01 01:00:00  0.0
2    2013-01-01 02:00:00  0.0
3    2013-01-01 03:00:00  0.0
4    2013-01-01 04:00:00  0.0
                 ...  ...
8755 2013-12-31 19:00:00  0.0
8756 2013-12-31 20:00:00  0.0
8757 2013-12-31 21:00:00  0.0
8758 2013-12-31 22:00:00  0.0
8759 2013-12-31 23:00:00  0.0

[8760 rows x 2 columns]

df2:

                 time         tp
0     2013-07-18T18:00:01  0.002794
1     2013-07-18T20:00:00  0.002794
2     2013-07-18T21:00:00  0.002794
3     2013-07-18T22:00:00  0.002794
4     2013-07-19T00:00:00  0.000000
                  ...       ...
9656  2013-12-30T13:30:00  0.000000
9657  2013-12-30T23:30:00  0.000000
9658  2013-12-31T00:00:00  0.000000
9659  2013-12-31T00:00:00  0.000000
9660  2014-01-01T00:00:00  0.000000

[9661 rows x 2 columns]

I'm trying to plot a scatter graph comparing the two data frames. The way I'm doing it is by choosing a specific date and time and plotting the df1 tp on one axis and df2 tp on the other axis.

For example,

If the date/time on both dataframes = 2013-12-31 19:00:00, then plot tp for df1 onto x-axis, and tp for df2 on the y-axis.

To solve this, I tried using the following:

df1['dates_match'] = np.where(df1['time'] == df2['time'], 'True', 'False')

which will tell me if the dates match, and if they do I can plot. The problem arises as I have a different number of rows on each dataframe, and most methods only allow comparison of dataframes with exactly the same amount of rows.

Does anyone know of an alternative method I could use to plot the graph?

Thanks in advance!

jw99
  • 41
  • 6

1 Answers1

0

The main goal is to plot two time series with that apparently don't have the same frequency to be able to compare them.

Since the main issue here is the different timestamps let's tackle that with pandas resample so we have a more uniform timestamps for each observation. To take the sum of 30 minutes intervals you can do (feel free to change the time interval and the agg function if you want to)

df1.set_index("time", inplace=True)
df2.set_index("time", inplace=True)

df1_resampled = df1.resample("30T").sum() # taking the sum of 30 minutes intervals
df2_resampled = df2.resample("30T").sum() # taking the sum of 30 minutes intervals

Now that the timestamps are more organized you can either merge the newer resampled dataframes if you want to and then plot i

df_joined = df1_resampled.join(df2_resampled, lsuffix="_1", rsuffix="_2")
df_joined.plot(marker="o", figsize=(12,6))
# df_joined.plot(subplots=True) if you want to plot them separately

Since df1 starts on 2013-01-01 and df2 on 2013-07-18 you'll have a first period where only df1 will exist if you want to plot only the overlapped period you can pass how="outer" to when joining both dataframes.

Eduardo Pacheco
  • 274
  • 1
  • 6