1

I have data for rain measurements and water level measurements. But with different date and time values. Let's say I want to compare the data by visualizing it in a subplot figure at the exact same time. I have tried to do it myself with to diffent dataframes, as seen in the figure: Rain and water level measurements

As seen, the time is shifted in both figures, making it hard to compare the "peaks" according to the same time.

Is there a way of comparing it by using Pandas DataFrame? I have tried it myself, using the following code:

import pandas as pd
import matplotlib.pyplot as plt
import pickle

wb = pickle.load(open("data.p","rb"))

rain_period = wb[0]
flow_knudmose = wb[1]


periods = [['20170224','20170819','20170906'],
        ['20170308','20170826','20170917']]

# Period 1
rain_1 = rain_period.loc[(rain_period['Time'] >= periods[0][0]) &(rain_period['Time'] <= periods[1][0]) ]
rain_1.sort_values('Time',ascending=True,inplace=True)

water_1 = flow_knudmose.loc[(flow_knudmose['Time'] >= periods[0][0]) & (flow_knudmose['Time'] <= periods[1][0]) ]
water_1.sort_values('Time',ascending=True,inplace=True)

fig,axes = plt.subplots(nrows=2,ncols=1)
rain_1.plot(color='b',ax = axes[0], x='Time')
water_1.plot(color='r',ax = axes[1], x='Time')
plt.show()

This code made the figure I have attached. You can get the data.p pickle here

Thanks in advance!

aloevgaard
  • 27
  • 5

2 Answers2

0

So you have time data that does not match between the two tables and what you want is an "intersection" of the two time data sets. Discard time data from either set and create a new, common start and end times:

startTime = water_1.iloc[0]['Time'] if water_1.iloc[0]['Time'] >= rain_1.iloc[0]['Time'] else rain_1.iloc[0]['Time']
endTime   = water_1.iloc[-1]['Time'] if water_1.iloc[-1]['Time'] <= rain_1.iloc[-1]['Time'] else rain_1.iloc[-1]['Time']

Create a new dataset within these time limits:

rain_2 = rain_1[(rain_1['Time'] >= startTime) & (rain_1['Time'] <= endTime)]
water_2 = water_1[(water_1['Time'] >= startTime) & (water_1['Time'] <= endTime)]

Plot:

fig,axes = plt.subplots(nrows=2,ncols=1)
rain_2.plot(color='b',ax = axes[0], x='Time')
water_2.plot(color='r',ax = axes[1], x='Time')
plt.tight_layout()
plt.show()
Sachin Myneni
  • 273
  • 3
  • 14
0

I hope you find the following code and comments useful:

import pandas as pd
import matplotlib.pyplot as plt
import pickle

wb = pickle.load(open("data.pickle", "rb"))

rain_period = wb[0]
flow_knudmose = wb[1]

periods = [['20170224','20170819','20170906'],
        ['20170308','20170826','20170917']]

# <dataframe>.copy() are added to avoid a warning about modifying dataframe's view 
# As described at: https://stackoverflow.com/questions/17328655/pandas-set-datetimeindex,
# we can use DatetimeIndex for a new index; old 'Time' column can be dropped afterwards
rain_1 = rain_period.loc[(rain_period['Time'] >= periods[0][0]) & (rain_period['Time'] <= periods[1][0])].copy()
rain_1 = rain_1.set_index(pd.DatetimeIndex(rain_1['Time'])).drop(columns=["Time"]).sort_index()

water_1 = flow_knudmose.loc[(flow_knudmose['Time'] >= periods[0][0]) & (flow_knudmose['Time'] <= periods[1][0])].copy()
water_1 = water_1.set_index(pd.DatetimeIndex(water_1['Time'])).drop(columns=["Time"]).sort_index()

# With sharex=True, the plots show the entire period of time represented by the data in the dataframes,
# rather than the intersection of time periods (in the case with intersection, some important data might not be shown)
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True)

# Without <index>.to_pydatetime(), this code produces an error:  
# "AttributeError: 'numpy.datetime64' object has no attribute 'toordinal'"
axes[0].plot_date(rain_1.index.to_pydatetime(), rain_1["Rain"], '-',
                 color='b', label="Rain");
axes[1].plot_date(water_1.index.to_pydatetime(), water_1["Water Level"], '-',
                  color='r', label="Water Level");

# Set the favorite angle for x-labels and show legends
for ax in axes:
    plt.sca(ax)
    plt.xticks(rotation=45)
    ax.legend(loc="upper right")

plt.show()

Output: produced plot

The conversion with to_pydatetime() was suggested at: Converting pandas DatetimeIndex to 'float days format' with Matplotlib.dates.datestr2num

This solution works for:

python 3.5.4 
pandas 0.21.0
matplotlib 2.1.0