
I have readings of the transmission rate from the two sides of a unidirectional network link (side A and side Z), reported as timestamp/value pairs that are aggregated and pulled at a 1-minute interval. In ideal circumstances, neglecting transmission delay, the readings should be the same on both sides of the link (Out rate on side A == In rate on side Z), and I want to use them to detect whether data is being lost in transmission. The problem is that the readings arrive at different points in time, so the readings from side Z lag by N seconds, which makes the data pretty much useless: even without loss on the link, I get readings from side Z at a point in time when the rate on side A has already changed.

[plot: initial readings from side A and side Z]

Is there any interpolation algorithm that could help reconcile those signals in time?

I tried creating a common index for the two dataframes and using linear interpolation to add datapoints to each of the frames. It gives a better picture of the alignment between the graphs, but during rapid growth/slowdown the distance between datapoints at the same point in time is huge, for example:

[plot: interpolated readings]
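
Roughly, that attempt looked like the following minimal sketch (the exact code isn't shown here; df_a and df_b are DataFrames built from the two dicts below):

# build a common index from both frames, then linearly interpolate the
# datapoints each frame is missing at the other frame's timestamps
common_index = df_a.index.union(df_b.index)
df_a_interp = df_a.reindex(common_index).interpolate(method='linear')
df_b_interp = df_b.reindex(common_index).interpolate(method='linear')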

Source data for the graphs in form of dictionaries:

df_a_side_out = {'output_bps': {Timestamp('2019-04-17 09:29:40-0700', tz='US/Pacific'): 35382522872.0, Timestamp('2019-04-17 09:30:41-0700', tz='US/Pacific'): 21079385419.6, Timestamp('2019-04-17 09:31:40-0700', tz='US/Pacific'): 31227610322.8, Timestamp('2019-04-17 09:32:40-0700', tz='US/Pacific'): 27822829221.333332, Timestamp('2019-04-17 09:33:40-0700', tz='US/Pacific'): 32904048834.8, Timestamp('2019-04-17 09:34:40-0700', tz='US/Pacific'): 25492801008.933334, Timestamp('2019-04-17 09:35:41-0700', tz='US/Pacific'): 35440406212.13333, Timestamp('2019-04-17 09:36:40-0700', tz='US/Pacific'): 25233478935.466667, Timestamp('2019-04-17 09:37:41-0700', tz='US/Pacific'): 40124788802.53333, Timestamp('2019-04-17 09:38:40-0700', tz='US/Pacific'): 22751043828.666668, Timestamp('2019-04-17 09:39:40-0700', tz='US/Pacific'): 34929660187.2, Timestamp('2019-04-17 09:40:41-0700', tz='US/Pacific'): 28188317863.733334, Timestamp('2019-04-17 09:41:41-0700', tz='US/Pacific'): 21337236735.866665, Timestamp('2019-04-17 09:42:40-0700', tz='US/Pacific'): 20949231319.333332, Timestamp('2019-04-17 09:43:41-0700', tz='US/Pacific'): 37289827508.933334, Timestamp('2019-04-17 09:44:40-0700', tz='US/Pacific'): 43531218338.53333, Timestamp('2019-04-17 09:45:41-0700', tz='US/Pacific'): 31844675965.333332, Timestamp('2019-04-17 09:46:40-0700', tz='US/Pacific'): 2393.3333333333335, Timestamp('2019-04-17 09:47:40-0700', tz='US/Pacific'): 6485669413.066667, Timestamp('2019-04-17 09:48:40-0700', tz='US/Pacific'): 27114641050.266666, Timestamp('2019-04-17 09:49:41-0700', tz='US/Pacific'): 30240896003.409836, Timestamp('2019-04-17 09:50:40-0700', tz='US/Pacific'): 47081233669.830505, Timestamp('2019-04-17 09:51:40-0700', tz='US/Pacific'): 45941505223.6, Timestamp('2019-04-17 09:52:40-0700', tz='US/Pacific'): 32794663316.133335, Timestamp('2019-04-17 09:53:41-0700', tz='US/Pacific'): 26202902204.666668, Timestamp('2019-04-17 09:54:40-0700', tz='US/Pacific'): 42744363073.46667, Timestamp('2019-04-17 09:55:40-0700', tz='US/Pacific'): 37591667043.6, Timestamp('2019-04-17 09:56:40-0700', tz='US/Pacific'): 11035404304.8, Timestamp('2019-04-17 09:57:40-0700', tz='US/Pacific'): 7707897097.466666, Timestamp('2019-04-17 09:58:40-0700', tz='US/Pacific'): 25327914733.066666, Timestamp('2019-04-17 09:59:40-0700', tz='US/Pacific'): 15763228742.8, Timestamp('2019-04-17 10:00:41-0700', tz='US/Pacific'): 30068024369.2, Timestamp('2019-04-17 10:01:40-0700', tz='US/Pacific'): 58940292672.26667, Timestamp('2019-04-17 10:02:41-0700', tz='US/Pacific'): 43484764068.26667, Timestamp('2019-04-17 10:03:41-0700', tz='US/Pacific'): 12948002074.266666, Timestamp('2019-04-17 10:04:41-0700', tz='US/Pacific'): 7776379160.655738, Timestamp('2019-04-17 10:05:40-0700', tz='US/Pacific'): 34174506576.81356, Timestamp('2019-04-17 10:06:40-0700', tz='US/Pacific'): 34642321006.933334, Timestamp('2019-04-17 10:07:40-0700', tz='US/Pacific'): 44025919118.13333, Timestamp('2019-04-17 10:08:41-0700', tz='US/Pacific'): 51441310396.8, Timestamp('2019-04-17 10:09:41-0700', tz='US/Pacific'): 49744733006.666664, Timestamp('2019-04-17 10:10:40-0700', tz='US/Pacific'): 39372041772.53333, Timestamp('2019-04-17 10:11:40-0700', tz='US/Pacific'): 37212362739.73333, Timestamp('2019-04-17 10:12:41-0700', tz='US/Pacific'): 29888187478.133335, Timestamp('2019-04-17 10:13:41-0700', tz='US/Pacific'): 23647225076.8, Timestamp('2019-04-17 10:14:41-0700', tz='US/Pacific'): 44232721589.333336, Timestamp('2019-04-17 10:15:40-0700', tz='US/Pacific'): 31619739302.8, Timestamp('2019-04-17 
10:16:41-0700', tz='US/Pacific'): 34270903419.866665, Timestamp('2019-04-17 10:17:41-0700', tz='US/Pacific'): 37255143804.26667, Timestamp('2019-04-17 10:18:40-0700', tz='US/Pacific'): 29626685689.333332, Timestamp('2019-04-17 10:19:41-0700', tz='US/Pacific'): 37738576156.8, Timestamp('2019-04-17 10:20:41-0700', tz='US/Pacific'): 32520425703.733334, Timestamp('2019-04-17 10:21:40-0700', tz='US/Pacific'): 50682096771.066666, Timestamp('2019-04-17 10:22:40-0700', tz='US/Pacific'): 53442027636.0, Timestamp('2019-04-17 10:23:40-0700', tz='US/Pacific'): 48346635537.066666, Timestamp('2019-04-17 10:24:41-0700', tz='US/Pacific'): 28192208534.0, Timestamp('2019-04-17 10:25:41-0700', tz='US/Pacific'): 30508158848.533333, Timestamp('2019-04-17 10:26:40-0700', tz='US/Pacific'): 38669708961.73333, Timestamp('2019-04-17 10:27:41-0700', tz='US/Pacific'): 41905851091.333336, Timestamp('2019-04-17 10:28:40-0700', tz='US/Pacific'): 37885503188.4}}

df_z_side_in = {'input_bps': {Timestamp('2019-04-17 09:29:21-0700', tz='US/Pacific'): 32479665734.933334, Timestamp('2019-04-17 09:30:21-0700', tz='US/Pacific'): 28762393063.213116, Timestamp('2019-04-17 09:31:21-0700', tz='US/Pacific'): 24012409059.66102, Timestamp('2019-04-17 09:32:20-0700', tz='US/Pacific'): 30912397690.8, Timestamp('2019-04-17 09:33:21-0700', tz='US/Pacific'): 30150484213.508198, Timestamp('2019-04-17 09:34:21-0700', tz='US/Pacific'): 26572558234.666668, Timestamp('2019-04-17 09:35:20-0700', tz='US/Pacific'): 38830624164.47458, Timestamp('2019-04-17 09:36:20-0700', tz='US/Pacific'): 26512584207.866665, Timestamp('2019-04-17 09:37:20-0700', tz='US/Pacific'): 32343571104.133335, Timestamp('2019-04-17 09:38:21-0700', tz='US/Pacific'): 28372191073.704918, Timestamp('2019-04-17 09:39:20-0700', tz='US/Pacific'): 30009804008.677967, Timestamp('2019-04-17 09:40:20-0700', tz='US/Pacific'): 30764259885.2, Timestamp('2019-04-17 09:41:20-0700', tz='US/Pacific'): 27229582440.533333, Timestamp('2019-04-17 09:42:21-0700', tz='US/Pacific'): 12670550319.868853, Timestamp('2019-04-17 09:43:21-0700', tz='US/Pacific'): 38891533755.333336, Timestamp('2019-04-17 09:44:21-0700', tz='US/Pacific'): 46374133014.644066, Timestamp('2019-04-17 09:45:20-0700', tz='US/Pacific'): 40275148155.46667, Timestamp('2019-04-17 09:46:21-0700', tz='US/Pacific'): 2374.032786885246, Timestamp('2019-04-17 09:47:20-0700', tz='US/Pacific'): 3260927513.220339, Timestamp('2019-04-17 09:48:21-0700', tz='US/Pacific'): 19319788768.666668, Timestamp('2019-04-17 09:49:21-0700', tz='US/Pacific'): 29479921822.133335, Timestamp('2019-04-17 09:50:21-0700', tz='US/Pacific'): 42536464523.27869, Timestamp('2019-04-17 09:51:21-0700', tz='US/Pacific'): 48253007455.32204, Timestamp('2019-04-17 09:52:20-0700', tz='US/Pacific'): 28098055972.266666, Timestamp('2019-04-17 09:53:20-0700', tz='US/Pacific'): 34696013048.8, Timestamp('2019-04-17 09:54:21-0700', tz='US/Pacific'): 41089541187.540985, Timestamp('2019-04-17 09:55:20-0700', tz='US/Pacific'): 35818326833.355934, Timestamp('2019-04-17 09:56:21-0700', tz='US/Pacific'): 24461996828.0, Timestamp('2019-04-17 09:57:21-0700', tz='US/Pacific'): 2534090684.266667, Timestamp('2019-04-17 09:58:21-0700', tz='US/Pacific'): 22127687010.229507, Timestamp('2019-04-17 09:59:21-0700', tz='US/Pacific'): 23025967406.915253, Timestamp('2019-04-17 10:00:20-0700', tz='US/Pacific'): 10059074966.266666, Timestamp('2019-04-17 10:01:21-0700', tz='US/Pacific'): 67497142954.0, Timestamp('2019-04-17 10:02:21-0700', tz='US/Pacific'): 46389235268.0, Timestamp('2019-04-17 10:03:20-0700', tz='US/Pacific'): 21655645611.2, Timestamp('2019-04-17 10:04:21-0700', tz='US/Pacific'): 966253748.4, Timestamp('2019-04-17 10:05:20-0700', tz='US/Pacific'): 27733135839.866665, Timestamp('2019-04-17 10:06:21-0700', tz='US/Pacific'): 38420361510.55738, Timestamp('2019-04-17 10:07:20-0700', tz='US/Pacific'): 38791963200.27119, Timestamp('2019-04-17 10:08:21-0700', tz='US/Pacific'): 49337311755.333336, Timestamp('2019-04-17 10:09:21-0700', tz='US/Pacific'): 49036736751.2, Timestamp('2019-04-17 10:10:21-0700', tz='US/Pacific'): 40189220408.0, Timestamp('2019-04-17 10:11:20-0700', tz='US/Pacific'): 47269187739.333336, Timestamp('2019-04-17 10:12:21-0700', tz='US/Pacific'): 22747569814.666668, Timestamp('2019-04-17 10:13:20-0700', tz='US/Pacific'): 29592627519.066666, Timestamp('2019-04-17 10:14:21-0700', tz='US/Pacific'): 39522624640.78689, Timestamp('2019-04-17 10:15:20-0700', tz='US/Pacific'): 33426815865.627117, 
Timestamp('2019-04-17 10:16:20-0700', tz='US/Pacific'): 36818438483.86667, Timestamp('2019-04-17 10:17:21-0700', tz='US/Pacific'): 36014942532.327866, Timestamp('2019-04-17 10:18:21-0700', tz='US/Pacific'): 32190457857.333332, Timestamp('2019-04-17 10:19:20-0700', tz='US/Pacific'): 33696489212.067795, Timestamp('2019-04-17 10:20:20-0700', tz='US/Pacific'): 33386886955.333332, Timestamp('2019-04-17 10:21:20-0700', tz='US/Pacific'): 47954604950.13333, Timestamp('2019-04-17 10:22:21-0700', tz='US/Pacific'): 54281759713.57377, Timestamp('2019-04-17 10:23:20-0700', tz='US/Pacific'): 43724407654.37288, Timestamp('2019-04-17 10:24:20-0700', tz='US/Pacific'): 36995567964.666664, Timestamp('2019-04-17 10:25:21-0700', tz='US/Pacific'): 25491555548.590164, Timestamp('2019-04-17 10:26:21-0700', tz='US/Pacific'): 38326723270.26667, Timestamp('2019-04-17 10:27:20-0700', tz='US/Pacific'): 43034165564.61017, Timestamp('2019-04-17 10:28:20-0700', tz='US/Pacific'): 37405127893.6}}
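
(For completeness, a minimal sketch of loading these dicts into DataFrames, assuming pandas; the frames are referred to as df_a and df_b elsewhere on this page:)

import pandas as pd
from pandas import Timestamp  # the dict keys above are pandas Timestamps

df_a = pd.DataFrame(df_a_side_out)  # side A, column 'output_bps'
df_b = pd.DataFrame(df_z_side_in)   # side Z, column 'input_bps'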
mongolio
  • Can you add some example data you used? Usually 5-10 rows is enough; you can add more if you think it clears your question up. You can find information about creating a good pandas question [here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). In your question it's crucial to see some data; it's hard to say just by looking at your plots. – Erfan Apr 17 '19 at 23:06
  • My bad. I just added the full set of datapoints for both graphs. – mongolio Apr 17 '19 at 23:10

1 Answer


Method 1: align the plots

We can do the following to get the data exactly aligned. I'm not sure what kind of data this is and whether it actually makes sense to solve it this way, but maybe this helps.


  1. First we concat the dataframes side by side.

  2. Then we fillna the missing rows in each column with the opposite column's data.

# concat the question's two frames (df_a: side A out, df_b: side Z in) side by side
df = pd.concat([df_a, df_b], axis=1)
# fill each column's gaps with the value reported by the opposite side
df['output_bps'] = df['output_bps'].fillna(df['input_bps'])
df['input_bps'] = df['input_bps'].fillna(df['output_bps'])

Then we plot again and we see that it's perfectly aligned. As we can see in the legend, it's actually two lines:

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(16, 10))

plt.plot(df['output_bps'], label='Side A out')
plt.plot(df['input_bps'], label='Side Z in')
plt.legend(loc='upper left')
plt.show()

[plot: aligned Side A out / Side Z in]


Method 2: find more accurate differences

So, if I understand correctly, it's going to be hard to find accurate differences (loss) because of the different timestamps at which the sensors recorded the data.

We can work the data to make it more accurate: not just interpolate, but first resample it to a 1-second index and then interpolate to get more accuracy. After that we take the differences at the same timestamps to find the loss.

This is the closest I can get:

# reindex: build an empty frame with a 1-second DatetimeIndex spanning
# the measurement window (df is the concatenated frame from Method 1)
df_z = pd.DataFrame(index=pd.date_range(start=df.index.min(),
                                        end=df.index.max(),
                                        freq='1S'), columns=df.columns)

# merge the values of the original dataframe onto the 1-second index and
# drop the empty, suffixed columns we don't need
df_z = df_z.merge(df,
                  left_index=True,
                  right_index=True,
                  how='left',
                  suffixes=['_1', '']).filter(regex='(^[^0-9]+$)')

# fill NaN by linear interpolation
for col in df_z.columns:
    df_z[col] = df_z[col].interpolate(method='linear', limit_direction='both')

# calculate the loss at each second
df_z['loss'] = df_z['output_bps'] - df_z['input_bps']
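
As an illustration going beyond the steps above (the one-minute bucket and the 5% threshold are just assumptions for the sketch), the per-second difference can then be aggregated back to one-minute buckets and expressed relative to the A-side rate, so that brief timing artifacts average out and only sustained mismatches are flagged:

# rough follow-up sketch: summarize the per-second loss per minute,
# relative to the A-side rate
loss_per_min = df_z['loss'].resample('1T').mean()
rate_per_min = df_z['output_bps'].resample('1T').mean()
relative_loss = loss_per_min / rate_per_min

# flag minutes where more than ~5% of the traffic appears to be missing
print(relative_loss[relative_loss > 0.05])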

Now we can plot our data again, including the loss:

fig = plt.figure(figsize=(16,10))

plt.plot(df_z['output_bps'], label='Side A out')
plt.plot(df_z['input_bps'], label='Side Z in')
plt.plot(df_z['loss'], label='Loss')
plt.legend(loc='upper left')

plt.show()

[plot: Side A out, Side Z in and loss]

Erfan
  • I think this way we will produce a more accurate representation of the traffic flowing through the link, if there is no loss on it. But it doesn't help with the goal of comparing input and output to detect loss. – mongolio Apr 17 '19 at 23:54
  • Could you describe better what "loss of data" is in this case? How do you calculate that? Maybe do it for one timestamp so I can understand. @mongolio – Erfan Apr 17 '19 at 23:56
  • Let's use the analogy of a water pipe: if there is a leak in the pipe, the rate at which water enters the pipe will differ from the rate at which water flows out the other side, because of the leak. By comparing the readings on both sides we should be able to see that we have a leak. In my case the water is flowing with changing pressure, and the readings on the A side and Z side of the pipe happen at different points in time. – mongolio Apr 18 '19 at 00:00
  • I see, that makes it clearer. I tried to reindex and get a more accurate calculation of the loss. See my edited answer. @mongolio – Erfan Apr 18 '19 at 00:41
  • Erfan, thank you for looking into it! The thing is that there is no loss behind this graph, so I was thinking about some more advanced algorithm that could be applied to interpolate the points; linear, as you can see, is not really working. I tried cubic and spline; the results look promising, but not good enough. – mongolio Apr 18 '19 at 18:19