
I have quite a few sensors in the field that measure water pressure. In the past, the height of these sensors has been changed several times, creating jumps in the time series. Since these time series are continuous and I have a manual measurement, I should in principle be able to remove the jumps (by hand this is easy, but there are too many measurements, so I need to do it in Python).

I've tried removing the jumps using a median filter but this doesn't really work.

My code:

    import numpy as np
    import scipy.ndimage as im  # assumption: 'im' refers to scipy.ndimage

    # filter out noise in signal (peaks)
    minimumPeak = 0.03  # filter peaks larger than 0.03 m
    filtered_value = np.array(im.median_filter(data['value'], 5))
    noise = np.array((filtered_value - data['value']).abs() > minimumPeak)
    data.loc[noise, 'value'] = filtered_value[noise]

`data` is a pandas DataFrame containing two columns: 'datetime' and 'value'.

I've also tried to do this manually and got it working in a simple case, but not well in any other. Any idea how I would filter out the jumps?

An example is shown in the picture below: yellow marks the jumps and red the manual measurement. Note that the manual measurement is not necessarily at the beginning of the series, as it is in this example.

[Figure: time series with jumps]

NoDataDumpNoContribution
Yorian
  • It seems that your jumps are related to very large variations of values. Why not check whether the absolute value of the difference between two consecutive values exceeds a certain threshold? – aretor Jan 25 '17 at 17:36
  • You have two different types of jumps: sharp peaks and steps. I don't understand what you want to do about the steps, for example the last highlighted jump. What should the data look like after the steps are removed? – Michael Jan 25 '17 at 18:18
  • Do you want to remove the steps by offsetting the data, or do you expect to filter them in some way? – Stephen Rauch Jan 25 '17 at 20:37
  • @Michael When the OP described his time-series data as "continuous" when explaining why he'd like to "remove the jumps," I think it's fair to say that the OP wants an interpolation of values during the jump. In other words, the OP wants those jumps to be replaced with the values that would reasonably be present given the function's behavior to the left of the jump's starting point and to the right of the jump's ending point. – Vladislav Martin Jan 25 '17 at 20:43
  • Thanks for the response. Both AreTor and Vladislaw Martin are correct. The data should be continuous; to achieve this I looked for locations where the derivative is higher than a certain value (dh/dt > maxValue). At these locations I subtract this dh (the peak/jump) from that value and from all the values following it. I do this peak by peak towards the end, which should result in a smooth line again. I had this working for a simple case but couldn't get it to work for a more difficult ones somehow. – Yorian Jan 26 '17 at 07:26
  • "I had this working for a simple case but couldn't get it to work for a more difficult ones somehow." It definitely should work unless your more difficult case is much more difficult than the data displayed here. You should then display the difficult case. – NoDataDumpNoContribution Jan 26 '17 at 12:42

2 Answers


You have sharp peaks and steps in your data. I guess you want to

  • remove the peaks and replace by some averaged values
  • remove the steps by cumulatively changing the offset of the remaining data values

That's in line with what you said in your last comment. Please note that this will alter (shift) large parts of your data!

It's important to recognize that both the peaks and the steps are only one pixel (one sample) wide in your data. Also, you can handle both effects pretty much independently.

I suggest first removing the peaks and then removing the steps.

  1. Remove peaks by calculating the absolute difference to the previous and to the next data value, then take the minimum of both, i.e. if your data series is y(i), compute p(i)=min(abs(y(i)-y(i-1)), abs(y(i+1)-y(i))). All values above a threshold are peaks. Replace the data values at these positions with the mean of the previous and the next value.

  2. Now remove the steps, again by looking at the absolute differences of consecutive values (as suggested in the comment by AreTor), s(i)=abs(y(i)-y(i-1)), and looking for values above a certain threshold. These positions are the step positions. Create a zero-valued offset array of the same size, insert the differences of the data points (without the absolute value) at the step positions, then form the cumulative sum and subtract the result from the original data to remove the steps.
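
The two steps above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names `remove_peaks` and `remove_steps`, the threshold values and the synthetic test signal are all hypothetical, and both thresholds have to be tuned to your data.

```python
import numpy as np


def remove_peaks(y, peak_threshold):
    """Step 1: replace one-sample spikes by the mean of their two neighbours."""
    y = np.asarray(y, dtype=float).copy()
    # p(i) = min(|y(i) - y(i-1)|, |y(i+1) - y(i)|) for every interior point
    left = np.abs(y[1:-1] - y[:-2])
    right = np.abs(y[2:] - y[1:-1])
    p = np.minimum(left, right)
    peaks = np.flatnonzero(p > peak_threshold) + 1  # shift back to indices into y
    # the right-hand side is evaluated before the assignment happens
    y[peaks] = 0.5 * (y[peaks - 1] + y[peaks + 1])
    return y


def remove_steps(y, step_threshold):
    """Step 2: remove steps by subtracting a cumulative offset."""
    y = np.asarray(y, dtype=float)
    d = np.diff(y, prepend=y[0])        # d[i] = y[i] - y[i-1], with d[0] = 0
    steps = np.abs(d) > step_threshold  # s(i) = |y(i) - y(i-1)| above threshold
    offset = np.zeros_like(y)
    offset[steps] = d[steps]            # signed step heights at the step positions
    return y - np.cumsum(offset)


# hypothetical synthetic signal: slow rise, one spike and one sensor move
t = np.arange(12, dtype=float)
y = 0.01 * t
y[4] += 0.5   # one-sample peak
y[8:] -= 0.3  # downward step
clean = remove_steps(remove_peaks(y, 0.1), 0.1)
```

Because the whole difference at a step position is treated as offset, the corrected series is flat for that one sample. The result keeps the first value as its reference level; to tie it to a hand measurement instead, you could add the constant `h_manual - clean[k]` to the whole series, where `h_manual` and `k` are hypothetical names for the manual value and its index.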

Please note that this removes peaks and steps that go up as well as down. If you want to remove only one kind, compare the signed differences with the threshold instead of taking the absolute value.
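
For the one-directional case, a sketch under the same assumptions could look like this. The function name and its `direction` parameter are hypothetical; the only change to the step-removal idea is that the signed difference is compared with the threshold.

```python
import numpy as np


def remove_steps_signed(y, step_threshold, direction="down"):
    """Remove only downward (or only upward) steps via a cumulative offset."""
    y = np.asarray(y, dtype=float)
    d = np.diff(y, prepend=y[0])  # signed differences, d[0] = 0
    if direction == "down":
        steps = d < -step_threshold
    elif direction == "up":
        steps = d > step_threshold
    else:
        raise ValueError("direction must be 'up' or 'down'")
    offset = np.where(steps, d, 0.0)  # keep only the selected step heights
    return y - np.cumsum(offset)


# hypothetical example: one upward step followed by one downward step
y = np.array([0.0, 0.0, 1.0, 1.0, 0.5, 0.5])
only_down = remove_steps_signed(y, 0.2, direction="down")
only_up = remove_steps_signed(y, 0.2, direction="up")
```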

Community
NoDataDumpNoContribution
  • For step 2: good suggestion, it worked for me. I found out that numpy.unwrap seems to do the same thing; it might be faster to use that function instead of a custom implementation. You can specify a custom threshold with the discont argument ([unwrap](https://numpy.org/doc/stable/reference/generated/numpy.unwrap.html)). – D.Thomas Aug 05 '21 at 14:07
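
Regarding `numpy.unwrap` in the comment above: it corrects a jump by adding a multiple of `period` (2π by default) wherever the jump exceeds `discont`, rather than subtracting the jump height itself. It therefore only reproduces step 2 when the step heights happen to be multiples of the period. A small check with made-up arrays:

```python
import numpy as np

# hypothetical signals: a step of exactly 2*pi and a step of 5
step_2pi = np.array([0.0, 0.0, 2 * np.pi, 2 * np.pi])
step_5 = np.array([0.0, 0.0, 5.0, 5.0])

unwrapped_2pi = np.unwrap(step_2pi)  # step removed: it is exactly one period
unwrapped_5 = np.unwrap(step_5)      # only shifted by -2*pi, not removed

# cumulative-offset approach from step 2 (threshold of 1.0 chosen arbitrarily)
d = np.diff(step_5, prepend=step_5[0])
offset_removed = step_5 - np.cumsum(np.where(np.abs(d) > 1.0, d, 0.0))
```

For the arbitrary step heights caused by moving a sensor, the cumulative-offset version is the one that removes the step.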

You can try it like this:

    import numpy as np
    import matplotlib.pyplot as plt
    import h5py  # only needed to read the example HDF5 file

    # in a Jupyter notebook you can additionally use: %matplotlib inline

    filepath = 'measurment.hdf5'

    # read time (x) and measured values (y) from the file
    with h5py.File(filepath, 'r') as hdf:
        data_y = hdf['y'][:]
        data_x = hdf['x'][:]

    data = data_y

    delta_max = 1  # maximum allowed difference in y between two consecutive points
    delta = 0      # running correction value (accumulated offset)
    data_cor = list(data[:2])  # corrected values; the first two points are kept as-is

    for i in range(2, len(data)):
        delta_i = data[i] - data[i-1]
        if np.abs(delta_i) > delta_max:
            # jump detected: grow the offset so that the corrected series
            # continues with the slope of the two previous corrected points
            delta += delta_i - (data_cor[i-1] - data_cor[i-2])
        data_cor.append(data[i] - delta)

    plt.plot(data_x, data_cor)
    plt.show()
T.J