0

I have a signal and I want to remove a polynomial trend. So far I have found this solution:

fit <- polyfit(1:length(signal),signal,trend)
fit <- polyval(fit,1:length(signal))
detrended_signal <- signal - fit

It works as expected but it is very slow. An alternative seems to be:

detrended_signal <- gsignal::detrend(signal,p=trend)

This is much faster but when I use long signals (>350 datapoints) and trend=2 the detrended signal I get is basically a parabola....

Is there a faster alternative to the solution I found?

Edit:

i) https://filebin.net/dotzzobvt2uvwcjr here is an example of the data and the functions I used

ii) https://ibb.co/3mBLw80 here is an example of the signals. Black: original, red: using polyfit, blue: gsignal::detrend

Edit 2:

The problem appears when I reach 350 datapoints, until 349 everything seems fine.

Script to run:

rm(list=ls())
signal <- c(4253.84615384615, 4258.46153846154, 4265.12820512821, 4270.76923076923, 
4272.82051282051, 4270.25641025641, 4263.58974358974, 4255.38461538462, 
4247.69230769231, 4240.51282051282, 4235.38461538462, 4232.30769230769, 
4231.28205128205, 4231.28205128205, 4231.28205128205, 4230.76923076923, 
4230.25641025641, 4230.25641025641, 4231.28205128205, 4232.30769230769, 
4232.82051282051, 4231.79487179487, 4229.74358974359, 4227.17948717949, 
4226.66666666667, 4229.74358974359, 4235.38461538462, 4241.53846153846, 
4246.15384615385, 4247.17948717949, 4245.64102564103, 4242.05128205128, 
4237.4358974359, 4233.84615384615, 4233.84615384615, 4237.4358974359, 
4243.07692307692, 4246.66666666667, 4246.66666666667, 4242.5641025641, 
4237.94871794872, 4235.38461538462, 4237.4358974359, 4241.53846153846, 
4245.12820512821, 4246.15384615385, 4243.58974358974, 4239.48717948718, 
4234.35897435897, 4229.23076923077, 4224.10256410256, 4221.02564102564, 
4221.02564102564, 4225.12820512821, 4231.28205128205, 4236.92307692308, 
4240, 4240.51282051282, 4238.97435897436, 4237.4358974359, 4236.92307692308, 
4238.46153846154, 4241.02564102564, 4244.61538461538, 4247.17948717949, 
4247.69230769231, 4244.61538461538, 4238.97435897436, 4232.82051282051, 
4228.71794871795, 4228.71794871795, 4232.30769230769, 4236.92307692308, 
4239.48717948718, 4238.46153846154, 4235.89743589744, 4232.30769230769, 
4229.74358974359, 4229.74358974359, 4231.79487179487, 4235.89743589744, 
4240, 4242.5641025641, 4243.07692307692, 4241.02564102564, 4238.97435897436, 
4237.94871794872, 4240.51282051282, 4245.12820512821, 4250.25641025641, 
4253.84615384615, 4255.38461538462, 4255.38461538462, 4253.33333333333, 
4250.25641025641, 4245.64102564103, 4241.02564102564, 4237.4358974359, 
4235.89743589744, 4235.89743589744, 4236.92307692308, 4237.94871794872, 
4237.94871794872, 4237.4358974359, 4236.41025641026, 4235.38461538462, 
4234.87179487179, 4235.89743589744, 4238.46153846154, 4242.05128205128, 
4245.12820512821, 4247.17948717949, 4247.17948717949, 4245.64102564103, 
4244.61538461538, 4244.61538461538, 4245.64102564103, 4245.12820512821, 
4242.05128205128, 4238.46153846154, 4235.89743589744, 4235.38461538462, 
4235.89743589744, 4235.38461538462, 4233.33333333333, 4231.28205128205, 
4231.28205128205, 4233.33333333333, 4238.97435897436, 4245.64102564103, 
4251.28205128205, 4252.82051282051, 4251.28205128205, 4248.71794871795, 
4247.69230769231, 4250.25641025641, 4254.87179487179, 4257.4358974359, 
4255.38461538462, 4248.71794871795, 4240.51282051282, 4234.87179487179, 
4233.84615384615, 4237.4358974359, 4241.02564102564, 4242.05128205128, 
4240, 4236.41025641026, 4233.33333333333, 4231.79487179487, 4231.28205128205, 
4231.79487179487, 4233.33333333333, 4236.41025641026, 4240.51282051282, 
4244.61538461538, 4248.71794871795, 4251.28205128205, 4251.79487179487, 
4248.71794871795, 4244.10256410256, 4238.46153846154, 4234.87179487179, 
4235.38461538462, 4238.97435897436, 4244.10256410256, 4246.66666666667, 
4245.12820512821, 4241.02564102564, 4237.4358974359, 4237.94871794872, 
4243.58974358974, 4251.28205128205, 4256.41025641026, 4257.4358974359, 
4253.84615384615, 4250.25641025641, 4249.74358974359, 4250.76923076923, 
4251.28205128205, 4248.71794871795, 4243.07692307692, 4237.4358974359, 
4233.84615384615, 4235.38461538462, 4241.02564102564, 4247.17948717949, 
4251.28205128205, 4251.79487179487, 4248.71794871795, 4245.12820512821, 
4242.05128205128, 4240, 4237.94871794872, 4235.89743589744, 4233.33333333333, 
4231.79487179487, 4231.28205128205, 4232.30769230769, 4233.33333333333, 
4234.35897435897, 4231.28205128205, 4223.58974358974, 4212.82051282051, 
4203.58974358974, 4201.02564102564, 4205.12820512821, 4212.30769230769, 
4216.41025641026, 4213.84615384615, 4206.15384615385, 4197.4358974359, 
4191.79487179487, 4190.76923076923, 4193.84615384615, 4198.46153846154, 
4203.58974358974, 4209.23076923077, 4214.35897435897, 4218.46153846154, 
4221.02564102564, 4220.51282051282, 4217.94871794872, 4213.33333333333, 
4208.71794871795, 4204.10256410256, 4201.02564102564, 4200, 4201.02564102564, 
4202.5641025641, 4203.58974358974, 4203.07692307692, 4201.02564102564, 
4198.97435897436, 4198.97435897436, 4201.53846153846, 4206.66666666667, 
4212.30769230769, 4215.38461538462, 4215.89743589744, 4215.89743589744, 
4218.46153846154, 4224.61538461538, 4231.79487179487, 4236.41025641026, 
4235.89743589744, 4229.74358974359, 4221.02564102564, 4213.33333333333, 
4210.76923076923, 4214.87179487179, 4221.53846153846, 4226.66666666667, 
4226.66666666667, 4221.02564102564, 4213.33333333333, 4206.15384615385, 
4202.5641025641, 4202.05128205128, 4203.58974358974, 4206.66666666667, 
4208.20512820513, 4208.71794871795, 4207.69230769231, 4206.66666666667, 
4205.12820512821, 4203.07692307692, 4201.53846153846, 4200.51282051282, 
4200, 4199.48717948718, 4199.48717948718, 4200.51282051282, 4201.53846153846, 
4201.53846153846, 4200, 4196.41025641026, 4190.76923076923, 4186.15384615385, 
4184.10256410256, 4186.66666666667, 4192.30769230769, 4198.97435897436, 
4202.05128205128, 4200.51282051282, 4194.35897435897, 4187.17948717949, 
4183.07692307692, 4184.10256410256, 4189.74358974359, 4196.92307692308, 
4202.5641025641, 4205.12820512821, 4204.10256410256, 4201.53846153846, 
4198.46153846154, 4195.89743589744, 4195.38461538462, 4196.92307692308, 
4201.02564102564, 4204.61538461538, 4205.64102564103, 4204.61538461538, 
4202.5641025641, 4201.53846153846, 4201.53846153846, 4200, 4195.38461538462, 
4188.71794871795, 4183.58974358974, 4182.05128205128, 4185.12820512821, 
4190.25641025641, 4195.38461538462, 4198.46153846154, 4200, 4201.53846153846, 
4203.58974358974, 4204.10256410256, 4203.07692307692, 4200, 4198.97435897436, 
4201.53846153846, 4205.64102564103, 4208.71794871795, 4208.20512820513, 
4206.66666666667, 4205.64102564103, 4207.17948717949, 4209.23076923077, 
4208.71794871795, 4203.58974358974, 4193.84615384615, 4184.10256410256, 
4178.46153846154, 4178.46153846154, 4183.07692307692, 4188.20512820513, 
4191.79487179487, 4193.84615384615, 4196.41025641026, 4200, 4203.07692307692, 
4203.58974358974, 4201.02564102564, 4197.4358974359, 4197.4358974359, 
4202.5641025641, 4211.28205128205, 4217.94871794872)
trend = 2
library(pracma)

fit <- polyfit(1:length(signal),signal,trend)
estimate <- polyval(fit,1:length(signal))
detrended_signal_v1 <- signal - estimate




detrended_signal_v2 <- gsignal::detrend(signal,trend)




minimum <- min(c(signal,detrended_signal_v1,detrended_signal_v2))
maximum <- max(c(signal,detrended_signal_v1,detrended_signal_v2))


plot(1:length(signal),signal,'l',ylim = c(minimum,maximum))
lines(1:length(detrended_signal_v1),detrended_signal_v1,col = 'red')
lines(1:length(detrended_signal_v2),detrended_signal_v2,col = 'blue')
Orestis
  • 53
  • 5
  • 1
    Please share reproducible sample data so we have something to work with. – Maurits Evers Jul 19 '22 at 09:44
  • There is some crucial information missing here. For example, using sample data from `?pracma::polyfit`, we can do `x <- seq(0, pi, length.out = 1000); y <- sin(x); fit <- polyfit(x, y, 6); y_pred <- polyval(p, x); detrended_signal <- y - y_pred` the runtime of which is negligible. So I am surprised that in your case "it is very slow". Fitting a polynomial model of degree 6 to 1000 data points takes milliseconds so isn't exactly slow. What are we missing? – Maurits Evers Jul 19 '22 at 09:58
  • Hi, I uploaded some data so you can have a look. It is true that a single polyfit does not take too much time but I have to do it multiple times. For example, using polyfit takes my script 2 min to run. Using gsignal::detrend, it takes me 45 seconds. I have to run this script for a large number of signals, so it adds up. – Orestis Jul 19 '22 at 10:17
  • Sorry but I don't download files from short-term file hosters. Please include sample data as text (or code) within your post. That is the [usual method here on SO](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Maurits Evers Jul 19 '22 at 10:24
  • 1
    Ok, I pasted the code and data as text – Orestis Jul 19 '22 at 10:33

1 Answers1

0

I cannot confirm your observations so I am assuming we are missing something that you haven't included in your post.

Let's benchmark both methods.

library(microbenchmark)
library(pracma)
library(gsignal)

detrend_pracma <- function(y, trend) {
    fit <- polyfit(seq_along(y), y, trend)
    y_pred <- polyval(fit, seq_along(y))
    y - y_pred
}
detrend_gsignal <- function(y, trend) {
    gsignal::detrend(y, trend)
}

res <- microbenchmark(
    detrend_pracma = detrend_pracma(signal, trend = 2),
    detrend_gsignal = detrend_gsignal(signal, trend = 2),
    times = 1000L
)
autoplot(res)

enter image description here

Running the code 1000 times takes less than a second for both methods. While gsignal::detrend seems marginally faster, I certainly don't see the factor 3 increase that you observe.

When I run detrend_v1 10000 times it takes just under 3 secs on my 10 year old MacBook Air.

Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • The detrending is just part of a for loop so it adds up. For example, after running both versions using `profvis` the polyfit version took 870 ms, the detrend version too 220 ms. – Orestis Jul 19 '22 at 12:11
  • 1
    The 870 ms vs 220 ms are still nowhere near the 2 min vs 45 s you mentioned earlier. The issue here is that this is *not* an issue with the de-trending. So this is a bit of an XY problem. A solution to your problem would be to optimise the downstream analysis. I don't know what your goal is downstream but the de-trending is not the bottleneck. – Maurits Evers Jul 19 '22 at 12:44
  • Well, I still prefer to use detrend because it makes the code cleaner. I don't really know why after 350 datapoints it doesn't work.... – Orestis Jul 19 '22 at 14:07
  • 1
    @Orestis I think we're misunderstanding each other. I'm not saying "don't use detrending"; I'm saying detrending (no matter which method you use) is not the bottleneck. – Maurits Evers Jul 20 '22 at 07:57