I have a function f that needs to be applied to a single column of length n in segments of length m, where m divides n. (For example, for a column of 1000 values, apply f to the first 250 values, then to values 251-500, and so on.)

A plain loop seems impractical, since the column has over 16 million values. I was thinking the efficient way would be to split the column of length n into q vectors of length m, where mq = n. Then I could apply f to all these vectors at once using some lapply-like functionality, and finally join the q results to obtain the transformed version of the column.

Is that the efficient way to go here? If so, what function could decompose a column into q vectors of equal length and what function should I use to broadcast f across the q vectors?

Lastly, although less importantly, what if we wanted to do this to several columns and not just one?

Context

I've programmed a function that computes the power spectrum of an EEG signal (a numeric vector). However, it is bad practice to compute the power spectrum of a whole signal at once. The correct method is to compute it epoch by epoch, in 30 or 5 second segments, and average the spectrum of all those epochs. Hence why I need to apply a function to a column (an EEG signal) by epochs (or segments).

lafinur
  • Can you provide an example dataset (a small one, not the 16 million value dataset, similar to what you're working with) and any code you've tried so far for the function `f`? – jrcalabrese Dec 12 '22 at 01:40
  • Why do you need to do this in segments? Are you doing some type of aggregation? Or is it to avoid having too much data in memory at once? A [reproducible example](https://stackoverflow.com/q/5963269/5325862) is really necessary here – camille Dec 12 '22 at 04:14
  • I'm in charge of programming a scientific R package with general EEG functions at a neuroscience lab. I've programmed a function that computes the power spectrum of an EEG signal (a numeric vector). However, it is bad practice to compute the power spectrum of a whole signal at once. The correct method is to compute it epoch by epoch, in 30 or 5 second segments, and average the spectrum of all those epochs. Hence why I need to apply a function to a column (an EEG signal) by epochs (or segments). – lafinur Dec 12 '22 at 16:07
  • I will add a reproducible example tonight – lafinur Dec 12 '22 at 16:08

1 Answer

One way to do it is to create an auxiliary variable that labels the segment each row belongs to, so you can apply the function to each column segment by segment. Depending on your function, you can use group_by and/or summarize. An example:

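library(dplyr)

# Toy data: three numeric columns of 15 values each
df <- data.frame(
  x = rnorm(15),
  y = rnorm(15),
  z = rnorm(15)
)

df %>%
  mutate(
    # aux labels the segment of each row (3 segments of 5 rows)
    aux = rep(1:3, each = nrow(df) / 3),
    # transform every column using the segment label
    across(.cols = c(x, y, z), .fns = ~ . + 2 * aux)
  )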

          x        y        z aux
1  2.164841 2.882465 2.139098   1
2  2.364115 2.205598 2.410275   1
3  2.552158 1.383564 1.441543   1
4  1.398107 1.265201 2.605371   1
5  1.006301 1.868197 1.493666   1
6  5.026785 4.310017 2.579434   2
7  4.751061 2.960320 4.127993   2
8  2.490833 3.815691 5.945851   2
9  3.904853 4.967267 4.800914   2
10 3.104052 3.891720 5.165253   2
11 3.929249 5.301579 6.358856   3
12 6.150120 5.724055 5.391443   3
13 5.920788 7.114649 5.797759   3
14 5.902631 6.550044 5.726752   3
15 6.216153 7.236676 5.531300   3
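
For the use case described in the question (apply f to each epoch and average the results), a base R alternative is to reshape the column into a matrix with one segment per column. The following is only a minimal sketch under assumptions: `apply_by_segment` and its `fn` argument are hypothetical names, and `f` here is a crude stand-in periodogram, not the asker's actual power spectrum function.

```r
# Hypothetical stand-in for the asker's f: takes one segment (numeric
# vector) and returns a numeric vector (here, a crude periodogram).
f <- function(segment) {
  Mod(fft(segment))^2 / length(segment)
}

# Apply fn to consecutive segments of length m and average the results.
apply_by_segment <- function(x, m, fn) {
  stopifnot(length(x) %% m == 0)   # m must divide n
  segments <- matrix(x, nrow = m)  # column j holds values ((j-1)*m + 1):(j*m)
  q <- ncol(segments)
  results <- lapply(seq_len(q), function(j) fn(segments[, j]))
  Reduce(`+`, results) / q         # element-wise mean across the q segments
}

signal <- rnorm(1000)
avg_spectrum <- apply_by_segment(signal, m = 250, fn = f)
length(avg_spectrum)  # 250

# Several columns: a data frame is a list of columns, so sapply works.
eeg <- data.frame(ch1 = rnorm(1000), ch2 = rnorm(1000))
spectra <- sapply(eeg, apply_by_segment, m = 250, fn = f)
dim(spectra)          # 250 2
```

Because matrix() fills column by column, each column is one contiguous segment, so no grouping variable is needed. If you want the transformed segments joined back into one vector instead of averaged, replace the last line of `apply_by_segment` with `unlist(results)`.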
Vinícius Félix