
I have a long data frame (df1) with one reading every 5 seconds: a timestamp and a CO2 concentration from continuous monitoring. These values need to be calibrated using a second data frame (df2), which stores the initial and final dates of each interval and the calibration parameter to apply in it.

Calibration parameters change with time. The steps I need to perform are:

  1. Split df1 into several data frames, one per row of df2, according to the initial and final dates,
  2. Apply the calibration parameter supplied in each row of df2,
  3. Rebuild the initial df to store the calibrated data.

I'm struggling to split df1 according to df2 in R. I tried a for loop that did not work, and I'm convinced I need a more straightforward approach like split() or apply().


Since the data is too big, here is a minimal example. df1 looks like:

DateTime CO2
14-05-2022 00:19:50 479.8340879
14-05-2022 00:19:55 479.836915
14-05-2022 00:20:00 479.8462298
14-05-2022 00:20:05 479.8417516
14-05-2022 00:20:10 479.823782
14-05-2022 00:20:15 479.8069912
14-05-2022 00:20:20 479.7700943
14-05-2022 00:20:25 479.7807222
14-05-2022 00:20:30 479.7696609
14-05-2022 00:20:35 479.7580641
14-05-2022 00:20:40 479.7799673
14-05-2022 00:20:45 479.8502333
14-05-2022 00:20:50 479.9433364
14-05-2022 00:20:55 480.0223177
14-05-2022 00:21:00 480.115519
14-05-2022 00:21:05 480.1925293
14-05-2022 00:21:10 480.2117073
14-05-2022 00:21:15 480.3010663
14-05-2022 00:21:20 480.3629772
14-05-2022 00:21:25 480.464677
14-05-2022 00:21:30 480.5220228
14-05-2022 00:21:35 480.5644807
14-05-2022 00:21:40 480.6019965
14-05-2022 00:21:45 480.6793977
14-05-2022 00:21:50 480.7235118
14-05-2022 00:21:55 480.7624506
14-05-2022 00:22:00 480.7887041
14-05-2022 00:22:05 480.7656519
14-05-2022 00:22:10 480.7710211
14-05-2022 00:22:15 480.7655103
14-05-2022 00:22:20 480.7906543
14-05-2022 00:22:25 480.7992506
14-05-2022 00:22:30 480.7758722

The calibration data frame df2 could be:

date_initial date_final calib_parameter
14-05-2022 00:00:00 14-05-2022 00:20:59 0.98
14-05-2022 00:21:00 14-05-2022 00:21:59 0.99
14-05-2022 00:22:00 14-05-2022 00:22:59 0.97

I need to multiply each CO2 value in df1 by the calib_parameter of the df2 row whose date_initial and date_final enclose that reading's DateTime.
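For anyone who wants to try this out, the example tables can be recreated roughly as follows. This is only a sketch: just four rows of df1 are copied (chosen to straddle the interval boundaries), and the UTC time zone is an assumption that is not stated in the question.

```r
# Sketch of the example data. tz = "UTC" is an assumption; use the logger's real zone.
fmt <- "%d-%m-%Y %H:%M:%S"

df1 <- data.frame(
  DateTime = as.POSIXct(
    c("14-05-2022 00:20:55", "14-05-2022 00:21:00",
      "14-05-2022 00:21:55", "14-05-2022 00:22:00"),
    format = fmt, tz = "UTC"
  ),
  CO2 = c(480.0223177, 480.115519, 480.7624506, 480.7887041)
)

df2 <- data.frame(
  date_initial = as.POSIXct(
    c("14-05-2022 00:00:00", "14-05-2022 00:21:00", "14-05-2022 00:22:00"),
    format = fmt, tz = "UTC"
  ),
  date_final = as.POSIXct(
    c("14-05-2022 00:20:59", "14-05-2022 00:21:59", "14-05-2022 00:22:59"),
    format = fmt, tz = "UTC"
  ),
  calib_parameter = c(0.98, 0.99, 0.97)
)

# Check that the date columns were parsed as date-times, not left as text
str(df1)
str(df2)
```

Parsing the dates with as.POSIXct matters because comparisons such as "between date_initial and date_final" only behave chronologically on real date-time values; compared as text, the day-month-year strings would sort by day of month first.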

Donald Seinen
Carme
  • Hi Carme, and welcome to SO! If you can include a reproducible example: https://stackoverflow.com/help/minimal-reproducible-example, it will be easier to see where you're at now and help you move forward. – Michael Roswell Aug 02 '22 at 11:17
  • I'm not sure I understand the 2nd data frame: is it the case that each chunk of time has exactly 1 calibration parameter associated with it? – Michael Roswell Aug 02 '22 at 11:19
  • Does this answer your question? [How to create new variable based on time and preexisting variables?](https://stackoverflow.com/questions/57485767/how-to-create-new-variable-based-on-time-and-preexisting-variables) – Michael Roswell Aug 02 '22 at 11:21
  • Hi Michael, thanks a lot for your answer. I included some data from my df so you can visualize it. The real calibration is more complex than a simple multiplication by a parameter, but it would be too complicated to lay out the whole problem. Right now it's my difficulty selecting different parts of one df based on another df that's keeping me from moving forward. – Carme Aug 02 '22 at 11:53
  • Hi Carme, thanks for the example data! if you can add the data with `dput` that would be even better... It sounds like something like this might help: https://stackoverflow.com/a/51283920/8400969 – Michael Roswell Aug 02 '22 at 13:05
  • Thanks Michael, it could work for my question, but the truth is that I need to be able to store and analyze separate data frames; otherwise the computation time of analyzing row by row is too long. I also need to apply a calibration curve that I skipped in the example for simplicity. I really need to break it into data frames and operate on them separately. – Carme Aug 03 '22 at 06:16

1 Answer


You may want to look at the {fuzzyjoin} package, which lets you join data frames on conditions such as a date falling within a range.

The strategy is:

  1. Join df2 to df1 wherever df1$DateTime is between df2$date_initial and df2$date_final.
  2. Multiply CO2 by the calibration factor.
  3. If necessary, delete any extra columns.
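
library(fuzzyjoin)
# Assumes DateTime, date_initial and date_final are already POSIXct, not text
df1 <- fuzzy_inner_join(df1, df2,
         by = c("DateTime" = "date_initial",
                "DateTime" = "date_final"),
         # >= and <= keep readings that fall exactly on an interval boundary
         match_fun = list(`>=`, `<=`)
       )
df1$CO2 <- df1$CO2 * df1$calib_parameter
# Keep only the original columns, dropping those brought in by the join
df1 <- df1[c("DateTime", "CO2")]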
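
Since the comments mention needing separate data frames per calibration period, the split / apply / rebuild steps from the question can also be sketched in base R. This is a minimal sketch, not a tested solution for the full dataset: it rebuilds a four-row sample of the example data so it runs on its own, assumes UTC and that df2 is sorted by date_initial with non-overlapping intervals, and writes to a new column (CO2_cal, a name chosen here for illustration). The simple multiplication stands in for whatever calibration curve is really needed.

```r
fmt <- "%d-%m-%Y %H:%M:%S"
to_time <- function(x) as.POSIXct(x, format = fmt, tz = "UTC")  # UTC is an assumption

df1 <- data.frame(
  DateTime = to_time(c("14-05-2022 00:20:55", "14-05-2022 00:21:00",
                       "14-05-2022 00:21:55", "14-05-2022 00:22:00")),
  CO2 = c(480.0223177, 480.115519, 480.7624506, 480.7887041)
)
df2 <- data.frame(
  date_initial = to_time(c("14-05-2022 00:00:00", "14-05-2022 00:21:00",
                           "14-05-2022 00:22:00")),
  date_final   = to_time(c("14-05-2022 00:20:59", "14-05-2022 00:21:59",
                           "14-05-2022 00:22:59")),
  calib_parameter = c(0.98, 0.99, 0.97)
)

# For each reading, the index of the last interval that starts at or before it
# (0 means the reading precedes every interval).
idx <- findInterval(as.numeric(df1$DateTime), as.numeric(df2$date_initial))

# Keep only readings that also fall at or before the end of that interval,
# so readings in a gap between intervals are left out.
keep <- idx > 0
keep[keep] <- df1$DateTime[keep] <= df2$date_final[idx[keep]]

# Step 1: one data frame per calibration interval, named by the df2 row number
pieces <- split(df1[keep, ], idx[keep])

# Step 2: calibrate each piece with the parameter from its df2 row
calibrated <- lapply(names(pieces), function(i) {
  piece <- pieces[[i]]
  piece$CO2_cal <- piece$CO2 * df2$calib_parameter[as.integer(i)]
  piece
})

# Step 3: rebuild a single data frame
result <- do.call(rbind, calibrated)
```

The function passed to lapply() receives a whole interval at a time, so a vectorised calibration curve can be applied per piece rather than row by row.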

This is basically this answer, but with a small correction. You might also find some of these answers helpful.

dash2