
I am a beginner to R and I am looking for help with a function/loop.

I have this data.table, "by_newborn":

id_mom  |date      |id_newborn|week|weight|conception_date|pregnancy_interval            |one_year_before_pregnany|one_year_before_interval      |first_trimester
1.21e+12|01/05/2020|1234      |18  |2     |2019-12-27     |2019-12-27 UTC--2020-05-01 UTC|2018-12-26              |2018-12-26 UTC--2019-12-27 UTC|2020-04-02
1.21e+12|01/05/2020|5489      |18  |2     |2019-12-27     |2019-12-27 UTC--2020-05-01 UTC|2018-12-26              |2018-12-26 UTC--2019-12-27 UTC|2020-04-02

by_newborn structure:

structure(list(ן..ID = c(2602035392, 2602035392, 4104232942), 
    date_of_birth = structure(c(1L, 1L, 2L), .Label = c("01/05/2020", 
    "02/05/2018", "03/05/2020", "04/05/2020", "05/05/2020", "06/05/2020", 
    "07/05/2020", "08/05/2020"), class = "factor"), week = c(38L, 
    38L, 36L), conception_date = structure(c(18117, 18117, 17401
    ), class = "Date"), pregnancy_interval = new("Interval", 
        .Data = c(22982400, 22982400, 21772800), start = structure(c(1565308800, 
        1565308800, 1503446400, 1503446400, 1563062400, 1563062400, 
        1564358400, 1564358400, 1563840000, 1563840000, 1563926400, 
        1564617600, 1567728000), tzone = "UTC", class = c("POSIXct", 
        "POSIXt")), tzone = "UTC")), row.names = c(NA, 
-3L), class = c("data.table", "data.frame"))

I created the intervals using data.table and lubridate:

library(data.table)
library(lubridate)
# conception date = birth date minus gestational age in weeks
by_newborn[, conception_date := dmy(date) - weeks(week)]
# interval from conception to birth
by_newborn[, pregnancy_interval := interval(conception_date, dmy(date))]

I also made a second table, TSH_results, with the test result history for each id_mom:

id_mom  |date      |tsh_level|Units
1.21e+12|01/02/2020|0.5      |ng/dl
1.21e+12|05/02/2020|0.5      |ng/dl
1.21e+12|03/05/2015|1.8      |ng/dl
1.21e+12|09/05/2015|1.8      |ng/dl

TSH_results structure:

structure(list(ן..id_mom = c(1.21e+12, 1.21e+12, 1.21e+12, 1.21e+12, 
1.21e+12), date = c("01/02/2020", "01/02/2020", "01/02/2020", 
"01/02/2020", "01/02/2020"), TSH_level = c("0.5", "0.5", "0.5", 
"0.5", "0.5"), measur = c("ng/dl", "ng/dl", "ng/dl", "ng/dl", 
"ng/dl")), row.names = c(NA, -5L), class = c("data.table", "data.frame"
),

I would like some help writing code that, for each ID, looks in TSH_results for a result whose date falls within an interval (or between two dates) and returns the TSH level in a new column of by_newborn.

I have tried this, but it seems I might need a loop or another way:

by_newborn[id_mom == TSH_results$id_mom & (dmy(TSH_results$date) %within%
pregnancy_interval), preg_results := TSH_results$result]

Many thanks in advance!

nitz
  • Your sample data is rubbish (please use `dput(mydata)` to provide some), so I cannot produce an answer. The best I can do: I'd take a look at the `foverlaps()` function from the data.table package and start from there. Or use a non-equi join. – Wimpel Apr 14 '22 at 14:32
  • Your second table has strings where it needs proper `Date`s. Since your question is not about converting strings to dates/timestamps, I suggest you pre-clean your data. I second the notion that this is a non-equi join; your `id_mom == TSH_results$id_mom` is wrong. In general, it might be helpful if you first understand the concept of join/merge (based on strict equality of fields); I suggest you read https://stackoverflow.com/q/1299871/3358272, https://stackoverflow.com/q/5706437/3358272. After that, adapting your mindset to non-equi join takes a bit of elbow-grease, but will be less difficult. – r2evans Apr 14 '22 at 14:46
  • @Wimpel Thank you, much appreciated! what is the preferred way to upload a data.table/frame here? – nitz Apr 14 '22 at 14:51
  • Yes, use the output generated by `dput(mydata)`. When it is too large, select the relevant rows/columns (like `dput(mydata[1:100, 2:4])`) to reproduce the problem you want solved. – Wimpel Apr 14 '22 at 14:54
  • You're learning `dput` :-) ... the `.internal.selfref` is not usable, I suggest you remove it and wrap the `structure(.)` in setDT or as.data.table, resulting in `as.data.table(structure(...))`. Second, even if I load `lubridate`, I get the error `invalid class "Interval" object: Inconsistent lengths: spans = 3, start dates = 13`, can you please verify that pasting the `structure(.)` in your console works? If so, then it's something wrong on my end, but it would be useful to have this question be reproducible in that way. Thanks! – r2evans Apr 15 '22 at 13:36
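
The non-equi join suggested in the comments could be sketched roughly as follows. This is only a minimal sketch, not a confirmed answer: the toy tables and the helper columns `birth_date` and `test_date` are assumptions made for illustration (the values are made up, loosely modelled on the question), and the date strings are converted to `Date` first, as the comments advise.

```r
library(data.table)

# Hypothetical toy data loosely modelled on the question; values are made up.
by_newborn <- data.table(
  id_mom     = c(1, 1, 2),
  id_newborn = c(1234, 5489, 7777),
  date       = c("01/05/2020", "01/05/2020", "02/05/2018"),
  week       = c(18L, 18L, 36L)
)
TSH_results <- data.table(
  id_mom    = c(1, 1, 2),
  date      = c("01/02/2020", "03/05/2015", "01/01/2018"),
  tsh_level = c(0.5, 1.8, 2.4)
)

# Convert the day/month/year strings to Date before joining
# (birth_date and test_date are helper columns assumed for this sketch).
by_newborn[, birth_date := as.Date(date, format = "%d/%m/%Y")]
by_newborn[, conception_date := birth_date - 7L * week]
TSH_results[, test_date := as.Date(date, format = "%d/%m/%Y")]

# Non-equi update join: same id_mom, and the test date falls between
# conception_date and birth_date. Matching rows of by_newborn receive
# the TSH level; rows without a match would get NA in preg_result.
by_newborn[
  TSH_results,
  on = .(id_mom, conception_date <= test_date, birth_date >= test_date),
  preg_result := i.tsh_level
]

print(by_newborn[, .(id_mom, id_newborn, conception_date, birth_date, preg_result)])
```

If a mother has several tests inside one pregnancy, this update join keeps only one of them per newborn; a regular (non-updating) join followed by an aggregation would be one way to keep or summarise all of them.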

0 Answers