I am currently working in R on a data set that looks like the following, except that it holds millions of rows and more variables:

pid     agedays    wtkg    htcm    bmi     haz    waz    whz 
1       2          1.92    44.2    9.74    -2.72  -3.23  NA             
1       29         2.68    49.2    11.07   -2.21  -3.03  -2.00                
1       61         3.63    52.0    13.42   -2.49  -2.62  -0.48        
1       89         4.11    55.0    13.59   -2.20  -2.70  -1.14
2       1          2.40    48.1    10.37   -0.65  -1.88  -2.54          
2       28         3.78    53.1    13.41   -0.14  -0.58  -0.79
2       56         4.53    55.2    14.87   -0.68  -0.74  -0.18                 
2       104        5.82    61.3    15.49    0.23  -0.38  -0.70 

I am trying to write a function that adds the following variables: haz_1.5, waz_1.5, whz_1.5, htcm_1.5, wtkg_1.5, and bmi_1.5.

Each variable follows the same rule. For example, when !is.na(haz) and agedays > 61-45 and agedays <= 61-15, haz_1.5 holds the value of haz; otherwise it is NA.

The new data set should look like the following (bmi_1.5, wtkg_1.5, and htcm_1.5 are omitted from the output below so that the sample table fits in the box):

pid     agedays    wtkg    htcm    bmi     haz    waz    whz    haz_1.5    waz_1.5    whz_1.5
1       2          1.92    44.2    9.74    -2.72  -3.23  NA     NA         NA         NA
1       29         2.68    49.2    11.07   -2.21  -3.03  -2.00  -2.21      -3.03      -2.00
1       61         3.63    52.0    13.42   -2.49  -2.62  -0.48  NA         NA         NA
1       89         4.11    55.0    13.59   -2.20  -2.70  -1.14  NA         NA         NA
2       1          2.40    48.1    10.37   -0.65  -1.88  -2.54  NA         NA         NA
2       28         3.78    53.1    13.41   -0.14  -0.58  -0.79  -0.14      -0.58      -0.79
2       56         4.53    55.2    14.87   -0.68  -0.74  -0.18  NA         NA         NA
2       104        5.82    61.3    15.49    0.23  -0.38  -0.70  NA         NA         NA

Here's the code that I've tried so far :

measure<-list("haz", "waz", "whz", "htcm", "wtkg", "bmi")

set_1.5_months <- function(x, y, z){
  maled_anthro[!is.na(z) & agedays > (x-45) & agedays <= (x-15), y:=z]
}

for(i in 1:length(measure)){
  z <- measure[i]
  y <- paste(measure[i], "1.5", sep="_")
  x <- 61
  maled_anthro_1<-set_1.5_months(x, y, z)
}

The code above has not been successful. I just end up with a new variable "y" added into the original data table that holds the values "bmi" or "NA". Can someone help me with figuring out where I went wrong with this code?

I'd like to keep the function as close to the format above as possible. I need to create other, similar functions in which "1.5" and x = 61 are swapped out for other numbers, and those values are relatively easy to change in the current format.

bziggy
  • [How to apply same function to every specified column in a data.table](https://stackoverflow.com/questions/16846380/how-to-apply-same-function-to-every-specified-column-in-a-data-table) – Henrik Jul 29 '20 at 19:29
  • @Henrik this did not answer my question. Thanks for posting though! – bziggy Jul 29 '20 at 22:48

1 Answer

I believe the following is an idiomatic way to create new columns by applying a function to many existing columns.
Note that I've kept the condition as it was and negated it as a whole, to keep the code as close to the question's as possible.

library(data.table)

setDT(maled_anthro)

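# Return a copy of y that is NA wherever y is missing or the row
# falls outside the age window (x - 45, x - 15].
set_1.5_months <- function(y, agedays, x = 61){
  z <- y
  # `is.na<-` sets the flagged positions of z to NA
  is.na(z) <- !(!is.na(y) & agedays > (x - 45) & agedays <= (x - 15))
  z
}

measure <- c("haz", "waz", "whz", "htcm", "wtkg", "bmi")
new_measure <- paste(measure, "1.5", sep = "_")

# Apply the function to each column named in .SDcols and assign by reference
maled_anthro[, (new_measure) := lapply(.SD, function(y) set_1.5_months(y, agedays, x = 61)), .SDcols = measure]
maled_anthro[]  # print the updated table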
#   pid agedays wtkg htcm   bmi   haz   waz   whz haz_1.5 waz_1.5 whz_1.5 htcm_1.5 wtkg_1.5 bmi_1.5
#1:   1       2 1.92 44.2  9.74 -2.72 -3.23    NA      NA      NA      NA       NA       NA      NA
#2:   1      29 2.68 49.2 11.07 -2.21 -3.03 -2.00   -2.21   -3.03   -2.00     49.2     2.68   11.07
#3:   1      61 3.63 52.0 13.42 -2.49 -2.62 -0.48      NA      NA      NA       NA       NA      NA
#4:   1      89 4.11 55.0 13.59 -2.20 -2.70 -1.14      NA      NA      NA       NA       NA      NA
#5:   2       1 2.40 48.1 10.37 -0.65 -1.88 -2.54      NA      NA      NA       NA       NA      NA
#6:   2      28 3.78 53.1 13.41 -0.14 -0.58 -0.79   -0.14   -0.58   -0.79     53.1     3.78   13.41
#7:   2      56 4.53 55.2 14.87 -0.68 -0.74 -0.18      NA      NA      NA       NA       NA      NA
#8:   2     104 5.82 61.3 15.49  0.23 -0.38 -0.70      NA      NA      NA       NA       NA      NA
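
As for where the loop in the question went wrong: inside a data.table call, y := z creates a column literally named "y", and because z is not a column it resolves to the loop variable, a one-element list holding a string such as "bmi". Below is a minimal sketch of a loop-style fix, assuming it is acceptable to pass the table in as an extra argument. The DT argument and the trimmed-down sample table are my additions, not part of the question. Wrapping the name in parentheses, (y), makes := use the string stored in y as the column name, and get(z) fetches the column whose name is stored in z.

```r
library(data.table)

# Small stand-in for the question's table (first four rows, two measures only).
maled_anthro <- data.table(
  pid     = c(1, 1, 1, 1),
  agedays = c(2, 29, 61, 89),
  haz     = c(-2.72, -2.21, -2.49, -2.20),
  whz     = c(NA, -2.00, -0.48, -1.14)
)

measure <- c("haz", "whz")  # a character vector rather than a list

# x is the target age in days, z names the source column, y names the new one.
# DT is an assumed extra argument so the function does not rely on a global.
set_1.5_months <- function(DT, x, y, z) {
  DT[!is.na(get(z)) & agedays > (x - 45) & agedays <= (x - 15),
     (y) := get(z)]  # rows outside the window are left as NA
  invisible(DT)
}

for (z in measure) {
  y <- paste(z, "1.5", sep = "_")
  set_1.5_months(maled_anthro, x = 61, y = y, z = z)
}

maled_anthro[]
```

Because := modifies the table by reference, there is no need to assign the result to a new object inside the loop. This sketch assumes the table has no columns that are themselves named x, y, or z, since those would shadow the function arguments.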

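Since the question mentions that "1.5" and x = 61 will later be swapped for other numbers, the lapply approach could also be wrapped so that both values become arguments. The following is only a sketch: add_window_cols is a hypothetical helper name, and the second call with "2.5" and 91 uses made-up illustrative values, not ones taken from the question.

```r
library(data.table)

# Hypothetical helper: for each column in cols, add <col>_<suffix> holding the
# value when agedays is in (x - 45, x - 15], and NA otherwise.
add_window_cols <- function(DT, cols, suffix, x) {
  new_cols <- paste(cols, suffix, sep = "_")
  # %in% TRUE turns a missing agedays into FALSE instead of NA
  in_window <- (DT$agedays > (x - 45) & DT$agedays <= (x - 15)) %in% TRUE
  DT[, (new_cols) := lapply(.SD, function(v) replace(v, !in_window, NA)),
     .SDcols = cols]
  invisible(DT)
}

dt <- data.table(
  agedays = c(2, 29, 61, 89),
  haz     = c(-2.72, -2.21, -2.49, -2.20)
)

add_window_cols(dt, cols = "haz", suffix = "1.5", x = 61)
add_window_cols(dt, cols = "haz", suffix = "2.5", x = 91)  # illustrative values only
dt[]
```

Values that are already NA stay NA, so the !is.na() part of the original condition does not need to be repeated here.
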
Data

maled_anthro <- read.table(text = "
pid     agedays    wtkg    htcm    bmi     haz    waz    whz 
1       2          1.92    44.2    9.74    -2.72  -3.23  NA             
1       29         2.68    49.2    11.07   -2.21  -3.03  -2.00                
1       61         3.63    52.0    13.42   -2.49  -2.62  -0.48        
1       89         4.11    55.0    13.59   -2.20  -2.70  -1.14
2       1          2.40    48.1    10.37   -0.65  -1.88  -2.54          
2       28         3.78    53.1    13.41   -0.14  -0.58  -0.79
2       56         4.53    55.2    14.87   -0.68  -0.74  -0.18                 
2       104        5.82    61.3    15.49    0.23  -0.38  -0.70 
", header = TRUE)
Rui Barradas