
I have a dataset of patients with several variables, including age (in decimal years), height, weight, gender, BMI, and triglycerides. I want to create new variables such as talla_z, peso_z, and trigliceridos_z that contain the z-score for each variable.

The ages are decimals, so they need to be rounded to match the ages in my z-score lookup table, which go from 10 to 17.5 in half-year steps. For example, 12.48 should match 12.5, not 12.
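A minimal sketch of the rounding step, assuming the lookup table's Edad column holds exact half-year values (10, 10.5, ..., 17.5). `round_to_half` is a hypothetical helper name, not something from the original code:

```r
# Round decimal ages to the nearest half year so they can be compared
# with the Edad values of the lookup table (assumed: 10, 10.5, ..., 17.5).
round_to_half <- function(age) {
  round(age * 2) / 2
}

round_to_half(c(12.48, 11.43, 17.3))
# [1] 12.5 11.5 17.5
```

One detail to be aware of: R's `round()` rounds an exact .5 to the even digit, so an age of exactly 12.25 gives `round(24.5) / 2`, which is 12, not 12.5. If boundary ages should always go up, `floor(age * 2 + 0.5) / 2` does that instead.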

Here's the format of the lookup table for weight (Peso), which gives the median (P50) and standard deviation (DS) by age (Edad) for males (H prefix) and females (M prefix):

PESO (weight)

Edad   HP50    HDS    MP50    MDS
10     36.05   7.32   36.11   6.26
...
17.5   69.25   10.1   58.16   8.3

Here's the format for the patient data:

Edad    Peso1   IMC1   Trig1   Talla1
11.43   84      32     22      180
...
17.3    69      25     24      210

How can I create a function in R that automatically assigns z-scores to each individual based on age and gender?

This is the solution I'm trying:

You can create a function that calculates the z-score for each patient based on their age, gender, and the given variable. Round the decimal age to the nearest half year so that it matches a value in the z-score lookup table:

# Calculate z-scores for one variable using a P50/DS lookup table
calculate_z_score <- function(patients_data, p50_ds_table, variable_name) {
  # vector that will hold the new z-score column
  z_column <- numeric(nrow(patients_data))
  
  # iterate over patients (rows)
  for (i in seq_len(nrow(patients_data))) {
    # round age to the nearest 0.5 (e.g. 12.48 -> 12.5)
    rounded_age <- round(patients_data$Edad[i] * 2) / 2
    
    # find the lookup-table row for that age
    row_index <- which(p50_ds_table$Edad == rounded_age)
    
    # pick the male (H) or female (M) reference values
    if (patients_data$Sexo[i] == "Hombres") {
      p50 <- p50_ds_table[row_index, "HP50"]
      ds <- p50_ds_table[row_index, "HDS"]
    } else {
      p50 <- p50_ds_table[row_index, "MP50"]
      ds <- p50_ds_table[row_index, "MDS"]
    }
    pd <- patients_data[i, variable_name]
    # z-score = (value - median) / SD
    z_column[i] <- (pd - p50) / ds
  }
  
  # add the new column, e.g. "Peso_z"
  patients_data[paste(variable_name, "_z", sep = "")] <- z_column
  
  return(patients_data)
}

# example
patients_data <- calculate_z_score(patients_data, p50_ds_table, "Peso")

This function iterates through the patient data, rounds each age to the nearest 0.5 so that it matches a value in p50_ds_table, and then calculates the z-score for the given variable based on the patient's age. However, I don't think it takes gender into account.

When I run it, I get this error: Ops.data.frame(pd, p50): '-' only defined for equally-sized data frames.

When I extract individual values, for example patients_data[1, "Peso"] - p50[1, "HP50"], it works.
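One plausible cause, offered as an assumption since the real tables aren't shown: if the tables are tibbles (for example, read with `readxl::read_excel()`), `x[i, "col"]` returns a one-column data frame rather than a number, so `pd - p50` becomes data-frame arithmetic. That fails whenever the age lookup matches zero rows (an age outside the table) or several rows. The sketch below keeps the original loop but extracts plain numbers with `[[ ]]` and returns `NA` when there isn't exactly one matching row. It still assumes a `Sexo` column coded `"Hombres"` for males, as in the original code:

```r
# Same loop as the original, but values are pulled out with [[ ]] so
# the arithmetic is done on plain numbers, not on data frames.
calculate_z_score <- function(patients_data, p50_ds_table, variable_name) {
  z_column <- rep(NA_real_, nrow(patients_data))

  for (i in seq_len(nrow(patients_data))) {
    # Round age to the nearest half year (12.48 -> 12.5)
    rounded_age <- round(patients_data$Edad[i] * 2) / 2
    row_index <- which(p50_ds_table$Edad == rounded_age)

    # Leave NA if the age does not match exactly one table row
    if (length(row_index) != 1) next

    # Male (H) or female (M) reference values for this age
    if (patients_data$Sexo[i] == "Hombres") {
      p50 <- p50_ds_table[["HP50"]][row_index]
      ds  <- p50_ds_table[["HDS"]][row_index]
    } else {
      p50 <- p50_ds_table[["MP50"]][row_index]
      ds  <- p50_ds_table[["MDS"]][row_index]
    }

    pd <- patients_data[[variable_name]][i]
    z_column[i] <- (pd - p50) / ds
  }

  patients_data[[paste0(variable_name, "_z")]] <- z_column
  patients_data
}
```

It is called the same way as before, once per variable, each time with that variable's own lookup table. The `variable_name` has to match a column in the patient data exactly (the sample above has `Peso1`, not `Peso`).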

I've found this solution, but I can't see how to apply it to my example.

How can I make this work?

Jorge A
  • This appears to be a homework question, so I won't give you a full answer, but I will give you some hints. Don't loop through rows. Join the lookup table to the patient data, using the appropriate variables in the join. You haven't given us the patient data table; seeing it would have been helpful. – Limey Aug 02 '23 at 07:16
  • Hello @Limey, and thanks for your answer! It's for my job. I found that they used a big chunk of code covering every possibility as an if/else chain (if weight = 12, then -). It doesn't work nowadays, so I'm trying to update it using R, which I'm more used to. I'll try to join the data and show you an example of the patient data. – Jorge A Aug 02 '23 at 07:20
  • I've tried creating a new data frame with only the necessary columns (Age, Gender, WeightT1, ...) and adding the columns from the P50 lookup table. The problem is that I can't join a data.frame with another when they have different sizes (51 patients vs. 16 possible categories). That's why I tried to create an index to iterate with. – Jorge A Aug 02 '23 at 07:34
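Tables of different sizes can in fact be joined: in a many-to-one join, each patient row is matched to one lookup row, so the result still has one row per patient. A minimal base-R sketch of Limey's hint follows. It reshapes the lookup table to one row per (age, sex) pair and joins on rounded age plus sex. The miniature tables, the `id` and `Edad_tabla` columns, and the `"Mujeres"` label for females are assumptions for illustration only:

```r
# Hypothetical miniature tables (the real ones have 16 and 51 rows).
lookup_wide <- data.frame(
  Edad = c(10, 10.5),
  HP50 = c(36.05, 37.00), HDS = c(7.32, 7.00),
  MP50 = c(36.11, 37.50), MDS = c(6.26, 6.00)
)
patients <- data.frame(
  id    = 1:3,
  Edad  = c(10.2, 10.4, 10.6),
  Sexo  = c("Hombres", "Mujeres", "Hombres"),  # "Mujeres" is assumed
  Peso1 = c(43.37, 43.5, 40)
)

# One row per (age, sex): male columns and female columns stacked.
lookup_long <- rbind(
  data.frame(Edad = lookup_wide$Edad, Sexo = "Hombres",
             P50 = lookup_wide$HP50, DS = lookup_wide$HDS),
  data.frame(Edad = lookup_wide$Edad, Sexo = "Mujeres",
             P50 = lookup_wide$MP50, DS = lookup_wide$MDS)
)

# Round ages to the half-year grid, then join on age + sex.
patients$Edad_tabla <- round(patients$Edad * 2) / 2
joined <- merge(patients, lookup_long,
                by.x = c("Edad_tabla", "Sexo"),
                by.y = c("Edad", "Sexo"),
                all.x = TRUE)  # unmatched patients are kept with NA

# Vectorized z-score: no loop over rows needed.
joined$Peso_z <- (joined$Peso1 - joined$P50) / joined$DS
joined <- joined[order(joined$id), ]  # merge() sorts by key; restore order
```

With dplyr, the equivalent join is `left_join()` with `by = c("Edad_tabla" = "Edad", "Sexo")`.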

1 Answer


The solution I found, thanks to @Limey, uses dplyr:

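library(dplyr)
# by: patient-table column = lookup-table column; inner_join() keeps
# only patients whose key values match a lookup row
df <- table %>%
  inner_join(reftable,
             by = c("var1" = "varr", "var2" = "varr", ...))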

This joins the lookup-table values directly to each patient's row. Afterwards, you can use mutate() to compute the new z-score column directly from the joined values.
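A sketch of that "mutate" step, written in base R so it runs without extra packages. It assumes the lookup table keeps its wide H/M layout and is joined on rounded age only. Gender is then handled per row by choosing the male or female columns with `ifelse()`. The miniature tables and the `Edad_tabla` column are hypothetical:

```r
# Hypothetical miniature tables with the layout shown in the question.
peso_ref <- data.frame(
  Edad = c(10, 10.5),
  HP50 = c(36.05, 37.00), HDS = c(7.32, 7.00),
  MP50 = c(36.11, 37.50), MDS = c(6.26, 6.00)
)
patients <- data.frame(
  Edad  = c(10.2, 10.4),
  Sexo  = c("Hombres", "Mujeres"),
  Peso1 = c(43.37, 43.5)
)

# Join on the rounded age only; HP50/HDS/MP50/MDS come along.
# merge() without all.x behaves like an inner join.
patients$Edad_tabla <- round(patients$Edad * 2) / 2
df <- merge(patients, peso_ref, by.x = "Edad_tabla", by.y = "Edad")

# New column: male (H) or female (M) reference values, row by row.
df <- transform(
  df,
  peso_z = ifelse(Sexo == "Hombres",
                  (Peso1 - HP50) / HDS,
                  (Peso1 - MP50) / MDS)
)
```

In dplyr, the same step is `mutate(peso_z = if_else(Sexo == "Hombres", (Peso1 - HP50) / HDS, (Peso1 - MP50) / MDS))`. Repeating the join and this step with the height and triglyceride tables gives talla_z and trigliceridos_z.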

Jorge A