0

I'm currently working in a database called "data" that looks like the following and has around 6 thousand observations.

> head(data)
STATE  MONTHS_AGE      BMI
1   1        122      15.29162
2   1        114      18.58449
3   1         78      16.40888
4   1        113      22.25390
5   1        136      18.25913
6   1         85      14.24700`

Now, I want to measure if the BMI of these people exceeds two standard deviations of their months of age. However, this measurement is highly sensitive, and varies a lot throughout a person's lifespan. To know this, I want to compare if the BMI exceeds the values set by another database, called "zscore".

head(zscore)
 A tibble: 168 × 2
     Month   SD2
 <dbl> <dbl>
  1    61  18.3
  2    62  18.3
  3    63  18.3
  4    64  18.3
  5    65  18.3
  6    66  18.4
  7    67  18.4
  8    68  18.4
  9    69  18.4
  10    70  18.5

Now, to find out if a person's BMI exceeds two standard deviations, I've tried

 data <- data %>% 
 mutate(obesity = MONTHS_AGE == zscore$Month & BMI > zscore$SD2)`

And I get the following error message

Warning message:
There were 2 warnings in `mutate()`.
 The first warning was:
  ℹ In argument: `data = MONTHS_AGE == zscore$Month & BMI > 
  zscore$SD2`.
  Caused by warning in `MONTHS_AGE == zscore$Month`:
  ! longer object length is not a multiple of shorter object length
   ℹ Run dplyr::last_dplyr_warnings() to see the 1 remaining 
   warning. 

I know that data frames with different sizes are troublesome to work with. However, how could I work with this data in such a way that I could ask R to pour, per row, whether the row's "MONTHS_AGE" matches a value in the zscore$Month column, and if the value in the data$BMI column of said row exceeds the zscore$SD2 of that chosen month (that is supposedly matching).

Thanks in advance.

  • 4
    I think the typical way to do this would be something like `data %>% left_join(zscore, by = c("MONTHS_AGE" = "Month")) %>% mutate(obesity = BMI > SD2)` – Jon Spring Apr 04 '23 at 00:33
  • 1
    The notion of "merge/join" is often a big thing to learn in the topic of data management. I strongly encourage you to really take on board @JonSpring's suggested code, as it is the best (imho) approach: it is flexible, scalable, and easily maintained when the data changes shape or size. – r2evans Apr 04 '23 at 00:59

1 Answers1

0

Join the tables, compute the required column and drop unnecessary columns:

library(dplyr)

data %>%
  left_join(zscore, by = c("MONTHS_AGE" = "Month")) %>%
  mutate(obseity = BMI > SD2) %>%
  select(-SD2)
Santiago
  • 641
  • 3
  • 14