I'm currently working in a database called "data" that looks like the following and has around 6 thousand observations.
> head(data)
STATE MONTHS_AGE BMI
1 1 122 15.29162
2 1 114 18.58449
3 1 78 16.40888
4 1 113 22.25390
5 1 136 18.25913
6 1 85 14.24700`
Now, I want to measure if the BMI of these people exceeds two standard deviations of their months of age. However, this measurement is highly sensitive, and varies a lot throughout a person's lifespan. To know this, I want to compare if the BMI exceeds the values set by another database, called "zscore".
head(zscore)
A tibble: 168 × 2
Month SD2
<dbl> <dbl>
1 61 18.3
2 62 18.3
3 63 18.3
4 64 18.3
5 65 18.3
6 66 18.4
7 67 18.4
8 68 18.4
9 69 18.4
10 70 18.5
Now, to find out if a person's BMI exceeds two standard deviations, I've tried
data <- data %>%
mutate(obesity = MONTHS_AGE == zscore$Month & BMI > zscore$SD2)`
And I get the following error message
Warning message:
There were 2 warnings in `mutate()`.
The first warning was:
ℹ In argument: `data = MONTHS_AGE == zscore$Month & BMI >
zscore$SD2`.
Caused by warning in `MONTHS_AGE == zscore$Month`:
! longer object length is not a multiple of shorter object length
ℹ Run dplyr::last_dplyr_warnings() to see the 1 remaining
warning.
I know that data frames with different sizes are troublesome to work with. However, how could I work with this data in such a way that I could ask R to pour, per row, whether the row's "MONTHS_AGE" matches a value in the zscore$Month column, and if the value in the data$BMI column of said row exceeds the zscore$SD2 of that chosen month (that is supposedly matching).
Thanks in advance.