Grouping observations and calculating Z scores

Question

I am currently working on a climate data set and have two main questions that I haven't been able to resolve.

Is there a way to melt the season field in front of its corresponding column name, so that it yields something like this: https://i.stack.imgur.com/MMcYH.jpg ? I would also like to generate a new class in the season column named "grow", which contains the sum of ppt and the mean of every other parameter for the spring and summer months. I originally tried to give prism_grouped a column for the year plus one column per season and parameter (i.e. spring_ppt_mm, summer_ppt_mm, fall_ppt_mm, winter_ppt_mm, ...) and to calculate it from there using mutate, but melting and gathering the data always gave me wonky results.
When trying to calculate the z score for each season, the output is filled with NaN when I use this approach:

spring <- prism_grouped %>%
  filter(season == "spring") %>%
  mutate(z_ppt_mm = scale(ppt_mm)) %>%
  mutate(z_tmin_c = scale(tmin_c)) %>%
  mutate(z_tmean_c = scale(tmean_c)) %>%
  mutate(z_tmax_c = scale(tmax_c)) %>%
  mutate(z_vdpmin_hpa = scale(vdpmin_hpa)) %>%
  mutate(z_vdpmax_hpa = scale(vdpmax_hpa))

but I get a valid result if I do the following:

spring <- filter(prism_grouped,season == "spring")
z_spr_ppt <- scale(spring$ppt_mm)
z_spr_tmin <- scale(spring$tmin_c)
z_spr_tmean <- scale(spring$tmean_c)
z_spr_tmax <- scale(spring$tmax_c)
z_spr_vdpmin <- scale(spring$vdpmin_hpa)
z_spr_vdpmax <- scale(spring$vdpmax_hpa)

I currently have everything working with the second method, but I am trying to reduce the number of variables I am working with and would prefer to keep them in data frames. Any suggestions would be appreciated!

Ronak Shah · Accepted Answer · 2020-08-16T06:12:07.967


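I don't understand the first question, but for the second one you can use across (or mutate_at in older versions of dplyr) to apply the same function to multiple columns. Note that prism_grouped is still grouped, so scale is applied within each group; if a group holds a single row, the centred value and the scale are both 0 and the z score comes out as 0/0 = NaN, which would explain the output you see. Calling ungroup first avoids that. Something like: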

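library(dplyr)

spring <- prism_grouped %>%
  # drop the grouping so scale() sees all spring rows at once
  ungroup() %>%
  filter(season == "spring") %>%
  # as.numeric() turns the one-column matrix returned by scale() into a plain vector
  mutate(across(ppt_mm:vdpmax_hpa, ~as.numeric(scale(.)), .names = 'z_{col}'))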
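
To illustrate why the grouping matters, here is a minimal sketch on made-up data. The `toy` tibble and its values are hypothetical, not the asker's PRISM data; it only assumes one row per year and season, as you would get after summarising by year and season.

```r
library(dplyr)

# Hypothetical toy data: one row per year and season
toy <- tibble(
  year   = rep(c("2000", "2001", "2002"), each = 2),
  season = rep(c("spring", "summer"), times = 3),
  ppt_mm = c(10, 40, 20, 50, 30, 60)
)

# Still grouped by year: each group holds a single spring row,
# so scale() computes 0 / 0 inside every group
grouped <- toy %>%
  group_by(year) %>%
  filter(season == "spring") %>%
  mutate(z_ppt_mm = as.numeric(scale(ppt_mm)))

# Not grouped: scale() uses the mean and sd of all spring rows
ungrouped <- toy %>%
  filter(season == "spring") %>%
  mutate(z_ppt_mm = as.numeric(scale(ppt_mm)))

grouped$z_ppt_mm    # NaN NaN NaN
ungrouped$z_ppt_mm  # -1  0  1
```

If your real data shows the same pattern, `group_vars(prism_grouped)` should tell you which grouping is still attached.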

edited Aug 16 '20 at 06:12

answered Aug 16 '20 at 05:45

Ronak Shah


That's good to know, thanks! The results that come out of that snippet still yield NaN. For my first question, I just want to calculate a new value "grow" based on the values of spring and summer. My first approach was to try to melt the data so that each season is followed by the parameter name (i.e. spring_ppt_mm), but I was unsuccessful. I figured that with each column in that format I could just do mutate(grow_ppt_mm = spring_ppt_mm + summer_ppt_mm). Does that make sense? – k3r0 Aug 16 '20 at 05:49
It works as expected for me (without `NA`/`NaN`s) for the data that you have shared. Can you restart R and try it again? – Ronak Shah Aug 16 '20 at 05:51
I tried restarting it but still get the same error: script: https://ufile.io/l8hsao2b source: https://ufile.io/3ix162h9 – k3r0 Aug 16 '20 at 06:00
I can't reproduce it. Check the class of the columns. What does `sapply(prism_grouped, class)` return? Are all the columns numeric except the first 2? – Ronak Shah Aug 16 '20 at 06:02
Yup: > sapply(prism_grouped, class) year season ppt_mm tmin_c tmean_c tmax_c vdpmin_hpa vdpmax_hpa "character" "character" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" – k3r0 Aug 16 '20 at 06:04
There is an issue with the code that you are using to create the `prism_grouped` variable. If you use `spring <- prism %>% filter(season == "spring") %>% mutate(across(ppt_mm:vdpmax_hpa, scale, .names = 'z_{col}'))` it doesn't give any `NA` values. Your data is still grouped in `prism_grouped`. See the updated answer. Please make sure you show the exact code that you are running. – Ronak Shah Aug 16 '20 at 06:09
The only issue is that if I use prism and not grouped by year I get 3 values per year :/ – k3r0 Aug 16 '20 at 06:12
Following up on the first question, this is what I previously made manually, but I am trying to avoid using Excel at all: https://imgur.com/MURQeeS – k3r0 Aug 16 '20 at 06:13
Would this work: spring <- prism_grouped %>% filter(season == "spring") %>% mutate(across(ppt_mm:vdpmax_hpa, scale, .names = 'z_{col}')) %>% rename(across(ppt_mm:vdpmax_hpa, .names = 'spring_{col}')) and then cbinding them all together? – k3r0 Aug 16 '20 at 06:52
It is better to ask only one question per post. If you want to rename the variables, you can use `rename_with(~paste0('spring_', .), ppt_mm:vdpmax_hpa)`. – Ronak Shah Aug 16 '20 at 07:00
Yeah, I think the question was a bit all over the place, thanks for the pointers tho! I was looking for a simple solution but just ended up turning the grouped_prism into a df, then filtering by season, changing the names, adding the z scores, and finally cbinding it all by year. – k3r0 Aug 16 '20 at 07:06
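
For the first question, which the thread leaves open, one possible approach is sketched below. It is not the accepted answer's method. It assumes tidyr is available alongside dplyr, and the `toy` tibble with its two parameters is hypothetical; with the real data you would list the remaining columns in `across()` and `values_from`.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per year and season
toy <- tibble(
  year    = rep(c("2000", "2001"), each = 2),
  season  = rep(c("spring", "summer"), times = 2),
  ppt_mm  = c(10, 40, 20, 50),
  tmean_c = c(8, 20, 10, 22)
)

# "grow" season: sum of ppt, mean of the other parameter(s),
# taken over the spring and summer rows of each year
grow <- toy %>%
  filter(season %in% c("spring", "summer")) %>%
  group_by(year) %>%
  summarise(
    season = "grow",
    ppt_mm = sum(ppt_mm),
    across(tmean_c, mean),
    .groups = "drop"
  )

# Long format with the extra "grow" class in the season column
with_grow <- bind_rows(toy, grow)

# Wide format: one row per year, columns named like spring_ppt_mm
wide <- with_grow %>%
  pivot_wider(
    names_from  = season,
    values_from = c(ppt_mm, tmean_c),
    names_glue  = "{season}_{.value}"
  )
```

Because the result is ungrouped (`.groups = "drop"`), z scores computed on it afterwards with `scale()` would use all years rather than one row at a time.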
