I have the following dataset:

grade9_math_zscore <- rnorm(10, 0,1)
grade9_science_zscore <- rnorm(10, 0,1)
grade10_math_zscore <- rnorm(10, 0,1)
grade10_science_zscore <- rnorm(10, 0,1)
grade9_math_passed_lab<- sample(0:1,10,replace=TRUE)
grade10_math_passed_lab<- sample(0:1,10,replace=TRUE)
grade9_science_passed_lab<- sample(0:1,10,replace=TRUE)
grade10_science_passed_lab<- sample(0:1,10,replace=TRUE)
grade9_math_used_comp  <- sample(0:1,10,replace=TRUE)
grade10_math_used_comp  <- sample(0:1,10,replace=TRUE)
grade9_science_used_comp  <- sample(0:1,10,replace=TRUE)
grade10_science_used_comp  <- sample(0:1,10,replace=TRUE)
students<-as.data.frame(cbind(grade9_math_zscore, grade9_science_zscore, grade10_math_zscore , grade10_science_zscore , grade9_math_passed_lab, grade10_math_passed_lab, grade9_science_passed_lab,  grade10_science_passed_lab, grade9_math_used_comp,  grade10_math_used_comp, grade9_science_used_comp, grade10_science_used_comp ))

The output I need (first 4 rows) would look like the following:

  grade  course               z_score passed_lab used_comp
1     9    math    -0.287118228740724          0         0
2     9 science     0.421672812450803          0         0
3    10    math      1.66175637068003          1         1
4    10 science -0.000352193924396851          0         1

I have been trying to get this with pivot_longer from tidyr in R. I mainly need help figuring out the names_pattern argument. I also can't seem to gather (in tidyr terms) all three columns z_score, passed_lab and used_comp in one command.

Any coding solution or mere suggestions are appreciated. Any solution without using dplyr is also appreciated.

  • Please use `set.seed` when sharing examples with functions such as `rnorm` or `sample`, to ensure reproducibility – Sotos Oct 05 '20 at 09:40

1 Answer

With pivot_longer you can do:

tidyr::pivot_longer(students, 
                    cols = everything(), 
                    names_to = c('grade', 'course', '.value'), 
                    names_pattern = 'grade(\\d+)_(.*?)_(.*)')

# A tibble: 40 x 5
#   grade course  zscore passed_lab used_comp
#   <chr> <chr>    <dbl>      <int>     <int>
# 1 9     math    -1.04           0         1
# 2 9     science  0.608          0         0
# 3 10    math     1.27           0         1
# 4 10    science  1.38           1         1
# 5 9     math    -1.30           1         1
# 6 9     science  0.582          1         1
# 7 10    math    -0.196          1         1
# 8 10    science -0.198          0         1
# 9 9     math    -1.28           1         1
#10 9     science  2.05           0         0
# … with 30 more rows
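
To see why this names_pattern works, it can help to run the same regular expression against a few of the column names in base R. This is only an illustrative sketch: it uses regexec() with perl = TRUE to show what each capture group picks up, and is not a claim about how tidyr applies the pattern internally. The three names are taken from the question; `col_names`, `parts` and `captured` are names made up for this example.

```r
# A few column names from the question's `students` data frame.
col_names <- c("grade9_math_zscore",
               "grade10_science_passed_lab",
               "grade9_science_used_comp")

# Same pattern as in the pivot_longer() call above.
pattern <- "grade(\\d+)_(.*?)_(.*)"

# regexec() returns the full match followed by one entry per capture group.
# perl = TRUE so that the lazy quantifier `.*?` stops at the first underscore.
parts <- regmatches(col_names, regexec(pattern, col_names, perl = TRUE))

# Drop the full match and label the groups in the same order as names_to.
captured <- t(sapply(parts, function(p) p[-1]))
colnames(captured) <- c("grade", "course", ".value")
captured
```

The first group captures the digits after "grade", the lazy second group captures the course up to the next underscore, and the third group captures everything that remains. Because the third entry of names_to is the special name '.value', that remaining text (zscore, passed_lab, used_comp) becomes the names of the output columns instead of being stored in a column. This is also why the result has a column called zscore rather than z_score; rename it afterwards if the exact layout from the question is needed.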

data

Don't cbind and then call as.data.frame; use data.frame directly to construct the data frame. cbind builds a matrix first, which forces every column to a common type.

students<-data.frame(grade9_math_zscore, grade9_science_zscore....)
Ronak Shah