3

I am new to R and struggling with it a bit. I have a data frame like this

reg     12345
val1    1
val2    0
reg     45678
val1    0
val2    0
val3    1
reg     97654
val1    1
reg     567834
val3    1
reg     567845
val2    0
val4    1

My goal is to transform data into this format into

 reg     val1    val2    val3    val4
 12345   1       0       0       0
 45678   0       0       1       0
 97654   1       0       0       0
 567834  0       0       1       0
 567845  0       0       0       1

Hoping somebody can guide me here. My data source is less than 200 lines and there are no constraints with regards to the approach. Please assume that the machine running had adequate memory and processing power.

Acinonyx
  • 89
  • 1
  • 10
  • You have to transform the data frame from a long format to a wide format. Multiple approaches using tidyr as well as data table can be found here https://stackoverflow.com/questions/30592094/r-spreading-multiple-columns-with-tidyr – Niko Jul 04 '17 at 20:43
  • Possible duplicate of [How can I spread repeated measures of multiple variables into wide format?](https://stackoverflow.com/questions/29775461/how-can-i-spread-repeated-measures-of-multiple-variables-into-wide-format) – Niko Jul 04 '17 at 20:47
  • Maybe [this](https://stackoverflow.com/a/44796994/2204410) can be an inspiration. – Jaap Jul 04 '17 at 21:42

2 Answers2

0

Even if this is a duplicate, I didn't see the following answer, so...start with the original data:

df <- data.frame( A = c("reg","val1","val2","reg","val1","val2","val3","reg","val1","reg","val3","reg","val2","val4"),
                  B = c(12345, 1, 0, 45678, 0, 0, 1, 97654, 1, 567834, 1, 567845, 0, 1))

I use tidyverse verbs, and a trick to add labels (in dummy) to each "reg" group using cumsum:

install.packages("tidyverse")
library(tidyverse)
df1 <- df %>% 
          mutate(dummy = cumsum(A=="reg")) %>%
          group_by(dummy) %>%
          nest() %>%
          mutate(data = map(data, ~spread(.x, A, B))) %>%
          unnest() %>%
          select(-dummy)

This results in:

     reg  val1  val2  val3  val4
1  12345     1     0    NA    NA
2  45678     0     0     1    NA
3  97654     1    NA    NA    NA
4 567834    NA    NA     1    NA
5 567845    NA     0    NA     1

I prefer to keep the NAs, but if you don't:

df1[is.na(df1)] <- 0

     reg  val1  val2  val3  val4
1  12345     1     0     0     0
2  45678     0     0     1     0
3  97654     1     0     0     0
4 567834     0     0     1     0
5 567845     0     0     0     1
CPak
  • 13,260
  • 3
  • 30
  • 48
0

Here is an option using dcast

library(data.table)
dcast(setDT(df), cumsum(A=="reg") ~ A, value.var = "B", fill = 0)[, A := NULL][]
#      reg val1 val2 val3 val4
#1:  12345    1    0    0    0
#2:  45678    0    0    1    0
#3:  97654    1    0    0    0
#4: 567834    0    0    1    0
#5: 567845    0    0    0    1
akrun
  • 874,273
  • 37
  • 540
  • 662