2
                Julie                           Joe                
                Measurement 1   Measurement 2   Measurement 1   Measurement 2
Part Number 1   33              32              33              31
Part Number 2   34              31              33              32
Part Number 3   33              31              30              31

How do I get this messy Excel table into a tidy format for use in R? I can utilize readr functions like spread() and gather() but this seems to need a higher level of sophistication. Here's my best reconstruction of the Excel data frame for you to mess with. Please modify it at will to better make any point.

library(tidyverse)
messy <- data.frame(
  " " = c(" ", "Part Number 1", "Part Number 2", "Part Number 3"), 
  Julie = c("Measurement 1", 33, 34, 33),
  Julie = c("Measurement 2", 32, 31, 31),
  Joe = c("Measurement 1", 33, 33, 30),
  Joe = c("Measurement 2", 31, 32, 31))

I'm open to trying the dev versions of readr or tidyr. The new pack(), unpack() etc. functions seem to be useful. Also, it appears other SO attempts/solutions aren't quite what I'm looking for 1, 2, 3, 4.

Display name
  • 4,153
  • 5
  • 27
  • 75
  • 1
    I actually asked a similar question recently. Here is the response I got and it worked great for my data. I had to make a few modification since I had ~10 different categories on the top that each spread over 4 years. https://stackoverflow.com/questions/56467814/reshape-dataframe-that-has-years-in-column-names – John Carty Jun 07 '19 at 14:35

1 Answers1

2

The tricky part is getting the correct column names and then performing gather and separate, which it sounds like you're already familiar with. Normally I do not like selecting rows by number, but in this case I think it makes sense if you're reliably reading the data from Excel in this way.

note: If a person's name has a . in it, the last step would fail.

library(tidyverse)

less_messy <- messy %>%
  mutate_all(as.character) %>% 
  set_names(c("Part",
             paste(names(.)[2:ncol(.)],
                   .[1, 2:ncol(.)],
                   sep = "-"))) %>%
  `[`(2:nrow(.), ) %>%
  gather("key", "value", contains("Measurement")) %>%
  separate("key", c("person", "measurement"), sep = "-") %>%
  mutate_at("person", ~ stringr::str_replace(.x, "\\..*",""))

less_messy
#>             Part person   measurement value
#> 1  Part Number 1  Julie Measurement 1    33
#> 2  Part Number 2  Julie Measurement 1    34
#> 3  Part Number 3  Julie Measurement 1    33
#> 4  Part Number 1  Julie Measurement 2    32
#> 5  Part Number 2  Julie Measurement 2    31
#> 6  Part Number 3  Julie Measurement 2    31
#> 7  Part Number 1    Joe Measurement 1    33
#> 8  Part Number 2    Joe Measurement 1    33
#> 9  Part Number 3    Joe Measurement 1    30
#> 10 Part Number 1    Joe Measurement 2    31
#> 11 Part Number 2    Joe Measurement 2    32
#> 12 Part Number 3    Joe Measurement 2    31

data

changed marginally because I noticed a discrepancy between what you showed and your code for the measurement numbers:

messy <- data.frame(
  " " = c(" ", "Part Number 1", "Part Number 2", "Part Number 3"), 
  Julie = c("Measurement 1", 33, 34, 33),
  Julie = c("Measurement 2", 32, 31, 31),
  Joe = c("Measurement 1", 33, 33, 30),
  Joe = c("Measurement 2", 31, 32, 31))
zack
  • 5,205
  • 1
  • 19
  • 25
  • What is `\`[\`(2:nrow(.), )` this line doing? I've never seen such a thing in my several years playing in the tidyverse. I probably need to read a book on base R at some point. Best I can guess is `\`[\`` is short hand for `[]`? – Display name Jun 07 '19 at 15:19
  • 1
    It's just normal subsetting. It looks funny because I need to quote it with back ticks to include it in the pipe steps. E.g., `mtcars[1:5, ]` and `\`[\`(mtcars, 1:5, )` are the same. – zack Jun 07 '19 at 15:22
  • Oah so `\`[\`(2:nrow(.), )` is functionally equivalent to `.[2:nrow(.), ]` in your example? I just never saw that notation `\`[\`` before. Searching `R \`[\`` doesn't yield much either. Thanks – Display name Jun 07 '19 at 16:27