To expand on my comment on @neilfws's answer (and use his sample data), your data is best shaped in a longer form, which will make the rest of your analysis much easier. However, your data is currently in a wide form such that you have two sets of columns that need to be gathered (melted), which takes a bit more work than your average wide-to-long reshape.
One option is to gather everything and then spread back to wide. This works well, with the one issue that tidyr::spread
is very particular about indices, so you'll have to add a second ID column to identify the rows of the result, and making that column takes a little thought.
library(tidyverse)
df1_tidy <- df1 %>%
gather(var, val, -id) %>% # gather everything to long form
separate(var, c('var', 'var2')) %>% # separate "var*" from dose/unit
group_by(var2) %>%
mutate(var = parse_number(var), # extract var to integer
id2 = seq(n())) %>% # add ID column for spreading
spread(var2, val, convert = TRUE) %>%
select(-id2) # cleanup
## # A tibble: 15 × 4
## id var dose unit
## * <dbl> <dbl> <int> <chr>
## 1 1 1 2 mL
## 2 1 1 1 mL
## 3 1 2 5 mL
## 4 1 2 1 mL
## 5 1 3 1 mL
## 6 1 3 3 mL
## 7 2 1 4 mg
## 8 2 1 3 mL
## 9 2 2 2 mg
## 10 2 2 3 mL
## 11 2 3 4 mg
## 12 2 3 5 mL
## 13 3 1 6 mcg
## 14 3 2 4 mcg
## 15 3 3 2 mcg
Alternately, you can gather
each set separately. The issue with this approach is that it will give you combinations you don't want (var1 and var3, etc.), so you'll have to filter
back to the originals.
df1_tidy <- df1 %>%
gather(var, dose, contains('dose')) %>%
gather(var_unit, unit, contains('unit')) %>%
mutate_at(vars(contains('var')), parse_number) %>% # extract var numbers
filter(var == var_unit) %>% # filter to matching combinations
select(-var_unit) # cleanup
df1_tidy
## id var dose unit
## 1 1 1 2 mL
## 2 2 1 4 mg
## 3 3 1 6 mcg
## 4 1 1 1 mL
## 5 2 1 3 mL
## 6 1 2 5 mL
## 7 2 2 2 mg
## 8 3 2 4 mcg
## 9 1 2 1 mL
## 10 2 2 3 mL
## 11 1 3 1 mL
## 12 2 3 4 mg
## 13 3 3 2 mcg
## 14 1 3 3 mL
## 15 2 3 5 mL
While tidyr does not (yet) have a multi-gather function, requiring approaches like the above, data.table's version of melt
allows you to pass its measure.vars
parameter regex patterns, enabling multi-gather. The grammar looks a lot different and has different things to clean up, but it takes you to the same place:
library(data.table)
dt1 <- melt(setDT(df1),
measure.vars = patterns('dose', 'unit'), # set gathering patterns
variable.factor = FALSE, # because factor numbers are evil
value.name = c('dose', 'unit')) # set column names
dt1 <- dt1[, variable := as.integer(variable)][] # cleanup
dt1
## id variable dose unit
## 1: 1 1 2 mL
## 2: 2 1 4 mg
## 3: 3 1 6 mcg
## 4: 1 1 1 mL
## 5: 2 1 3 mL
## 6: 1 2 5 mL
## 7: 2 2 2 mg
## 8: 3 2 4 mcg
## 9: 1 2 1 mL
## 10: 2 2 3 mL
## 11: 1 3 1 mL
## 12: 2 3 4 mg
## 13: 3 3 2 mcg
## 14: 1 3 3 mL
## 15: 2 3 5 mL
Regardless of the approach you choose, once your data is tidy, combining dose
and unit
is easy:
# base R
df1_tidy$dose_unit <- paste0(df1_tidy$dose, df1_tidy$unit)
# tidyverse
df1_tidy <- df1_tidy %>% mutate(dose_unit = paste0(dose, unit))
# data.table
dt1 <- dt1[, dose_unit := paste0(dose, unit)][]