I have data that has variables stored in the following format:
V2 V3
1 Price : 33,990 Size : 16, 17 & 18.5"
2 Price : 30,830 Size : 13, 16, 18 & 19.5"
3 Price : 48,560 Sizes : 21 & 21.5"
4 Price : 33,790 Size : 17.5, 18.5, 19.5 & 21.5
5 Price : 37,990 Size : 17.5, 18.5 & 19.5
6 Price : 43,690 Size : 17.5, 18.5 & 19.5"
The variables I need are Price
and Size
among others. What is the cleanest way in R to convert this raw data into a format that looks like this:
Price Size
1 33,990 16, 17 & 18.5"
2 30,830 13, 16, 18 & 19.5"
3 48,560 21 & 21.5"
4 33,790 17.5, 18.5, 19.5 & 21.5
5 37,990 17.5, 18.5 & 19.5
6 43,690 17.5, 18.5 & 19.5"
Also, the variable name for the third row is misspelled as Sizes
instead of Size
. How can I handle this problem as there are other variables with the same error?
Edit:
I cannot use a column specific strategy (for eg. using gsub()
) as the variables in a given column are inconsistent. Concretely,
V20
1 Grips : Bontrager SSR
2 Headset : 1-1/8" threadless
3
4 Brakeset : Tektro alloy linear-pull
5 Brakeset : HL 280 mechanical disc
6 Brakeset : Tektro M290 hydraulic disc brakes
column V20 has 3 unique variables, Grips
, Headset
, Brakeset
and a blank. The tidy dataframe should look something like:
Grips Headset Brakeset
1 Bontrager SSR NA NA
2 NA 1-1/8" threadless NA
3 NA NA NA
4 NA NA Tektro alloy linear-pull
5 NA NA HL 280 mechanical disc
6 NA NA Tektro M290 hydraulic disc brakes
This is an oversimplification as I have assumed that Brakeset
has no value for the first 3 rows. This may or may not be the case as the value may be stored in a different column. If a particular row has no value for a given variable, NA's are to be used. I hope the question is clear.