I am working with the R programming language.
Suppose I have the following data:
my_data <- data.frame(
"id" = c("1", "1", "1", "1", "2", "2", "2", "2" ),
"name" = c("john", "jason", "jack", "jim", "john", "jason", "jack", "jim" ),
"points" = c("150", "165", "183", "191", "151", "166", "184", "192"),
"gender" = c("male", "male", "male", "male", "male", "male", "male", "male"),
"country" = c("usa", "usa", "usa", "usa", "usa", "usa", "usa", "usa")
)
#view original data format
my_data
id name points gender country
1 1 john 150 male usa
2 1 jason 165 male usa
3 1 jack 183 male usa
4 1 jim 191 male usa
5 2 john 151 male usa
6 2 jason 166 male usa
7 2 jack 184 male usa
8 2 jim 192 male usa
Let's assume that for the above data: "gender" and "country" will always have the same values. Furthermore, these 4 names will always appear together - each time they appear together, the "id" for all of them is the same number. The only number that can change is the number of "points" that they have from iteration to iteration (i.e., their "id").
Here is what I am trying to do:
my_data_1 <- data.frame(
"id" = c("1", "2"),
"john_points" = c("150", "151"),
"jason_points" = c("165", "166"),
"jack_points" = c("183", "184"),
"jim_points" = c("191", "192"),
"gender" = c("male", "male"),
"country" = c("usa", "usa")
)
#view desired data format
my_data_1
id john_points jason_points jack_points jim_points gender country
1 1 150 165 183 191 male usa
2 2 151 166 184 192 male usa
I found this previous stackoverflow post How to reshape data from long to wide format , where the "data.table" library and the "dcast" function can be used to solve this kind of problem.
I tried different combinations of the "dcast" function, but I couldn't get the final result to look as desired:
library(data.table)
#attempt 1 : not correct
setDT(my_data)
dcast(my_data, name ~ points, value.var = c("gender", "country", "id")
)
name gender_150 gender_151 gender_165 gender_166 gender_183 gender_184 gender_191 gender_192 country_150 country_151 country_165 country_166 country_183 country_184 country_191
1: jack <NA> <NA> <NA> <NA> male male <NA> <NA> <NA> <NA> <NA> <NA> usa usa <NA>
2: jason <NA> <NA> male male <NA> <NA> <NA> <NA> <NA> <NA> usa usa <NA> <NA> <NA>
3: jim <NA> <NA> <NA> <NA> <NA> <NA> male male <NA> <NA> <NA> <NA> <NA> <NA> usa
4: john male male <NA> <NA> <NA> <NA> <NA> <NA> usa usa <NA> <NA> <NA> <NA> <NA>
country_192 id_150 id_151 id_165 id_166 id_183 id_184 id_191 id_192
1: <NA> <NA> <NA> <NA> <NA> 1 2 <NA> <NA>
2: <NA> <NA> <NA> 1 2 <NA> <NA> <NA> <NA>
3: usa <NA> <NA> <NA> <NA> <NA> <NA> 1 2
4: <NA> 1 2 <NA> <NA> <NA> <NA> <NA> <NA>
#attempt 2 : not correct
setDT(my_data)
dcast(my_data, name ~ points, value.var = c("gender", "country"))
name gender_150 gender_151 gender_165 gender_166 gender_183 gender_184 gender_191 gender_192 country_150 country_151 country_165 country_166 country_183 country_184 country_191
1: jack <NA> <NA> <NA> <NA> male male <NA> <NA> <NA> <NA> <NA> <NA> usa usa <NA>
2: jason <NA> <NA> male male <NA> <NA> <NA> <NA> <NA> <NA> usa usa <NA> <NA> <NA>
3: jim <NA> <NA> <NA> <NA> <NA> <NA> male male <NA> <NA> <NA> <NA> <NA> <NA> usa
4: john male male <NA> <NA> <NA> <NA> <NA> <NA> usa usa <NA> <NA> <NA> <NA> <NA>
country_192
1: <NA>
2: <NA>
3: usa
4: <NA>
#attempt 3 - not correct:
setDT(my_data)
dcast(my_data, name ~ points, value.var = c("id"))
name 150 151 165 166 183 184 191 192
1: jack <NA> <NA> <NA> <NA> 1 2 <NA> <NA>
2: jason <NA> <NA> 1 2 <NA> <NA> <NA> <NA>
3: jim <NA> <NA> <NA> <NA> <NA> <NA> 1 2
4: john 1 2 <NA> <NA> <NA> <NA> <NA> <NA>
Can someone please show me how to fix this? Why are there so many ? Is it possible to have the final table as I had shown (i.e. my_data_1)? Is it possible to rename the variables in the format of name_points (e.g. john_points)?
Thanks