0

I am working with the R programming language.

Suppose I have the following data:

my_data <- data.frame(

"id" = c("1", "1", "1", "1", "2", "2", "2", "2" ),
"name" = c("john", "jason", "jack", "jim", "john", "jason", "jack", "jim" ),
"points" = c("150", "165", "183", "191", "151", "166", "184", "192"),
"gender" = c("male", "male", "male", "male", "male", "male", "male", "male"),
"country" = c("usa", "usa", "usa", "usa", "usa", "usa", "usa", "usa")
)

#view original data format
 my_data

  id  name points gender country
1  1  john    150   male     usa
2  1 jason    165   male     usa
3  1  jack    183   male     usa
4  1   jim    191   male     usa
5  2  john    151   male     usa
6  2 jason    166   male     usa
7  2  jack    184   male     usa
8  2   jim    192   male     usa

Let's assume that for the above data: "gender" and "country" will always have the same values. Furthermore, these 4 names will always appear together - each time they appear together, the "id" for all of them is the same number. The only number that can change is the number of "points" that they have from iteration to iteration (i.e., their "id").

Here is what I am trying to do:

my_data_1 <- data.frame(

"id" = c("1", "2"),
"john_points" = c("150", "151"),
"jason_points" = c("165", "166"),
"jack_points" = c("183", "184"),
"jim_points" = c("191", "192"),
"gender" = c("male", "male"),
"country" = c("usa", "usa")
)

#view desired data format

  my_data_1
  id john_points jason_points jack_points jim_points gender country
1  1         150          165         183        191   male     usa
2  2         151          166         184        192   male     usa

I found this previous stackoverflow post How to reshape data from long to wide format , where the "data.table" library and the "dcast" function can be used to solve this kind of problem.

I tried different combinations of the "dcast" function, but I couldn't get the final result to look as desired:

 library(data.table)
 
#attempt 1 : not correct
 setDT(my_data)
dcast(my_data, name ~ points, value.var = c("gender", "country", "id")
)
    name gender_150 gender_151 gender_165 gender_166 gender_183 gender_184 gender_191 gender_192 country_150 country_151 country_165 country_166 country_183 country_184 country_191
1:  jack       <NA>       <NA>       <NA>       <NA>       male       male       <NA>       <NA>        <NA>        <NA>        <NA>        <NA>         usa         usa        <NA>
2: jason       <NA>       <NA>       male       male       <NA>       <NA>       <NA>       <NA>        <NA>        <NA>         usa         usa        <NA>        <NA>        <NA>
3:   jim       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       male       male        <NA>        <NA>        <NA>        <NA>        <NA>        <NA>         usa
4:  john       male       male       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>         usa         usa        <NA>        <NA>        <NA>        <NA>        <NA>
   country_192 id_150 id_151 id_165 id_166 id_183 id_184 id_191 id_192
1:        <NA>   <NA>   <NA>   <NA>   <NA>      1      2   <NA>   <NA>
2:        <NA>   <NA>   <NA>      1      2   <NA>   <NA>   <NA>   <NA>
3:         usa   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>      1      2
4:        <NA>      1      2   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>

#attempt 2 : not correct

 setDT(my_data)
 dcast(my_data, name ~ points, value.var = c("gender", "country"))
    name gender_150 gender_151 gender_165 gender_166 gender_183 gender_184 gender_191 gender_192 country_150 country_151 country_165 country_166 country_183 country_184 country_191
1:  jack       <NA>       <NA>       <NA>       <NA>       male       male       <NA>       <NA>        <NA>        <NA>        <NA>        <NA>         usa         usa        <NA>
2: jason       <NA>       <NA>       male       male       <NA>       <NA>       <NA>       <NA>        <NA>        <NA>         usa         usa        <NA>        <NA>        <NA>
3:   jim       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       male       male        <NA>        <NA>        <NA>        <NA>        <NA>        <NA>         usa
4:  john       male       male       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>         usa         usa        <NA>        <NA>        <NA>        <NA>        <NA>
   country_192
1:        <NA>
2:        <NA>
3:         usa
4:        <NA>

#attempt 3 - not correct:

 setDT(my_data)
dcast(my_data, name ~ points, value.var = c("id"))
    name  150  151  165  166  183  184  191  192
1:  jack <NA> <NA> <NA> <NA>    1    2 <NA> <NA>
2: jason <NA> <NA>    1    2 <NA> <NA> <NA> <NA>
3:   jim <NA> <NA> <NA> <NA> <NA> <NA>    1    2
4:  john    1    2 <NA> <NA> <NA> <NA> <NA> <NA>

Can someone please show me how to fix this? Why are there so many ? Is it possible to have the final table as I had shown (i.e. my_data_1)? Is it possible to rename the variables in the format of name_points (e.g. john_points)?

Thanks

stats_noob
  • 5,401
  • 4
  • 27
  • 83
  • 1
    `dcast(setDT(my_data), id + gender + country ~ name, value.var = 'points')` Or `tidyr::pivot_wider(my_data, names_from = name, values_from = points, names_prefix = 'points_')` – Ronak Shah Jul 04 '21 at 06:50

1 Answers1

1

I would use tidyr as it is really simple to change from a long to a wide format.

library(tidyr)
wide = my_data %>% 
  tidyr::spread(name, points)

Results

  id gender country jack jason jim john
1  1   male     usa  183   165 191  150
2  2   male     usa  184   166 192  151
AndrewGB
  • 16,126
  • 5
  • 18
  • 49
  • thank you so much for your answer! i am using an older computer where there is no internet and no usb port. I only have R with some pre-installed libraries (e.g. base R, dplyr, data.table, reshape2). I do not have any of the "tidy" libraries. Do you know a more "basic" way to solve this problem? Thanks! – stats_noob Jul 04 '21 at 06:47
  • is it possible to add a "gender" column? is it possible to add "_points" beside each persons name? thank you so much for all your help! – stats_noob Jul 04 '21 at 06:48