0

I have a data set like

id  age edu blood       
1   30-39   Primary 5.5     
1   20-29   Secondary   8.7     
1   30-39   Primary 10      
2   30-39   Primary 11      
2   20-29   Secondary   10      
2   20-29   Secondary   9       

I want id wise output like this:

id  age30_39count   age20_29count   edu_pri_count   edu_sec_count   blood_median
1   2   1   2   1   8.7
2   1   2   1   2   10

I have tried R code:

library(dplyr)
library(tidyr)

ddply(dat, "id", spread, age, age, edu, edu, blood, blood_median=median(blood))

But it not showing desired result. Could anybody do help?

Rudro88
  • 231
  • 3
  • 15
  • 1
    @aspiringurbandatascientist *Reshape* is somewhat misleading here, as the OP wants to aggregate the columns by `id` in different ways: tabulate `age` and `edu` and compute median of `blood`. I am not sure if the linked question covers this case. – Uwe Apr 14 '18 at 23:29

1 Answers1

2

You mean like this?

> library(dplyr)
> library(tidyr)
> group_by(df,id,age) %>% gather(variable,value,age,edu) %>% 
    unite(tag,variable,value) %>% 
    mutate(medblood=median(blood)) %>% 
    spread(tag,id) %>% select(-blood) %>% 
    select(-medblood,medblood)
# A tibble: 6 x 5
  `age_20-29` `age_30-39` edu_Primary edu_Secondary medblood
        <int>       <int>       <int>         <int>    <dbl>
1          NA           1           1            NA     8.70
2           1          NA          NA             1     8.70
3           2          NA          NA             2    10.0 
4          NA           1           1            NA     8.70
5           2          NA          NA             2    10.0 
6          NA           2           2            NA    10.0 

That last select(-medblood,medblood) moves the median blood column to the far right. You might possibly be wanting to do this though:

> group_by(df,id,age) %>% gather(variable,value,age,edu) %>% 
    unite(tag,variable,value) %>% 
    mutate(medblood=median(blood)) %>% 
    count(medblood,id,tag) %>% spread(tag,n)
# A tibble: 2 x 6
# Groups:   id [2]
     id medblood `age_20-29` `age_30-39` edu_Primary edu_Secondary
  <int>    <dbl>       <int>       <int>       <int>         <int>
1     1     8.70           1           2           2             1
2     2    10.0            2           1           1             2

Here is the dput of the data df used for this example:

> dput(df)
structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L), age = structure(c(2L, 
1L, 2L, 2L, 1L, 1L), .Label = c("20-29", "30-39"), class = "factor"), 
edu = structure(c(1L, 2L, 1L, 1L, 2L, 2L), .Label = c("Primary", 
"Secondary"), class = "factor"), blood = c(5.5, 8.7, 10, 
11, 10, 9)), .Names = c("id", "age", "edu", "blood"), class = "data.frame", row.names = c(NA, 
-6L))
mysteRious
  • 4,102
  • 2
  • 16
  • 36
  • Thanks. Your solution is very close. But how can I keep only unique rows with all information? – Rudro88 Apr 14 '18 at 05:47
  • After running your code, I have got these results: # A tibble: 5 x 5 `age_20-29` `age_30-39` edu_Primary edu_Secondary medblood 1 NA 1 1 NA 9.50 2 1 NA NA 1 9.50 3 2 NA NA 2 9.50 4 2 1 1 2 9.50 5 NA 2 2 NA 9.50 – Rudro88 Apr 14 '18 at 06:00
  • Thanks a ton! Its exactly what I am exploring. But after running your code gives error message: "Error in count(., medblood, id, tag) : unused argument (tag)" – Rudro88 Apr 14 '18 at 06:24
  • 2
    I just `dput` the version of your sample data I used (cut and paste from your question). Try to run the code on this in case there is issue with data types of structures. – mysteRious Apr 14 '18 at 15:03