2

I'd like to make some other columns by doing group_by in R.

When the original table is like below

 userID   cat1    cat2
    a        f       3
    a        f       3
    a        u       1
    a        m       1
    b        u       2
    b        m       1
    b        m       2

I group them by userID and want it come like

userID   cat1_f  cat1_m  cat1_u  cat2_1  cat2_2  cat2_3
a        2       1       1       2       0       1
b        0       2       1       1       2       0
user9191983
  • 505
  • 1
  • 4
  • 20

2 Answers2

2

We could gather all the values then count them, create a new column by pasteing cat and value values and then spread it back to wide format with fill=0.

library(tidyverse)

df %>%
  gather(cat, value, -userID) %>%
  count(userID, cat, value) %>%
  unite(cat, c(cat, value)) %>%
  spread(cat, n, fill = 0)

#  userID cat1_f cat1_m cat1_u cat2_1 cat2_2 cat2_3
#  <fct>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#1  a          2      1      1      2      0      2
#2  b          0      2      1      1      2      0
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Thank you very much! I go with it and get an error like `Warning message: attributes are not identical across measure variables; they will be dropped` . I don't want any row dropped. What do I do then?? – user9191983 Feb 05 '19 at 04:12
  • @user9191983 that's a warning message. It is because the `cat1` column (in this example) is a factor. If you convert it to character then that warning is gone. – Ronak Shah Feb 05 '19 at 04:17
  • Thanks a lot and I made another question asking the same (`https://stackoverflow.com/questions/54528876/pandas-hwo-to-groupby-create-other-columns-by-counting-values-of-existing-column`)but for python pandas this time. Your help is expected if you're familiar with it. – user9191983 Feb 05 '19 at 06:36
1

We can just use table from base R

table(df)
#       cat1
#userID f m u
#     a 2 1 1
#     b 0 2 1

Or with dcast from data.table

library(data.table)
dcast(setDT(df), userID ~ paste0('cat1_', cat1))

data

df <- structure(list(userID = c("a", "a", "a", "a", "b", "b", "b"), 
cat1 = c("f", "f", "u", "m", "u", "m", "m")), class = "data.frame", 
 row.names = c(NA, -7L))
akrun
  • 874,273
  • 37
  • 540
  • 662