
Trying to summarize a data set, but it does not group by the variables specified

Sample from the data set, test2

  newClientID   Month      newApp          count    app
  100           November    R              51       Other
  100           November    Tableau        58       Other
  100           October     R              12       Other
  100           October     Tableau        212      Other
  100           September   R              72       Other
  100           September   Tableau        74       Other
  100           October     SQL Assistant  11       Other
  100           September   SQL Assistant  396      Other

This should summarize the data

test3 <- test2 %>%
   group_by(newClientID, Month, app) %>%
   summarise(total = sum(count)) 

It should be like this

newClientID Month        app    total
100         November     Other  109
100         October      Other  235
100         September    Other  542

But I am getting

newClientID Month        app    total
100         November     Other  109
100         October      Other  224
100         September    Other  146
100         October      Other  11
100         September    Other  396

Why is it not grouping by the Month variable?

alistaire
user3482393
  • What do you get with `str(test2)` ? – Pete Dec 06 '17 at 19:14
  • Are you sure that you don't have different spellings (including spaces and carriage returns) in your observations? Try `table(dataset$variablename)` for each variable, or try a general `trimws()` on the dataset. – Nicolás Velasquez Dec 06 '17 at 19:22
  • Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 2065 obs. of 5 variables: $ newClientID: chr "02521" "02521" "02521" "03107" ... $ Month : chr "November" "October" "September" "November" ... $ newApp : chr "SQL Assistant" "SQL Assistant" "SQL Assistant" "Cognos" ... $ count : int 7 23 7 1 10 210 195 41 225 450 ... $ app : chr "Other" "Other" "Other" "Cognos" ... - attr(*, "vars")=List of 2 ..$ : symbol newClientID ..$ : symbol Month - attr(*, "drop")= logi TRUE – user3482393 Dec 06 '17 at 19:34
  • Guessing you can't share data. Use the method from this answer, https://stackoverflow.com/a/20760767/2747709, to clean your data; it is the one described under the "Edit 2017" section. – infominer Dec 06 '17 at 20:23
  • What is `unique(test2$Month)`? – Alex Dec 06 '17 at 22:25
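
The checks suggested in the comments above can be sketched roughly as follows. This is only an illustration on a made-up data frame: `demo` and its values are hypothetical, not the asker's data, and it assumes the culprit is leading or trailing whitespace in a character column.

```r
# Hypothetical data: the last ID carries a trailing space,
# which is easy to miss when the data frame is printed.
demo <- data.frame(
  newClientID = c("100", "100", "100 "),
  Month       = c("October", "October", "October"),
  count       = c(12L, 212L, 11L),
  stringsAsFactors = FALSE
)

# unique() returns two distinct IDs where only one is expected
ids <- unique(demo$newClientID)
print(ids)

# Flag the values that change when surrounding whitespace is trimmed
has_padding <- demo$newClientID != trimws(demo$newClientID)
print(demo[has_padding, ])

# Count padded values in every character column
padded_per_column <- sapply(
  Filter(is.character, demo),
  function(x) sum(x != trimws(x))
)
print(padded_per_column)
```

Any column with a non-zero count in `padded_per_column` is a candidate for `trimws()` before grouping.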

1 Answer


Thank you. The newClientID column had stray whitespace, so values that looked identical were treated as different groups. I did the following to trim every character column in the data set:

# Trim leading/trailing whitespace from character columns; leave other columns unchanged
test2 <- data.frame(lapply(test2, function(x) if (is.character(x)) trimws(x) else x), stringsAsFactors = FALSE)
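
For comparison, here is a small reproduction of the symptom together with a dplyr-style version of the same fix. It is a sketch under assumptions: the tibble `demo` is invented to mimic the sample in the question (with a trailing space added to two of the IDs), and `across()`, `where()` and the `.groups` argument assume dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data mimicking the question; the last two IDs have a trailing space
demo <- tibble(
  newClientID = c("100", "100", "100", "100", "100", "100", "100 ", "100 "),
  Month       = c("November", "November", "October", "October",
                  "September", "September", "October", "September"),
  count       = c(51L, 58L, 12L, 212L, 72L, 74L, 11L, 396L),
  app         = "Other"
)

# Before cleaning: "100" and "100 " form separate groups
before <- demo %>%
  group_by(newClientID, Month, app) %>%
  summarise(total = sum(count), .groups = "drop")

# Trim every character column. ungroup() comes first because across()
# does not select grouping columns, and str(test2) in the comments shows
# the asker's data was already grouped by newClientID and Month.
cleaned <- demo %>%
  ungroup() %>%
  mutate(across(where(is.character), trimws))

# After cleaning: one row per Month
after <- cleaned %>%
  group_by(newClientID, Month, app) %>%
  summarise(total = sum(count), .groups = "drop")

print(before)
print(after)
```

On this made-up data, `before` has five rows and `after` has three, with totals of 109, 235 and 542 for November, October and September, matching the result the question expected.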
user3482393