tidyverse-dplyr summarise not operating as expected

Question

We're analyzing columns from our SQL Server environment. We're pulling column names and data types. Then we're running a simple pipe argument to see if we have mixed data types for the same column names across disparate tables.

library(tidyverse)
DF = data.frame(COLUMN_NAME = c("PARTYID","PARTYID","AGE","AGE","SALESID","SALES"), 
                DATA_TYPE = c("char","tinyint","int","smallint","varchar","numeric"))
DF %>% group_by(COLUMN_NAME) %>% 
           summarise(mixedTypes = (any(grepl("char", DATA_TYPE)) & 
                                  !(all(grepl("char", DATA_TYPE)))))

All I'm getting back is

  mixedTypes
1       TRUE

But I believe I should be getting back a subset of the data.frame, including both columns with a new column called mixedTypes.

UPDATE: Someone suggested using conflicts and I'm not educated enough to understand how to interpret the detail=TRUE output:

$.GlobalEnv
[1] "df"

$`package:forcats`
[1] "%>%" "%>%" "%>%" "%>%" "%>%"

$`package:purrr`
[1] "%>%"       "%>%"       "compact"   "%>%"       "%>%"       "set_names" "%>%"      

$`package:tidyr`
[1] "%>%"     "%>%"     "%>%"     "%>%"     "extract" "%>%"    

$`package:plyr`
 [1] "compact"     "arrange"     "count"       "desc"        "failwith"    "id"          "mutate"      "rename"      "summarise"  
[10] "summarize"   "is.discrete" "summarize"  

$`package:stringr`
[1] "%>%" "%>%" "%>%" "%>%" "%>%"

$`package:tibble`
 [1] "add_row"       "as_data_frame" "as_tibble"     "data_frame"    "data_frame_"   "frame_data"    "glimpse"       "lst"          
 [9] "lst_"          "tbl_sum"       "tibble"        "tribble"       "trunc_mat"     "type_sum"     

$`package:magrittr`
[1] "%>%"       "%>%"       "%>%"       "%>%"       "extract"   "set_names" "%>%"      

$`package:dplyr`
 [1] "%>%"           "%>%"           "%>%"           "%>%"           "%>%"           "add_row"       "arrange"       "as_data_frame"
 [9] "as_tibble"     "count"         "data_frame"    "data_frame_"   "desc"          "failwith"      "frame_data"    "glimpse"      
[17] "id"            "lst"           "lst_"          "mutate"        "rename"        "summarise"     "summarize"     "tbl_sum"      
[25] "tibble"        "tribble"       "trunc_mat"     "type_sum"      "src"           "summarize"     "coalesce"      "filter"       
[33] "lag"           "intersect"     "setdiff"       "setequal"      "union"        

$`package:Hmisc`
[1] "summarize"   "is.discrete" "src"         "summarize"   "format.pval" "units"      

$`package:ggplot2`
[1] "Position"

$`package:MyPackage`
[1] "coalesce" "HeatMap" 

$`package:stats`
[1] "df"     "filter" "lag"   

$`package:methods`
[1] "body<-"    "kronecker"

$`package:base`
 [1] "body<-"      "format.pval" "HeatMap"     "intersect"   "kronecker"   "Position"    "setdiff"     "setequal"    "union"      
[10] "units"

If i copy/paste the code above, that is not the result i'm getting. Have you tried this in a fresh R session? Anything suspicious in your `conflicts()`? — MrFlick, Jul 02 '18 at 20:02
You're right. Fresh R session and it was fine. I don't know how to interpret the `conflicts()` output... there are multiple pipes (`%>%`) in `forcats`, `purrr`, `tidyr`, `stringr`, `magrittr`, and `dplyr` — quickreaction, Jul 02 '18 at 20:12
Looks you have both `package:plyr` and `package:dplyr` loaded which is usually not a good idea. But if you do need both of them, make sure to load `plyr` before `dplyr`. — MrFlick, Jul 02 '18 at 20:34
Did you load **plyr** before **dplyr**? If so, [this question](https://stackoverflow.com/questions/26923862/why-are-my-dplyr-group-by-summarize-not-working-properly-name-collision-with) is a possible duplicate. — aosmith, Jul 02 '18 at 20:52
plyr was being loaded in a function that was being called earlier in the script. It seems this call superseded the dplyr function. Flag this as a duplicate if you believe it so... — quickreaction, Jul 03 '18 at 13:13

score 1 · Accepted Answer · answered Jul 03 '18 at 04:27

As said in the comments, the problem is that the plyr version of summarise is loaded after dplyr so when you call summarise you are getting the wrong one. You should try to load plyr first (or much better, try not to load it at all), but you can also play safe by being explicit which version of summarise you want.

library(tidyverse)
DF = data.frame(COLUMN_NAME = c("PARTYID","PARTYID","AGE","AGE","SALESID","SALES"), 
                DATA_TYPE = c("char","tinyint","int","smallint","varchar","numeric"))

# bad:
DF %>% group_by(COLUMN_NAME) %>% 
  plyr::summarise(mixedTypes = (any(grepl("char", DATA_TYPE)) & 
                            !(all(grepl("char", DATA_TYPE)))))

# good:
DF %>% group_by(COLUMN_NAME) %>% 
  dplyr::summarise(mixedTypes = (any(grepl("char", DATA_TYPE)) & 
                            !(all(grepl("char", DATA_TYPE)))))

If you really need plyr loaded as well as dplyr it would be a good idea to do it this way, and also with other key conflicts like mutate. But better is to avoid having both loaded at once.

tidyverse-dplyr summarise not operating as expected

1 Answers1