Sparklyr/Dplyr frequencies for each categorical column

Asked Mar 07 '19 at 21:49

Active Mar 08 '19 at 12:36


Is there a way to do this in sparklyr (or dplyr) without using loops? For an input Spark DataFrame, I want the frequency of each value in every column, with an extra column indicating which source column each value came from.

Here is the input Spark DataFrame:

> df=data.frame(customer=c("TIM","TAM","TIM"),
          product=c("Banana","Apple","Orange"))
> df=sdf_copy_to(sc,df,"df",overwrite = TRUE)
> df
# Source: spark<df> [?? x 2]
  customer product
* <chr>    <chr>  
1 TIM      Banana 
2 TAM      Apple  
3 TIM      Orange

And the result I'm looking for:

> result
# Source: spark<?> [?? x 3]
# Groups: name
  name     value   freq
* <chr>    <chr>  <dbl>
1 product  Apple      1
2 product  Orange     1
3 customer TIM        2
4 product  Banana     1
5 customer TAM        1

Thanks in advance!

edited Mar 08 '19 at 12:36

10465355



Greezy


0 Answers