
Is there a way to do the following in sparklyr (or dplyr) without using loops? For an input Spark DataFrame, get the frequency of each value in every column, labelled with the name of the column it came from.

Here is the input tibble:

> library(sparklyr)
> # sc is an existing Spark connection, e.g. created with spark_connect()
> df <- data.frame(customer = c("TIM", "TAM", "TIM"),
+                  product  = c("Banana", "Apple", "Orange"))
> df <- sdf_copy_to(sc, df, "df", overwrite = TRUE)
> df
# Source: spark<df> [?? x 2]
  customer product
* <chr>    <chr>  
1 TIM      Banana 
2 TAM      Apple  
3 TIM      Orange 

And the result I'm looking for:

> result
# Source: spark<?> [?? x 3]
# Groups: name
  name     value   freq
* <chr>    <chr>  <dbl>
1 product  Apple      1
2 product  Orange     1
3 customer TIM        2
4 product  Banana     1
5 customer TAM        1
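
One loop-free way to get this shape, sketched here as an assumption rather than a confirmed answer, is to reshape the table to long format (one row per original cell, with the column name kept in a `name` column) and then count each (`name`, `value`) pair. The helper name `column_frequencies` is hypothetical. The sketch below runs on a local data.frame with dplyr and tidyr; recent sparklyr releases implement `tidyr::pivot_longer()` for Spark tables, so the same pipeline may work on the Spark `df` above, but check your sparklyr version.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: reshape every column to long format, then count
# each (column name, value) pair. No explicit loop over columns is used.
column_frequencies <- function(tbl) {
  tbl %>%
    pivot_longer(everything(), names_to = "name", values_to = "value") %>%
    count(name, value, name = "freq")
}

# Local data.frame with the same contents as the input above
df_local <- data.frame(customer = c("TIM", "TAM", "TIM"),
                       product  = c("Banana", "Apple", "Orange"),
                       stringsAsFactors = FALSE)

result <- column_frequencies(df_local)
print(result)
```

All columns must share a compatible type for `pivot_longer()` to stack them into one `value` column (here both are character); mixed-type columns would need casting first. The output of `count()` is not grouped; add `group_by(name)` afterwards if the `# Groups: name` shown above is needed.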

Thanks in advance!

Greezy