Take the frequency of sum and col name of a dataframe

Question

Having a dataframe like this:

df <- data.frame(id = c(1,2), text = c("Google,Amazon", "Amazon,Yahoo"), stringsAsFactors = FALSE)

How is it possible, starting from these commands:

library(dplyr)
library(tidyr)

df %>% 
  mutate(
    text = strsplit(text, ","),
    value = 1
    ) %>% 
  unnest(text) %>% 
  pivot_wider(
    id_cols = id,
    names_from = text,
    values_from = value,
    values_fill = list(value = 0)
  )

to get an output with two columns, one with the column names and the other with the sum of every column? Expected output:

data.frame(name = c("Google","Amazon","Yahoo"), sum = c(1,2,1))

Try `data.frame(table(unlist(strsplit(as.character(df$text), ','))))` — Sotos, Jun 23 '20 at 13:16
I do not think that this question should be closed based on the reference given above. — MarBlo, Jun 23 '20 at 13:27
@MarBlo Why not? It is the same thing. In fact I copy/pasted my comment (and added `data.frame`) from the accepted answer of that link — Sotos, Jun 23 '20 at 13:31
@Sotos Because you think it is the same, but you have to respect that someone posts a question here based on the problem he/she faces. Just as an answer can be right, as stated in the guides, even if it does not fit the question exactly, the same should hold for a question. — Nathalie, Jun 23 '20 at 13:46
@Nathalie First of all, I never showed disrespect. This is a big accusation on your part! If you notice, I answered in the comments before flagging as a dupe AND I gave a **BETTER** dplyr alternative to the answer given to you. So your comment about respect is at the very least unfortunate. — Sotos, Jun 23 '20 at 13:50
As for hammering, there are rules on Stack Overflow. If a question has been answered before, then we close it as a dupe, because duplicates create unnecessary noise on the site and make searching for answers impossible. So while you accuse me of a lack of respect, you might want to read the rules of the site — Sotos, Jun 23 '20 at 13:52
All that needed to be done was `unnest(text) %>% count(text)` instead of `pivot_wider`. This would also mean that `value = 1` is redundant, `count` takes care of that. — Rui Barradas, Jun 24 '20 at 07:09

score 1 · Accepted Answer · answered Jun 23 '20 at 13:23

Is this what you want?

library(tidyverse)
df <- data.frame(id = c(1,2), text = c("Google,Amazon", "Amazon,Yahoo"), stringsAsFactors = FALSE)
df
#>   id          text
#> 1  1 Google,Amazon
#> 2  2  Amazon,Yahoo

df %>% 
  separate(col=text, into = c('a', 'b')) %>% 
  pivot_longer(cols = a:b) %>% 
  count(value)
#> # A tibble: 3 x 2
#>   value      n
#>   <chr>  <int>
#> 1 Amazon     2
#> 2 Google     1
#> 3 Yahoo      1

MarBlo

You can also use `separate_rows` and avoid pivoting to long format, i.e. `df %>% separate_rows(text, sep = ',') %>% count(text)` – Sotos Jun 23 '20 at 13:37
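
For reference, the two shortcuts suggested in the comments (`separate_rows` followed by `count`, or `unnest` followed by `count`) can be sketched as below. This is only an illustration under the question's sample data; the object names `res` and `res_unnest` are made up here, and renaming the columns to `name` and `sum` is an assumption made to match the expected output in the question.

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = c(1, 2),
                 text = c("Google,Amazon", "Amazon,Yahoo"),
                 stringsAsFactors = FALSE)

# Variant 1: split the comma-separated values into one row each, then count.
# `name = "sum"` sets the name of the count column.
res <- df %>%
  separate_rows(text, sep = ",") %>%
  count(text, name = "sum") %>%
  rename(name = text)

# Variant 2: keep the strsplit/unnest steps from the question and swap
# pivot_wider for count; the helper column `value = 1` is no longer needed.
res_unnest <- df %>%
  mutate(text = strsplit(text, ",")) %>%
  unnest(text) %>%
  count(text, name = "sum") %>%
  rename(name = text)

print(res)
print(res_unnest)
```

Both variants should give the same counts. Note that `count` returns the rows sorted by `name`, not in order of first appearance as in the expected output of the question.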
