I have a data frame like this:

df <- data.frame(id = c(1, 2), text = c("Google,Amazon", "Amazon,Yahoo"), stringsAsFactors = FALSE)

How is it possible, starting from these commands:

library(dplyr)
library(tidyr)

df %>% 
  mutate(
    text = strsplit(text, ","),
    value = 1
    ) %>% 
  unnest(text) %>% 
  pivot_wider(
    id_cols = id,
    names_from = text,
    values_from = value,
    values_fill = list(value = 0)
  )

to receive an output with two columns, one with the column names and the other with the sum of every column? Expected output:

data.frame(name = c("Google","Amazon","Yahoo"), sum = c(1,2,1))
Nathalie
  • Try `data.frame(table(unlist(strsplit(as.character(df$text), ','))))` – Sotos Jun 23 '20 at 13:16
  • I do not think that this question should be closed based on the reference given above. – MarBlo Jun 23 '20 at 13:27
  • @MarBlo Why not? It is the same thing. In fact I copy/pasted (and added `data.frame`) my comment from the accepted answer of that link – Sotos Jun 23 '20 at 13:31
  • @Sotos Because you think it is the same, but you have to respect that someone posts a question here depending on the problem he/she faces. Just as an answer can be valid, as stated in the guidelines, even if it does not totally fit the question, the same should hold for a question. – Nathalie Jun 23 '20 at 13:46
  • @Nathalie First of all, I never showed disrespect. This is a big accusation on your part! If you notice, I answered in the comments before flagging as a dupe AND I gave a **BETTER** dplyr alternative to the answer given to you. So your comment about respect is at the very least unfortunate. – Sotos Jun 23 '20 at 13:50
  • As for hammering, there are rules on Stack Overflow. If a question has been answered before, then we close it as a dupe, because duplicates create unnecessary noise on the site and make searching for answers impossible. So while you accuse me of a lack of respect, you might want to read the rules of the site – Sotos Jun 23 '20 at 13:52
  • All that needed to be done was `unnest(text) %>% count(text)` instead of `pivot_wider`. This would also mean that `value = 1` is redundant, since `count` takes care of that. – Rui Barradas Jun 24 '20 at 07:09
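
The two suggestions from the comments above can be sketched as follows. This is a minimal example under stated assumptions, not the commenters' exact code: the `rename()` step and the `name = "sum"` argument are added here only to match the column names in the expected output, and `res_count` and `res_table` are illustrative variable names.

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = c(1, 2),
                 text = c("Google,Amazon", "Amazon,Yahoo"),
                 stringsAsFactors = FALSE)

# dplyr/tidyr route: split each string into a list column,
# unnest it to one row per company, then count occurrences.
res_count <- df %>%
  mutate(text = strsplit(text, ",")) %>%
  unnest(text) %>%
  count(text, name = "sum") %>%   # `name` sets the count column's name
  rename(name = text)             # assumption: match the expected column name

# base R route: flatten all split pieces and tabulate them.
# The resulting columns are named Var1 and Freq by default.
res_table <- data.frame(table(unlist(strsplit(df$text, ","))))

print(res_count)
print(res_table)
```

Both results list the companies in alphabetical order rather than in order of first appearance, so the rows may need reordering if the exact row order of the expected output matters.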

1 Answer


Is this what you want?

library(tidyverse)
df <- data.frame(id = c(1,2), text = c("Google,Amazon", "Amazon,Yahoo"), stringsAsFactors = FALSE)
df
#>   id          text
#> 1  1 Google,Amazon
#> 2  2  Amazon,Yahoo

df %>% 
  separate(col=text, into = c('a', 'b')) %>% 
  pivot_longer(cols = a:b) %>% 
  count(value)
#> # A tibble: 3 x 2
#>   value      n
#>   <chr>  <int>
#> 1 Amazon     2
#> 2 Google     1
#> 3 Yahoo      1
MarBlo
  • You can also use `separate_rows` and avoid pivoting to long format, i.e. `df %>% separate_rows(text, sep = ',') %>% count(text)` – Sotos Jun 23 '20 at 13:37
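
As a sketch of why the `separate_rows` suggestion may be convenient: it does not need the number of items per row to be known in advance, whereas `separate(..., into = c('a', 'b'))` assumes exactly two. The data frame `df3` below is hypothetical; its third row is added purely for illustration and is not part of the question's data, and the renaming to `name`/`sum` is likewise an assumption made to mirror the expected output.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: the third row (three companies) is illustrative only.
df3 <- data.frame(id = c(1, 2, 3),
                  text = c("Google,Amazon", "Amazon,Yahoo", "Google,Amazon,Yahoo"),
                  stringsAsFactors = FALSE)

res <- df3 %>%
  separate_rows(text, sep = ",") %>%  # one row per comma-separated item
  count(text, name = "sum") %>%       # occurrences per company
  rename(name = text)

print(res)
```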