Count comma-separated characters as factors in R

Question

I have a dataset with a variable called name_trackers. Each value of this variable holds up to 30 tracker names, separated by commas. In total there are 405 different trackers whose occurrence is stored by name in this variable. I would like to determine the frequency of each tracker. Does anyone have an approach for how I can do this?

Text format of the variable, as a comma-separated string:

name_trackers         <chr> "Flurry,AppsFlyer,Twitter MoPub,Google DoubleClick,AppLovin,Google Analyt~

My output could be a new data frame with 405 rows, where the first column, trackers_names, holds the tracker name and the second column holds the number of times that name occurs in the "old" data frame with 4662 rows.

Do not share an image. Share the text format of your data. Something we could copy paste. Also you should show what you want to achieve. ie include the expected output — Onyambu, May 30 '21 at 16:48
Further, what is your expected output? Is it 405 columns and (for this example) 6 rows? — r2evans, May 30 '21 at 17:22
Here are some good references for how to frame a question well (on Stack, at least): https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info — r2evans, May 30 '21 at 17:25

ila · Answer 1 · 2021-05-30T16:45:01.777

0

I would approach this by creating a dummy variable for the presence of each tracker, for example:

dummy.tracker = grepl("tracker", var)

If you'd like to do this programmatically for each tracker, you could try something like this (reproducible example):

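df <- data.frame(
  name_trackers = c("a,b", "a,b,c", "c"),
  stringsAsFactors = FALSE
)

# Split each row into its individual tracker names
tracker_list <- strsplit(df$name_trackers, ",", fixed = TRUE)
trackers <- unique(unlist(tracker_list))

for (tracker in trackers) {
  # Exact match per row; grepl(tracker, ...) would treat the name as a
  # regex and also match substrings of longer tracker names
  present <- vapply(tracker_list, function(x) tracker %in% x, logical(1))

  # To create a new dummy variable for each tracker
  df[[tracker]] <- present

  # If you're just interested in frequencies: mean() gives the share of
  # rows containing the tracker, sum(present) would give the count
  print(paste0(tracker, ": ", mean(present)))
}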
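If the goal is the two-column output described in the question (one row per tracker plus a count), a minimal base-R sketch could look like the following. It reuses the toy data from above as a stand-in for the real data frame; the result name tracker_counts and the count column n are hypothetical, and counting each tracker at most once per row is an assumption about what "number of times the name occurs" means.

```r
# Toy data standing in for the real 4662-row data frame (assumption)
df <- data.frame(
  name_trackers = c("a,b", "a,b,c", "c"),
  stringsAsFactors = FALSE
)

# Split every row on commas and trim stray whitespace around each name
tracker_list <- lapply(
  strsplit(df$name_trackers, ",", fixed = TRUE),
  trimws
)

# Count each tracker at most once per row, then tabulate across all rows
counts <- table(unlist(lapply(tracker_list, unique)))

# Two-column result: tracker name and number of rows it appears in
# (tracker_counts and n are hypothetical names chosen for illustration)
tracker_counts <- data.frame(
  trackers_names = names(counts),
  n = as.integer(counts),
  stringsAsFactors = FALSE
)

# Most frequent trackers first
tracker_counts <- tracker_counts[order(-tracker_counts$n), ]
print(tracker_counts)
```

On the real data, nrow(tracker_counts) should come out at the number of distinct trackers, which the question puts at 405. If it does not, inconsistent spacing or capitalisation in the tracker names is a likely cause.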

edited May 30 '21 at 16:45

answered May 30 '21 at 16:37

ila


[[<-.data.frame`(`*tmp*`, tracker, value = c(TRUE, TRUE, TRUE, : replacement has 4662 rows, data has 404. I receive this error. Do you have an idea how to fix it? – Paul May 30 '21 at 17:03
It really depends on what you're doing up to that point. Could you share a reproducible example? – ila May 30 '21 at 17:06
