Create new column for each unique item (word) with frequency count

Question

I am quite new to R and programming in general and have been struggling with the following.

I have a dataframe like below:

id     animals
 1     cat dog
 2     cat pig dog fish fish
 3     horse horse

I want to create a new column for each animal, containing that animal's frequency count for each id:

id    cat  dog  fish  horse  pig
 1     1    1     0     0     0
 2     1    1     2     0     1
 3     0    0     0     2     0

How do I achieve this?

example dput:

structure(list(id = 1:3, animals = structure(1:3, .Label = c("cat dog", 
    "cat pig dog fish fish", "horse horse"), class = "factor")), .Names = c("id", 
    "animals"), class = "data.frame", row.names = c(NA, -3L))

Maurits Evers · Accepted Answer · 2018-05-10T13:19:44.633

3

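We can do the following:

library(dplyr)
library(tidyr)

df %>%
    separate_rows(animals) %>%       # one row per word
    count(id, animals) %>%           # frequency of each word per id
    spread(animals, n, fill = 0)     # one column per word, 0 where absent
# A tibble: 3 x 6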
#     id   cat   dog  fish horse   pig
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1    1.    1.    1.    0.    0.    0.
#2    2.    1.    1.    2.    0.    1.
#3    3.    0.    0.    0.    2.    0.

Sample data

df <- read.table(text =
    "id     animals
 1     'cat dog'
 2     'cat pig dog fish fish'
 3     'horse horse'", header = TRUE)
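
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The following is a minimal sketch of the same pipeline using the newer function, assuming tidyr 1.1.0 or later so that values_fill accepts a single value. The result name `res` is only illustrative. Note that the word columns come out in order of first appearance rather than alphabetically.

```r
library(dplyr)
library(tidyr)

# Same sample data as above, built directly as a data frame
df <- data.frame(
  id = 1:3,
  animals = c("cat dog", "cat pig dog fish fish", "horse horse"),
  stringsAsFactors = FALSE
)

res <- df %>%
  separate_rows(animals, sep = " ") %>%   # one row per word
  count(id, animals) %>%                  # frequency of each word per id
  pivot_wider(names_from = animals,       # one column per word
              values_from = n,
              values_fill = 0)            # 0 where a word is absent

res
```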

edited May 10 '18 at 13:19

answered May 10 '18 at 13:12


Instead of mutate and unnest you can use separate_rows – kath May 10 '18 at 13:15
Indeed @kath and thanks! – Maurits Evers May 10 '18 at 13:17
And spread has a fill option, where you can specify `fill = 0` – kath May 10 '18 at 13:18
Right again @kath; it's clearly getting too late here. I should sign off. – Maurits Evers May 10 '18 at 13:19

Mike H. · Answer 2 · 2018-05-10T13:20:48.520

A one-liner with data.table might be:

library(data.table)
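# Passing fun.aggregate explicitly avoids the "defaulting to length" message
dcast(setDT(df)[, unlist(strsplit(as.character(animals), " ")), by = id], id ~ V1, fun.aggregate = length)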

#  id cat dog fish horse pig
#1  1   1   1    0     0   0
#2  2   1   1    2     0   1
#3  3   0   0    0     2   0

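Alternatively, you could use dcast from reshape2:

library(reshape2)
# Split each string into words, then build a long data frame (one row per word)
spl <- strsplit(as.character(df$animals), " ")
df_m <- data.frame(id = rep(df$id, times = lengths(spl)), animals = unlist(spl))
# Count occurrences of each word per id
dcast(df_m, id ~ animals, fun.aggregate = length, value.var = "animals")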
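
The same split-then-count idea can also be sketched in base R alone, which may help if installing extra packages is not an option. This is not part of the answer above, just an illustration under the same sample data; the names `long`, `counts` and `wide` are made up for the example.

```r
df <- data.frame(
  id = 1:3,
  animals = c("cat dog", "cat pig dog fish fish", "horse horse"),
  stringsAsFactors = FALSE
)

# Split each string on single spaces
spl <- strsplit(as.character(df$animals), " ", fixed = TRUE)

# Long format: one row per (id, word) occurrence
long <- data.frame(id = rep(df$id, times = lengths(spl)),
                   animal = unlist(spl),
                   stringsAsFactors = FALSE)

# Cross-tabulate ids against words, then convert the table to a data frame
counts <- table(long$id, long$animal)
wide <- as.data.frame.matrix(counts)
wide <- cbind(id = as.integer(rownames(wide)), wide)
rownames(wide) <- NULL

wide
```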

score 1 · Answer 3 · answered May 10 '18 at 13:17

You may choose unnest_tokens from tidytext:

library(tidyverse)
library(tidytext)

x %>% unnest_tokens(word, animals) %>% table()

Data:

x <- structure(list(id = 1:3, animals = c("cat dog", "cat pig dog fish fish", 
"horse horse")), .Names = c("id", "animals"), row.names = c(NA, 
-3L), class = "data.frame")

Output:

   word
id  cat dog fish horse pig
  1   1   1    0     0   0
  2   1   1    2     0   1
  3   0   0    0     2   0
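
One thing to be aware of: unnest_tokens() lowercases tokens by default (its to_lower argument defaults to TRUE), so "Cat" and "cat" would be counted together. A small sketch of the difference, using made-up mixed-case data that is not from the question:

```r
library(dplyr)
library(tidytext)

# Hypothetical mixed-case input, for illustration only
y <- data.frame(id = 1:2,
                animals = c("Cat dog", "cat Dog dog"),
                stringsAsFactors = FALSE)

# Default behaviour: tokens are lowercased before counting
lowered <- y %>%
  unnest_tokens(word, animals) %>%
  count(id, word)

# Preserve the original case instead
kept <- y %>%
  unnest_tokens(word, animals, to_lower = FALSE) %>%
  count(id, word)

lowered
kept
```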

As a side note: I love this book. If you are interested in text analysis with tidytext, it's a must-read: https://www.tidytextmining.com/tidytext.html
