I'm trying to text-mine social policy cases. Each case is in a row, and I want to know how many of my cases refer to, say, Universal Credit or some new, unknown issue. I'm starting with word frequencies.

I've got as far as getting my data into this format. Basically, ID takes the value 1, 2, or 3, as there are three case studies, and Word takes the value dog or cat.

dd <- read.table(text="ID       Word
1   dog
1   cat
2   cat
2   cat
3   cat", header=TRUE)

I want a count of unique IDs for each Word, i.e. there are three case studies that mention cats:

Word Count
cat      3
dog      1

I'm not even sure whether this is still a text-mining question or just a basic grouping/counting question.

    Welcome to the site! I'd like to applaud you for providing nicely formatted copy/pasteable sample data on your first question. That said, you have been drawing some downvotes (now canceled out by some upvotes). In the future I'd encourage you to also *show what you tried*, which is a nice way to demonstrate the effort you put in yourself before asking the question. In addition to preventing downvotes, this helps answerers see what you do and don't understand, so they know where to start from in a good answer. And sometimes all that's needed is a tiny adjustment to code you've already tried. – Gregor Thomas Jun 20 '18 at 19:17

2 Answers

I think you can do this with a simple dplyr call, for example:

library(dplyr)
dd %>% group_by(Word) %>% summarize(Count=n_distinct(ID))
#   Word  Count
#    <fct> <int>
# 1 cat       3
# 2 dog       1
MrFlick
Using base R and not a package. Note that `table(dd$Word)` on its own counts rows, not distinct IDs, so a Word repeated within the same case (here `cat` appears twice for ID 2) would be over-counted. Dropping duplicate rows first gives the distinct-ID count:

as.data.frame(table(unique(dd)$Word))
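Another base R option, sketched here with the same `dd` sample data from the question: `aggregate` can compute the number of distinct IDs per Word directly, which avoids over-counting a Word that repeats within one case.

```r
# Recreate the sample data from the question
dd <- read.table(text = "ID Word
1 dog
1 cat
2 cat
2 cat
3 cat", header = TRUE)

# For each Word, count how many distinct IDs (case studies) mention it
res <- aggregate(ID ~ Word, data = dd, FUN = function(x) length(unique(x)))
names(res)[2] <- "Count"
res
#   Word Count
# 1  cat     3
# 2  dog     1
```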
Lost