I'm trying to text-mine social policy cases. Each case is in a row, and I want to know how many of my cases refer to, say, Universal Credit or some new, unknown issue. I'm starting with word frequencies.

I've got as far as getting my data into this format. Basically, ID takes the value 1, 2, or 3, as there are three case studies, and Word takes the value dog or cat.

dd <- read.table(text="ID       Word
1   dog
1   cat
2   cat
2   cat
3   cat", header=TRUE)

I want a count of unique IDs for each Word, i.e. there are three case studies that mention cats:

Word Count
cat      3
dog      1

I'm not even sure whether this is still a text-mining question or just a basic grouping/counting question.

    Welcome to the site! I'd like to applaud you for providing nicely formatted copy/pasteable sample data on your first question. That said, you have been drawing some downvotes (now canceled out by some upvotes). In the future I'd encourage you to also *show what you tried*, which is a nice way to demonstrate the effort you put in yourself before asking the question. In addition to preventing downvotes, this helps answerers see what you do and don't understand, so they know where to start from in a good answer. And sometimes all that's needed is a tiny adjustment to code you've already tried. – Gregor Thomas Jun 20 '18 at 19:17

2 Answers

I think you can do this with a simple dplyr call, for example:

library(dplyr)
dd %>% group_by(Word) %>% summarize(Count=n_distinct(ID))
#   Word  Count
#    <fct> <int>
# 1 cat       3
# 2 dog       1
MrFlick
Using base R and not a package. Note that `table(dd$Word)` on its own counts rows, not distinct IDs, so a Word repeated within the same case (here `cat` appears twice for ID 2) would be over-counted. Dropping duplicate rows first gives the distinct-ID count:

as.data.frame(table(unique(dd)$Word))
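Another base R option, sketched here with the same `dd` sample data from the question: `aggregate` can compute the number of distinct IDs per Word directly, which avoids over-counting a Word that repeats within one case.

```r
# Recreate the sample data from the question
dd <- read.table(text = "ID Word
1 dog
1 cat
2 cat
2 cat
3 cat", header = TRUE)

# For each Word, count how many distinct IDs (case studies) mention it
res <- aggregate(ID ~ Word, data = dd, FUN = function(x) length(unique(x)))
names(res)[2] <- "Count"
res
#   Word Count
# 1  cat     3
# 2  dog     1
```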
Lost