
I have a text file and used the tm package to compute the frequency of every word. Now I want the list of words whose frequency is one, two, or three, respectively. How can I do this?

frequency <- colSums(dtm2)
frequency <- sort(frequency, decreasing=TRUE)
words <- names(frequency)
words
words[1]

As you can see, the last command returns the word with the highest frequency, which in my example is "without". What I want instead is the list of words that occur only once, exactly twice, or exactly three times.

Thanks.

Sowmya S. Manian
marjan
  • *"As you see..."* we don't see anything because you haven't shared any sample data. Take a look at [how to make a reproducible example](http://stackoverflow.com/q/5963269/903061). – Gregor Thomas Jun 06 '16 at 16:51
  • My best guess is that you want `words[frequency == 1]`, say, for words that occur exactly once. – Gregor Thomas Jun 06 '16 at 16:52
  • Hi, thanks, it works well. Sorry for asking such a simple question. I have a clinical background and have just started learning R by myself. – marjan Jun 06 '16 at 18:14
  • The simplicity of the question isn't so much a problem as the lack of reproducibility. – Gregor Thomas Jun 06 '16 at 18:22

2 Answers


In R, the expression `x[x.freq < 4]` returns every element of `x` whose corresponding element of `x.freq` is less than 4. You'll want something like that, applied to the variable you called `frequency`, although you might have to adapt it a little first (for example, by indexing its names rather than its values).
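As a minimal sketch of that idea, assuming `frequency` is a named numeric vector like the one in the question: the small matrix below is a made-up stand-in for `dtm2` (its terms and counts are hypothetical, not the asker's data), and logical indexing on the names picks out the words with a given count.

```r
# Hypothetical stand-in for the question's dtm2:
# rows are documents, columns are terms, cells are counts.
dtm2 <- matrix(
  c(2, 1, 0, 1,
    3, 0, 1, 1,
    1, 1, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"),
                  c("without", "patient", "dose", "care"))
)

# Same steps as in the question: total count per term, most frequent first.
frequency <- colSums(dtm2)
frequency <- sort(frequency, decreasing = TRUE)

# Logical indexing on the names keeps only the terms whose count matches.
once   <- names(frequency)[frequency == 1]
twice  <- names(frequency)[frequency == 2]
thrice <- names(frequency)[frequency == 3]

# All terms that occur fewer than four times, in one go.
rare <- names(frequency)[frequency < 4]
```

With this toy matrix, `once` is "dose", `twice` is "patient", `thrice` is "care", and `rare` holds all three.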

Everyone_Else

I. Character vector df holding the words:

 > df <- c("AAA","BB","DD","AA","AAA","CCC","PP","PP","CC","LL","OOO","LL","CC","AAA")
 > df
 # [1] "AAA" "BB"  "DD"  "AA"  "AAA" "CCC" "PP"  "PP"  "CC"  "LL"  "OOO" "LL" 
 # [13] "CC"  "AAA"

II. Table showing frequency of each word:

 > table(df)
 #   df
 #   AA AAA  BB  CC CCC  DD  LL OOO  PP 
 #    1   3   1   2   1   1   2   1   2 

III. Frequency of each word stored in the data frame result:

 > result <- as.data.frame(table(df))
 > result
 #    df Freq
 # 1  AA    1
 # 2 AAA    3
 # 3  BB    1
 # 4  CC    2
 # 5 CCC    1
 # 6  DD    1
 # 7  LL    2
 # 8 OOO    1
 # 9  PP    2

IV. Ordering words by Decreasing Frequency:

 > result[order(result$Freq, decreasing = TRUE), ]
 #    df Freq
 # 2 AAA    3
 # 4  CC    2
 # 7  LL    2
 # 9  PP    2
 # 1  AA    1
 # 3  BB    1
 # 5 CCC    1
 # 6  DD    1 
 # 8 OOO    1

V. Frequency by Specifying Words:

   > result[result$df=="AAA",]
   #    df Freq
   # 2 AAA    3
   > result[result$df=="LL",]
   #    df Freq
   # 7  LL    2
   > result[result$df=="DD",]
   #    df Freq
   # 6  DD    1

VI. Words by Specifying Frequency:

  > as.character(result$df[result$Freq == 1])
  # [1] "AA"  "BB"  "CCC" "DD"  "OOO"
  > as.character(result$df[result$Freq == 2])
  # [1] "CC" "LL" "PP"
  > as.character(result$df[result$Freq == 3])
  # [1] "AAA"
  > as.character(result$df[result$Freq == 4])
  # character(0)
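
A shorter route, sketched here on the same example vector `df` from step I, is to skip the data frame and index the names of the frequency table directly. The variable names `freq`, `words_once` and `by_count` are only illustrative.

```r
# Example vector from step I.
df <- c("AAA","BB","DD","AA","AAA","CCC","PP","PP","CC","LL","OOO","LL","CC","AAA")

# One count per distinct word; the names of the table are the words.
freq <- table(df)

# Words that occur exactly once.
words_once <- names(freq)[freq == 1]

# All words grouped by how often they occur:
# a list with one element per observed frequency.
by_count <- split(names(freq), as.integer(freq))
```

Here `by_count[["2"]]` holds the words that occur twice and `by_count[["3"]]` those that occur three times, so each group can be looked up by its count without repeating the comparison.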
Sowmya S. Manian
  • Hi, thanks Sowmya. It works. Could you please suggest a book or resource that would help me improve my R knowledge? – marjan Jun 06 '16 at 17:53
  • Hi, there are many books and PDFs I refer to, no particular one, but you can find them all linked here: http://stackoverflow.com/tags/r/info Try practicing by writing small pieces of code yourself; you will pick up the basics faster. – Sowmya S. Manian Jun 06 '16 at 18:02