I have a list of bigrams, as shown below:

      test_test_bigram
   1:          I would
   2:       would like
   3:          like to
   4:         to thank
   5:        thank the
  ---
4792:  design features
4793:       features .
4794:         . Return
4795:        Return to
4796:          to text

I have converted this to a data table, and I want to create a column holding the frequency of each n-gram (each row). Can someone please suggest how?

Also, could you shed some light on how to proceed with sentiment analysis in R for n-grams? I use sentimentr for line-wise sentiment analysis and SentimentAnalysis for the "bag-of-words" approach (single words).

Darren Tsai
jalaj pathak

1 Answer

You can use tidyverse:

library(tidyverse)
# drop duplicate rows, then append a column n with add_count()
test_test_bigram %>% distinct() %>% add_count()

If your bigram dataset already contains only unique values, you can skip distinct().
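
Note that add_count() called without a column name tallies all rows of the table, so every row receives the same n. A minimal sketch of counting per bigram instead, assuming the bigrams sit in a single character column (the toy data and the column name `words`, borrowed from the output posted in the comments, are assumptions, not the asker's actual table):

```r
library(dplyr)

# Hypothetical stand-in for the asker's table; one bigram per row
bigrams <- data.frame(
  words = c("i would", "would like", "i would", "like to", "i would"),
  stringsAsFactors = FALSE
)

# add_count() with no column: n is the total number of rows, repeated on every row
all_rows <- bigrams %>% add_count()

# count(words): one row per distinct bigram, n = how often that bigram occurs
bigram_freq <- bigrams %>% count(words, sort = TRUE)

# add_count(words): keeps every original row and appends its bigram's frequency
with_freq <- bigrams %>% add_count(words)
```

count() is probably the better fit when a frequency table is wanted; add_count(words) fits the request for a frequency column next to each original row.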

Wolfgang Arnold
  • Thanks for the response. It runs, but there is an issue: the data table counts both columns of each row (column 0, the index, and column 1, the bigram phrase) as one value, so when I run the above, the frequency of everything comes out as 3488 (the number of unique bigrams). For example, the first entry in the list comes out as (1 "i would"), so the 1 is also being counted. Is there any way to separate out just the phrase, put it in a table, and then run the above analysis? – jalaj pathak Feb 14 '20 at 05:29
  • Let me restate the problem: the bigrams I get are in the form of tokens (quanteda package), so when they are used in the above expression this error comes up: Error in UseMethod("distinct_") : no applicable method for 'distinct_' applied to an object of class "tokens" or lists. The tokens cannot be converted to a data frame (I checked), and if I convert them to a data table it gives the same frequency for every bigram: – jalaj pathak Feb 14 '20 at 05:56
  • words n 1 like thank 2097, 2 thank organizers 2097, 3 organizers opportunity 2097, 4 opportunity speak 2097, 5 speak today 2097, 6 today plan 2097, – jalaj pathak Feb 14 '20 at 05:57
  • Ah - OK - from the example you provided the structure of your data was not clear. You might need to convert the types of your columns, then you can easily use the functions from tidyverse (dplyr in particular). BTW, as a hint: you might check this for guidance https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Wolfgang Arnold Feb 14 '20 at 06:33
  • That's what I am trying to do (separate out the column with just the phrases). Any suggestions on that? I tried stringr functions, but the result is still the same – jalaj pathak Feb 14 '20 at 08:36
  • This might help: https://tidyr.tidyverse.org/reference/separate.html – Wolfgang Arnold Feb 14 '20 at 08:48
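
For the quanteda case described in the comments, one possible route is to flatten the tokens object into a plain character vector before handing it to dplyr. The sketch below is only an illustration under assumptions: the two input sentences are made up, and it assumes the bigrams were built with tokens_ngrams(); the asker's actual pipeline is not shown in the thread.

```r
library(quanteda)
library(dplyr)

# Made-up input text standing in for the asker's corpus
txt <- c(d1 = "I would like to thank the organizers",
         d2 = "I would like to speak today")

# Tokenise, then build bigrams joined by a space instead of the default "_"
toks <- tokens(txt)
bigram_toks <- tokens_ngrams(toks, n = 2, concatenator = " ")

# A tokens object is list-like: as.list() yields one character vector per
# document, and unlist() flattens these into a single vector of bigrams
bigram_df <- data.frame(
  words = unlist(as.list(bigram_toks), use.names = FALSE),
  stringsAsFactors = FALSE
)

# One row per distinct bigram with its frequency, most frequent first
bigram_freq <- bigram_df %>% count(words, sort = TRUE)
```

Because the data frame holds only the phrase column, no row index ends up mixed into the counted values.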