I have four .csv files, each containing a list of words in order of decreasing frequency. The number of words differs between files (e.g. file 1 has 300 words, file 2 has 120 words, and so on), and because of the frequency ordering the same word can sit at different positions: the first word in file 1 is the second word in file 2 and the last one in file 4. I want to compare all the files together in R to see which words all the files share and which words are unique to a single file. If possible, I also want to record the frequency of each common word in every file. I have tried the `waldo` package in R to compare the lists, but it did not work as expected. Any advice or suggestions would be appreciated. Thank you.

  • Welcome to the site! Please see the FAQ on [How to make a great reproducible example in R](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). If you can share a few rows of sample data and show your desired output for that sample input, that will help us know what you're dealing with, show where you're stuck, and make it clear what your goal is. – Gregor Thomas Jun 08 '23 at 20:31
  • You might also benefit from reading some of the [general site FAQ](https://stackoverflow.com/help/). We generally try for clear, reproducible questions rather than discussion/advice/suggestions. Almost all good questions in the R tag include sample data, code that was attempted, and the desired result. – Gregor Thomas Jun 08 '23 at 20:34
  • With the caveat of lacking a reproducible example, this seems like a classic use case for the `tidyverse` packages, which you can read all about [here](https://r4ds.had.co.nz/). Read each file into a two-column data frame with variables `filename` and `word`, then combine them into a single long data frame (`bind_rows`), and then you can do everything you describe. E.g. `group_by(word) %>% summarize(n = n())` will get you the number of files each word appears in, assuming each word occurs at most once per file. – C. Murtaugh Jun 08 '23 at 20:41
  • @GregorThomas Thank you for the instruction! Yes, I will add a reproducible example to make my question clearer! – user21390049 Jun 08 '23 at 20:53
  • @C.Murtaugh Many thanks for your suggestion! I will try this then! – user21390049 Jun 08 '23 at 20:55
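
The approach suggested in the comments (one long data frame, then group by word) could be sketched roughly as follows. This is only an illustration under stated assumptions: the question does not show the files, so the column names `word` and `freq`, the names `file1` to `file4`, and the toy words and numbers are all made up. It uses `dplyr` and `tidyr` from the tidyverse.

```r
library(dplyr)
library(tidyr)

# Toy stand-ins for the four CSV files (hypothetical words and frequencies).
# With real files, each element would instead come from something like
# read.csv("file1.csv"), assuming columns named `word` and `freq`.
lists <- list(
  file1 = data.frame(word = c("the", "cat", "sat", "mat"), freq = c(50, 20, 10, 5)),
  file2 = data.frame(word = c("cat", "the", "dog"),        freq = c(30, 25, 8)),
  file3 = data.frame(word = c("the", "dog", "cat"),        freq = c(40, 12, 9)),
  file4 = data.frame(word = c("bird", "cat", "the"),       freq = c(15, 7, 3))
)

# Stack everything into one long data frame, keeping the source file name.
long <- bind_rows(lists, .id = "filename")

# Count how many distinct files each word occurs in.
word_counts <- long %>%
  group_by(word) %>%
  summarize(n_files = n_distinct(filename), .groups = "drop")

# Words present in every file.
common_words <- word_counts %>%
  filter(n_files == length(lists)) %>%
  pull(word)

# Words present in exactly one file, together with that file.
unique_words <- long %>%
  semi_join(filter(word_counts, n_files == 1), by = "word") %>%
  select(filename, word)

# Frequencies of the common words, one row per word and one column per file.
common_freq <- long %>%
  filter(word %in% common_words) %>%
  pivot_wider(names_from = filename, values_from = freq)
```

Here `common_words` holds the shared words, `unique_words` lists each word that occurs in only one file, and `common_freq` puts the per-file frequencies of the shared words side by side. Using `n_distinct(filename)` rather than `n()` keeps the count correct even if a word happens to be repeated within a file.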
