
I have a dataset as follows:

EstablishmentName                    Freq
bahria university                    20 
bahria university islamabad          12
arid agriculture                     3
arid agriculture university          15
arid rawalpindi                      9
college of e&me, nust                20
college of e & me (nust)             15
college of eme                       30
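A reproducible version of the table above is sketched here. It assumes `Freq` is numeric and that the data frame is called `df`, as in the attempt further down:

```r
# Reproduce the sample data shown in the table
df <- data.frame(
  EstablishmentName = c("bahria university", "bahria university islamabad",
                        "arid agriculture", "arid agriculture university",
                        "arid rawalpindi", "college of e&me, nust",
                        "college of e & me (nust)", "college of eme"),
  Freq = c(20, 12, 3, 15, 9, 20, 15, 30),
  stringsAsFactors = FALSE
)
str(df)
```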

As you can see, "bahria university" and "bahria university islamabad" are essentially the same institution, and the same goes for the other names. I want to merge such variants into a single name and sum their frequencies.

Expected output:

EstablishmentName                   Freq
Bahria University                   32
Arid Agriculture                    27
College of EME                      65
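One way to get this output is a hand-written lookup from a regular-expression pattern to a canonical name. This is a minimal base R sketch; the patterns, labels and the helper `canonical_name()` are hypothetical and would have to be extended by hand for any new spelling:

```r
# Reproduce the sample data
df <- data.frame(
  EstablishmentName = c("bahria university", "bahria university islamabad",
                        "arid agriculture", "arid agriculture university",
                        "arid rawalpindi", "college of e&me, nust",
                        "college of e & me (nust)", "college of eme"),
  Freq = c(20, 12, 3, 15, 9, 20, 15, 30),
  stringsAsFactors = FALSE
)

# Hypothetical hand-written lookup: each pattern maps to one canonical name.
# Patterns are tried in order and the first match wins.
patterns <- c("bahria", "arid", "e\\s*&?\\s*me")
labels   <- c("Bahria University", "Arid Agriculture", "College of EME")

canonical_name <- function(x) {
  x <- tolower(x)
  out <- rep(NA_character_, length(x))
  for (i in seq_along(patterns)) {
    hit <- is.na(out) & grepl(patterns[i], x, perl = TRUE)
    out[hit] <- labels[i]
  }
  out
}

df$Canonical <- canonical_name(df$EstablishmentName)

# Sum frequencies per canonical name; rows with no match (NA) are dropped
result <- aggregate(Freq ~ Canonical, data = df, FUN = sum)
result
```

On the sample data this gives totals of 27, 32 and 65 for Arid Agriculture, Bahria University and College of EME. The trade-off is that the lookup is explicit and predictable but has to be maintained manually.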

I have tried the following, but it doesn't seem to work:

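    library(SnowballC)
    library(dplyr)

    # Note: wordStem() treats each whole name as a single word, so names
    # such as "bahria university islamabad" are not reduced to a shared key.
    df %>%
      mutate(word = wordStem(EstablishmentName)) %>%
      group_by(word) %>%   # group on the stemmed column, not the raw name
      summarise(total = sum(Freq))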
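Even when grouping on the stemmed column, `wordStem()` stems each element of its input as a single word, so whole names like the ones above do not collapse to a common key. A rough base R alternative, assuming the first word of each lower-cased name identifies the institution (true for these sample rows, but not in general), is to derive a grouping key from that first token. The helper `first_token()` is hypothetical:

```r
# Reproduce the sample data
df <- data.frame(
  EstablishmentName = c("bahria university", "bahria university islamabad",
                        "arid agriculture", "arid agriculture university",
                        "arid rawalpindi", "college of e&me, nust",
                        "college of e & me (nust)", "college of eme"),
  Freq = c(20, 12, 3, 15, 9, 20, 15, 30),
  stringsAsFactors = FALSE
)

# Naive key: the first run of letters in the lower-cased name.
# Assumption: the first word identifies the institution.
first_token <- function(x) {
  tokens <- strsplit(tolower(x), "[^a-z]+")
  # drop empty strings produced by leading punctuation; NA if no letters
  vapply(tokens, function(t) t[nzchar(t)][1], character(1))
}

df$key <- first_token(df$EstablishmentName)
totals <- aggregate(Freq ~ key, data = df, FUN = sum)
totals
```

This avoids maintaining a lookup table, but it would wrongly merge two different institutions whose names both start with, say, "college". The keys are also lower-case, so display names such as "College of EME" would still have to be attached afterwards.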
Rana Usman
