I have this table in R. In the measure column, the entries "Zero-Loss Condensate Drain", "Zero Loss Condensate Drain" and "Zero-Loss Condensate Drains" refer to the same measure, as do "Wi-Fi Thermostat", "Wi-Fi thermostats" and "Wi-Fi Thermostats", but R treats them as different values and counts them separately. I want the Wi-Fi Thermostat variants to be treated as one measure with a count of 4, not 1, 2 and 1 separately. I want the same result for Zero Loss Condensate Drain.

measure Freq
Thermostatic Radiator Valves (TRVs) 45
Smart Thermostatic Radiator Enclosure 42
Smart Thermostats 4
Thermostatic radiator valves 3
Wi-Fi Enabled Thermostats 2
Wi-Fi Thermostats 1
Smart Thermostat 2
Thermostatic and Float Steam Traps 1
Thermostatic Radiator Valves 2
Dual Fuel Thermostat 1
Programmable Setback Thermostats 1
Wi-Fi Thermostat 1
Wi-Fi thermostats 2
Zero-Loss Condensate Drain 1
Zero Loss Condensate Drain 1
Zero-Loss Condensate Drains 2

library(dplyr)
library(stringr)
df_NY <- trimws(NY$`Eligible measures`)
df_NY <- gsub("-", " ", df_NY)
stringr::str_to_title(df_NY)
df_NY <- sort(table(measure = unlist(strsplit(df_NY, ";"))), decreasing = TRUE)
df_NY <- as.data.frame(df_NY)
df_NY %>%
  mutate(helper = toupper(measure),
         helper = ifelse(str_ends(helper, 'S'), substring(helper, 1, nchar(helper) - 1), helper)) %>%
  group_by(helper) %>%
  mutate(measure = first(measure)) %>%
  group_by(measure) %>%
  summarise(Freq = sum(Freq)) %>%
  arrange(-Freq)

I ran this code on the table above but did not get the desired result. For example, the output includes:

Standard Fryers 4
Steamer 4
Steamers 4
Window\nReplacements 4
Bi level Controls 4
Full Size Convection Ovens 4
Low Flow Pre Rinse Spray Valve 3
Wi Fi Enabled Thermostats 3
Wi Fi Thermostats 3
Thermodynamic Steam Traps 1
Zero Loss Condensate Drain 1
Zero Loss Condensate Drains 2

But I want Steamers and Steamer to be treated as the same measure and give a combined count of 8 instead of 4 each.
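
For illustration, here is a minimal base R sketch of the grouping described above. It is not a confirmed fix. It assumes the likely culprit is whitespace left around each piece after splitting on ";" (so a value such as "Steamers " no longer ends in "S"), and it uses a small made-up vector `raw` in place of NY$`Eligible measures`, which is not shown in the question. The names `raw`, `items`, `key`, `labels` and `result` are hypothetical.

```r
# Hypothetical sample standing in for NY$`Eligible measures` (not the real data)
raw <- c("Wi-Fi Thermostat; Wi-Fi thermostats",
         "Wi-Fi thermostats;Zero-Loss Condensate Drain",
         "Zero Loss Condensate Drain; Zero-Loss Condensate Drains",
         "Zero-Loss Condensate Drains ;Wi-Fi Thermostats")

# Split first, then trim, so pieces after ';' lose leading/trailing whitespace
items <- trimws(unlist(strsplit(raw, ";", fixed = TRUE)))

# Normalized grouping key: hyphens and whitespace runs become one space,
# everything is upper-cased, and a single trailing "S" is dropped
key <- gsub("[-[:space:]]+", " ", items)
key <- toupper(key)
key <- sub("S$", "", key)

# Count per key and label each group with the first spelling seen
counts <- table(key)
labels <- tapply(items, key, function(x) x[1])
result <- data.frame(measure = as.vector(labels[names(counts)]),
                     Freq = as.vector(counts),
                     stringsAsFactors = FALSE)
result <- result[order(-result$Freq), ]
print(result)
```

With this sample, the three Wi-Fi spellings fall into one group and the three Zero Loss spellings into another. Dropping a trailing "S" is a crude plural rule: it would also shorten names such as "Gas" or "Glass", so a lookup table of known synonyms or a stemming package may be safer on the full data.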

user438383
  • https://stackoverflow.com/questions/20283624/removing-duplicate-words-in-a-string-in-r –  Aug 22 '22 at 11:37
  • [on plurals](https://stackoverflow.com/questions/34938023/r-text-mining-dealing-with-plurals), and you might dput(df_NY) for good measure. – Chris Aug 22 '22 at 14:18
  • Greetings! Typically it is recommended to provide a minimally reproducible dataset with your question. One way of achieving this is by using the `dput` command. You can check out how to do this at this video: https://youtu.be/3EID3P1oisg – Shawn Hemelstrand Aug 30 '22 at 23:04

0 Answers