
I have a dataset in an Excel spreadsheet where the row names are duplicates or near-duplicates, as shown in the image below. Is there a 'tidy' way (in R) of cleaning this? Even a partial solution, such as merging rows whose names differ only in case (upper, lower, sentence), would be much appreciated.

[Image of a few rows of the dataset showing the duplicate city names]

I am familiar with `clean_names` (e.g. https://rpubs.com/jenrichmond/clean_names), but I was not able to find a function that does this specific job. I want to create another data frame with just one row for each unique city name.
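For the case-only duplicates, a minimal sketch with dplyr (>= 1.1.0 for `.by`). The column names `city` and `count`, and the example values, are assumptions for illustration since the real spreadsheet isn't shown:

```r
library(dplyr)

# Hypothetical stand-in for the spreadsheet data
cities <- data.frame(
  city  = c("Abilene", "ABILENE", "abilene", "Dallas", "DALLAS"),
  count = c(2, 3, 1, 5, 4)
)

merged <- cities |>
  mutate(city = tolower(city)) |>            # normalize case
  summarize(count = sum(count), .by = city)  # one row per unique name
```

This collapses `Abilene`/`ABILENE`/`abilene` into a single row whose count is the sum of the originals; it will not catch spelling variants.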

Phil
  • Different cases are easy: convert them all to the same case and do a grouped sum: `your_data |> rownames_to_column("name") |> mutate(name = tolower(name)) |> summarize(Count = sum(Count), .by = name)`. Correcting typos like `ABILENE` vs `ABLIENE` is quite a bit more difficult; it's called "fuzzy deduplication" and [this is a pretty good place to start](https://stackoverflow.com/q/74368289/903061) / potential duplicate for that part of the question. – Gregor Thomas Sep 03 '23 at 02:47
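The fuzzy-deduplication step mentioned in the comment can be sketched in base R alone: compute pairwise edit distances with `adist()` and group names that fall within a small distance via hierarchical clustering. The vector of names and the distance threshold `h = 2` are illustrative assumptions, not part of the question's data:

```r
# Hypothetical city names, including two typo variants
cities <- c("abilene", "abliene", "dallas", "dalas", "austin")

d <- adist(cities)                             # pairwise Levenshtein distances
clusters <- cutree(hclust(as.dist(d)), h = 2)  # merge names within distance 2

# Map every variant to the first member of its cluster
canonical <- ave(cities, clusters, FUN = function(x) x[1])
```

With a canonical name per cluster, the same `tolower()` + grouped-sum pipeline can then collapse the near-duplicate rows. The threshold needs tuning: too high and distinct cities merge, too low and typos survive.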
