In a data frame, the word "Tomorrow" is written in several different ways. How do I change them all to the same spelling?

Now

TOMORROW
2moro
Tomorrow 
tomorrow
tomrow

The result I want

Tomorrow 
Tomorrow 
Tomorrow 
Tomorrow 
Tomorrow
Ronak Shah
Reju V S
  • Are these 5 spellings all you have in your data set, or will there be more spelling mistakes? – dpendi Apr 05 '21 at 10:24
  • These are the 5 mistakes I came across in one column. I used the unique() function to see all the unique values. – Reju V S Apr 05 '21 at 10:32

2 Answers

@Reju: there are many ways to overwrite or replace strings, or parts of strings, in R. In your case, a classic conditional approach works: if a value matches one of the wrong spellings, replace it with the correct one.

One way of doing this with R and the tidyverse (dplyr) is the case_when() function. I point to this function because your real-world case might be more complicated and require additional conditions. It also saves you from writing nested ifelse() calls.

I turned your data into a simple dataframe/tibble, i.e. my_df, with one variable WHEN. Note: please also read up on reproducible examples for the future.
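
The construction of my_df is not shown in this answer, so the following is only a minimal sketch of what such a reproducible example could look like, assuming a plain base R data frame with a single character column WHEN that holds the five spellings from the question.

```r
# Assumed reconstruction of the example data (not shown in the original answer):
# one character column, WHEN, holding the five spellings from the question.
my_df <- data.frame(
  WHEN = c("TOMORROW", "2moro", "Tomorrow", "tomorrow", "tomrow"),
  stringsAsFactors = FALSE
)

# List the distinct spellings present in the column.
unique(my_df$WHEN)
```

Calling unique() on the column first, as mentioned in the comments, is a quick way to collect every variant that needs a rule.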

With dplyr's mutate(), I create a new column, WHEN_C; you can of course overwrite your existing column instead. The TRUE condition at the end of case_when() leaves all other values intact, which you need if the column also contains entries that are already correct. The %in% operator lets you supply a vector of options instead of writing out a long value1 OR value2 OR value3 ... condition.

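    library(dplyr)  # provides %>%, mutate() and case_when()

    my_df <- my_df %>%
      mutate(WHEN_C = case_when(
        # every known variant maps to the spelling requested in the question
        WHEN %in% c("TOMORROW", "2moro", "Tomorrow", "tomorrow", "tomrow") ~ "Tomorrow",
        # all other values are left unchanged
        TRUE ~ WHEN
      ))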

This yields a new column WHEN_C in which all five spelling variants are replaced by a single spelling.

There are other ways of doing this with string manipulation. They rely on so-called regular expressions, if you want to read up on them.
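
As a rough illustration of the regular-expression route, here is a minimal base R sketch. The vector when, the pattern, and the extra "Today" entry are assumptions made for the example; the pattern only covers the variants listed in the question, so new misspellings would need to be added to it.

```r
# Hypothetical example vector: the five variants from the question plus one
# unrelated value ("Today") that should stay untouched.
when <- c("TOMORROW", "2moro", "Tomorrow ", "tomorrow", "tomrow", "Today")

# Match any listed variant, ignoring case and surrounding whitespace.
pattern <- "^[[:space:]]*(tomorrow|2moro|tomrow)[[:space:]]*$"

# Replace matching entries with the canonical spelling, keep the rest as is.
cleaned <- ifelse(grepl(pattern, when, ignore.case = TRUE), "Tomorrow", when)
cleaned
```

Compared with %in%, the pattern is insensitive to case and to stray spaces, at the cost of being harder to read.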

Ray
Try this:

    typos <- c("TOMORROW", "2moro", "Tomorrow", "tomorrow", "tomrow")
    df <- data.frame(date = typos)
    # assign within the date column only, so any other columns stay untouched
    df$date[df$date %in% typos] <- "Tomorrow"
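
One caveat when the data frame has more than one column: an assignment of the form df[rows, ] <- "Tomorrow" writes the value into every column of the matching rows. The sketch below, which assumes a hypothetical extra column id and a hypothetical data frame df2, targets the date column only.

```r
typos <- c("TOMORROW", "2moro", "Tomorrow", "tomorrow", "tomrow")

# Hypothetical data frame with a second column that must not be changed.
df2 <- data.frame(
  id = 1:6,
  date = c(typos, "Today"),
  stringsAsFactors = FALSE
)

# Replace only within the date column; id and non-matching rows are untouched.
df2$date[df2$date %in% typos] <- "Tomorrow"
df2
```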
Peace Wang