I have a data frame like this:
levels <- c("level 1", "LEVEL 1", "Level 1 ", "Level I", "Level I ",
            "level one", "Level one", "Level One", "Level 1")
df <- as.data.frame(levels)
> df
levels
1 level 1
2 LEVEL 1
3 Level 1 #this one has a space at the end.
4 Level I
5 Level I #this one also has a space at the end.
6 level one
7 Level one
8 Level One
9 Level 1 #this is the correct format I want.
As you can see, some entries are in upper case, some have a trailing space, and the "1" is written variously as a digit, as a word, or as a Roman numeral.
I know I can just write multiple lines with gsub(), but I wanted to find a less tedious way to solve this problem.
The data frame has the same issue with level 2 and level 3 (for example "level 2", "level III ", "level II", "Level Two", "level three", "Level TWO"). It also contains strings that are not just "level #", such as "Level 1 with specifications", "Level 2 with specifications", "Level 3 with specifications", "Level 1 with others included", "Moderate", "Mild", "Severe", etc.
I do not want to change those other strings. I only want to replace the oddly formatted levels with exactly "Level 1", "Level 2", or "Level 3".
I tried this using apply() and for loops with gsub(), but none of them seemed to work. I think this may be because gsub() can't take a list?
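For reference, sub() and gsub() are vectorized over a character vector, so they can be applied to a single column such as df$levels directly. If a list or a whole data frame is passed instead, it is coerced with as.character(), which may be why the earlier attempts did not behave as expected. Below is a minimal base R sketch of that idea. The patterns vector and the extra sample values are assumptions made for illustration, not part of my real data.

```r
# Sample values from the question, plus a few that must stay untouched.
raw <- c("level 1", "LEVEL 1", "Level 1 ", "Level I", "Level I ",
         "level one", "Level one", "Level One", "Level 1",
         "level III ", "Level TWO",
         "Level 1 with specifications", "Moderate")
df <- data.frame(levels = raw, stringsAsFactors = FALSE)

# Hypothetical lookup: one anchored pattern per target label.
# ^ and $ force a whole-string match, so longer strings such as
# "Level 1 with specifications" are left alone; \\s* absorbs stray spaces.
patterns <- c(
  "Level 1" = "^\\s*level\\s+(1|i|one)\\s*$",
  "Level 2" = "^\\s*level\\s+(2|ii|two)\\s*$",
  "Level 3" = "^\\s*level\\s+(3|iii|three)\\s*$"
)

x <- df$levels
for (target in names(patterns)) {
  # sub() works on the whole character vector at once
  x <- sub(patterns[[target]], target, x, ignore.case = TRUE)
}
df$levels <- x
```

Because each pattern is anchored at both ends, only strings that consist of nothing but a level label are rewritten. Everything else, including its original whitespace, passes through unchanged.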
I also wanted to use regular expressions with str_replace() to match the pattern, but I can't figure out how. I have never used str_replace() and am new to regular expressions.
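As a rough sketch of what the stringr route might look like: the example below uses str_replace_all(), a sibling of str_replace() that accepts a named vector in which the names are patterns and the values are replacements. It assumes the stringr package is installed. The rules vector and the sample values are made up for illustration.

```r
library(stringr)

x <- c("level 1", "LEVEL 1", "Level I ", "level one", "Level TWO",
       "level III ", "Level 1 with specifications", "Mild")

# Hypothetical rules: names are regex patterns, values are replacements.
# (?i) makes a pattern case-insensitive; ^ and $ require the whole string
# to match, so "Level 1 with specifications" is not touched.
rules <- c(
  "(?i)^\\s*level\\s+(1|i|one)\\s*$"     = "Level 1",
  "(?i)^\\s*level\\s+(2|ii|two)\\s*$"    = "Level 2",
  "(?i)^\\s*level\\s+(3|iii|three)\\s*$" = "Level 3"
)

# Each rule is applied in turn to every element of x.
cleaned <- str_replace_all(x, rules)
```

Here cleaned would then be assigned back to the column, for example df$levels <- cleaned.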
Any ideas?