I have a data.frame whose values are inconsistent because the data comes from different sources. For example, the States
column contains the same states, but in different representations (some abbreviated, some spelled out). Note that my actual data does not use US states.
df <- data.frame(Names = c("Adam", "Mark", "Dahlia", "Jeff", "Derek",
                           "Arnold", "Sheppard", "Dwayne", "Nichols", "Shane"),
                 Age = c(27, 28, 29, 37, 26, 22, 29, 34, 31, 30),
                 States = c("AL", "Alaska", "Alabama", "WI",
                            "Wisconsin", "AZ", "Arizona", "AL", "WI", "AK"))
I am trying to recode values like AL, WI, AZ, and AK as Alabama, Wisconsin, Arizona, and Alaska respectively.
So far I have tried case_when() from dplyr:
case_when(
  df$States == "AL" ~ "Alabama",
  df$States == "AK" ~ "Alaska",
  df$States == "WI" ~ "Wisconsin",
  df$States == "AZ" ~ "Arizona"
)
which gives me this output:
[1] "Alabama" NA NA "Wisconsin" NA "Arizona" NA
[8] "Alabama" "Wisconsin" "Alaska"
I don't want the NA values, so I added a second condition for each full name:
case_when(
  df$States == "AL" ~ "Alabama",
  df$States == "Alabama" ~ "Alabama",
  df$States == "AK" ~ "Alaska",
  df$States == "Alaska" ~ "Alaska",
  df$States == "WI" ~ "Wisconsin",
  df$States == "Wisconsin" ~ "Wisconsin",
  df$States == "AZ" ~ "Arizona",
  df$States == "Arizona" ~ "Arizona"
)
It gives me the output I want, but I think there must be a much simpler way to do this.
I'm thinking about a loop, because later I would like to turn this into pseudo-code. However, I'm running out of ideas on how to implement it. I would really appreciate any help.
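One possible simplification, as a minimal sketch: the NA values appear because case_when() returns NA for any element that matches none of the conditions, so a final catch-all branch that returns the original value may avoid listing every full name. The sketch assumes dplyr is installed and uses a standalone character vector, `states`, as a stand-in for `df$States`; the names `states` and `recoded` are illustrative only.

```r
library(dplyr)

# Stand-in for df$States (assumed to be a character vector, not a factor)
states <- c("AL", "Alaska", "Alabama", "WI", "Wisconsin",
            "AZ", "Arizona", "AL", "WI", "AK")

recoded <- case_when(
  states == "AL" ~ "Alabama",
  states == "AK" ~ "Alaska",
  states == "WI" ~ "Wisconsin",
  states == "AZ" ~ "Arizona",
  TRUE ~ states  # catch-all: keep values that matched no rule above
)

print(recoded)
```

In dplyr 1.1.0 and later, the catch-all can also be written as `.default = states` instead of `TRUE ~ states`.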
Thank you.
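The loop idea mentioned above could look roughly like the following sketch in base R. It assumes the abbreviations can be collected in a named character vector; `lookup`, `states`, `recoded`, and `recoded_vec` are hypothetical names chosen for illustration, and `states` again stands in for `df$States`.

```r
# Hypothetical lookup table: abbreviation -> full name
lookup <- c(AL = "Alabama", AK = "Alaska", WI = "Wisconsin", AZ = "Arizona")

# Stand-in for df$States (assumed to be a character vector)
states <- c("AL", "Alaska", "Alabama", "WI", "Wisconsin",
            "AZ", "Arizona", "AL", "WI", "AK")

# Loop version, close to pseudo-code:
# for each value, replace it only if it is a known abbreviation
recoded <- states
for (i in seq_along(recoded)) {
  if (recoded[i] %in% names(lookup)) {
    recoded[i] <- lookup[[recoded[i]]]
  }
}

# Vectorised equivalent without an explicit loop
is_abbrev <- states %in% names(lookup)
recoded_vec <- states
recoded_vec[is_abbrev] <- unname(lookup[states[is_abbrev]])

print(recoded)
```

With this layout, supporting another abbreviation only means adding one entry to `lookup`; values that are already spelled out are left untouched.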