The factor() function can be used to associate a vector of numbers with a set of labels. For example:
x <- c(1,1,1,2,3,3,2,3,4,4)
theLabels <- c("India","Canada","United States","Mexico")
y <- factor(x,1:4,theLabels)
y
produces the following output:
> y <- factor(x,1:4,theLabels)
> y
[1] India India India Canada United States
[6] United States Canada United States Mexico Mexico
Levels: India Canada United States Mexico
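One behaviour worth checking before relying on this: a value with no match in levels becomes NA rather than raising an error. A minimal sketch, reusing the labels above; the vector x2 and the code 5 are made up for illustration:

```r
# Same labels as above; 5 is a hypothetical code with no matching level
theLabels <- c("India", "Canada", "United States", "Mexico")
x2 <- c(1, 4, 5)
y2 <- factor(x2, levels = 1:4, labels = theLabels)

# Unmatched values are silently converted to NA
is.na(y2)

# Count observations per label, including any missing ones
table(y2, useNA = "ifany")
```

Comparing the set of codes in the data against the levels before converting is a cheap way to catch this.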
To demonstrate that this answer works with the data provided in the fifth edit of the OP:
r <-c("India","Australia","Brazil","Canada","Indonesia","NewZealand",
"Phillipines","Qatar","Singapore","southAfrica","SriLanka","Turkey","UAE","UnitedKingdom","UnitedStates")
zom<- data.frame(Country.Code=c(1,14,30,37,94,148,162,166,184,189,191,208,214,215,216))
zom$Country.Code <- factor(zom$Country.Code,
levels = c(1,14,30,37,94,148,162,166,184,189,191,208,214,215,216),
labels = r)
zom$Country.Code
...and the output:
> zom$Country.Code
[1] India Australia Brazil Canada Indonesia NewZealand Phillipines Qatar
[9] Singapore southAfrica SriLanka Turkey UAE UnitedKingdom UnitedStates
15 Levels: India Australia Brazil Canada Indonesia NewZealand Phillipines Qatar Singapore southAfrica SriLanka Turkey ... UnitedStates
NOTE: Once the original codes are converted to a factor, the original code values are no longer stored in the column. factor() represents each element internally as an integer from 1 to the number of levels, so as.integer(zom$Country.Code) returns 1 through 15 rather than 1, 14, 30, and so on.
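The loss of the original codes can be seen directly with as.integer(). The following is a small sketch on a three-country subset of the data above; the names codes, f, and recovered are illustrative only. It also shows that, as long as the vector of original codes is kept, the values can be recovered by indexing it with the factor's internal integers.

```r
# Original codes for a three-country subset of the data above
codes <- c(1, 14, 30)
f <- factor(c(14, 30, 1, 14), levels = codes,
            labels = c("India", "Australia", "Brazil"))

# Internal representation: positions in the levels vector, not the original codes
as.integer(f)
# 2 3 1 2

# Recover the original codes by indexing the saved vector of codes
recovered <- codes[as.integer(f)]
recovered
# 14 30 1 14
```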
An alternative to the factor() approach is to create a lookup table of country names and codes, and to merge this with the original data. This approach preserves the original values of Country.Code.
To illustrate, we'll create a data frame containing multiple rows of Country.Code from the OP, and merge it with a lookup table via dplyr::inner_join(). We'll then generate a cross-tab of Country.Name and Country.Code to illustrate the accuracy of the join process.
library(dplyr)
# first, build a data frame containing multiple rows with the same country code
zom<- data.frame(Country.Code=c(1,14,30,37,94,148,162,166,184,189,191,208,214,215,216,
1,14,30,37,94,148,162,166,184,189,191,208,214,215,216,
1,14,30,37,94,148,162,166,184,189,191,208,214,215,216))
# second, create lookup table of codes and names, one row per country
countryNames <- data.frame(Country.Code=c(1,14,30,37,94,148,162,166,184,189,191,208,214,215,216),
Country.Name= c("India","Australia","Brazil","Canada","Indonesia","NewZealand",
"Phillipines","Qatar","Singapore","southAfrica","SriLanka","Turkey","UAE","UnitedKingdom","UnitedStates"),
stringsAsFactors=FALSE)
# use dplyr::inner_join() to join country names; by= names the key column explicitly
mergedData <- zom %>% inner_join(countryNames, by = "Country.Code")
table(mergedData$Country.Name,mergedData$Country.Code)
...and the output:
> table(mergedData$Country.Name,mergedData$Country.Code)
                1 14 30 37 94 148 162 166 184 189 191 208 214 215 216
  Australia     0  3  0  0  0   0   0   0   0   0   0   0   0   0   0
  Brazil        0  0  3  0  0   0   0   0   0   0   0   0   0   0   0
  Canada        0  0  0  3  0   0   0   0   0   0   0   0   0   0   0
  India         3  0  0  0  0   0   0   0   0   0   0   0   0   0   0
  Indonesia     0  0  0  0  3   0   0   0   0   0   0   0   0   0   0
  NewZealand    0  0  0  0  0   3   0   0   0   0   0   0   0   0   0
  Phillipines   0  0  0  0  0   0   3   0   0   0   0   0   0   0   0
  Qatar         0  0  0  0  0   0   0   3   0   0   0   0   0   0   0
  Singapore     0  0  0  0  0   0   0   0   3   0   0   0   0   0   0
  southAfrica   0  0  0  0  0   0   0   0   0   3   0   0   0   0   0
  SriLanka      0  0  0  0  0   0   0   0   0   0   3   0   0   0   0
  Turkey        0  0  0  0  0   0   0   0   0   0   0   3   0   0   0
  UAE           0  0  0  0  0   0   0   0   0   0   0   0   3   0   0
  UnitedKingdom 0  0  0  0  0   0   0   0   0   0   0   0   0   3   0
  UnitedStates  0  0  0  0  0   0   0   0   0   0   0   0   0   0   3
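If dplyr is not available, the same lookup can be sketched in base R with match(). This is an alternative to the code above rather than part of it, shown on a three-country subset; zomSmall, idx, and the code 999 are made up for illustration. Unlike inner_join(), which drops rows with no match in the lookup table, this version keeps them and gives them an NA name.

```r
# Lookup table: one row per country (subset of the table above)
countryNames <- data.frame(Country.Code = c(1, 14, 30),
                           Country.Name = c("India", "Australia", "Brazil"),
                           stringsAsFactors = FALSE)

# 999 is a hypothetical code that is not in the lookup table
zomSmall <- data.frame(Country.Code = c(14, 1, 30, 14, 999))

# match() gives the lookup-table row for each code (NA if the code is absent)
idx <- match(zomSmall$Country.Code, countryNames$Country.Code)
zomSmall$Country.Name <- countryNames$Country.Name[idx]
zomSmall
```

As with the join, Country.Code keeps its original values, and the NA rows make unmapped codes easy to spot.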