I've been trying to merge and sort a couple of CSV files (links below). I've successfully merged the files and can sort the result manually in Excel, but I want to automate this and get the sorted results out of R.
THE ISSUE: In the last step, I try to convert the factor 'rankingGDP' in the merged data frame to numeric so that I can sort it in descending order by value. After the conversion, and when I pass the resulting data frame to the order function, the rankingGDP values for each country are completely different from the original ones. The data has become misaligned. Can anybody tell me what I am doing wrong? Thanks heaps.
#Fetch the files
fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv"
download.file(fileUrl, destfile="./fgdp.csv")
fileUrl <-"https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv"
download.file(fileUrl, destfile="./fed.csv")
#Read the files
fgdp <- read.csv("fgdp.csv",skip = 4, header = T)
fed <- read.csv("fed.csv" ,header = T)
#subset relevant columns
fgdp <- fgdp[,c(1,2,4,5)]
#remove rows that are empty
fed <- fed[rowSums(is.na(fed))<ncol(fed),]
fgdp <- fgdp[rowSums(is.na(fgdp))<ncol(fgdp),]
#name the columns for fgdp to match fed
colnames(fgdp) <- c("CountryCode","rankingGDP",
"Long.Name", "gdp")
#merge the files based on Country Code
dt <- merge(fgdp, fed, by.x ="CountryCode", by.y = "CountryCode", all = TRUE)
#Remove rows where the relevant columns are empty
dt <- dt[!dt$CountryCode=="" ,]
dt <- dt[!(dt$rankingGDP=="" | is.na(dt$rankingGDP)) ,]
#subset the columns used for analysis
dt1 <- dt[,1:4]
#remove NAs
dt1 <- dt1[!(is.na(dt1$rankingGDP)),]
#Convert factor to numeric to be able to sort rankingGDP descending
#THE ISSUE IS HERE: the result gives me different values for the
#rankingGDP column (2). By that I mean the factor values (stored as
#characters) are not converted to the associated number in most cases.
dt1[,2]<- as.numeric(dt1[,2])
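
The symptom can be reproduced without the downloaded files. Below is a minimal sketch on a made-up data frame (the name `toy` and all of its values are hypothetical, not taken from the GDP file). It assumes the column really is a factor, as in the code above. It shows that calling `as.numeric()` directly on a factor returns the internal level codes, while converting through `as.character()` first appears to recover the printed numbers, after which whole rows can be sorted with `order()`.

```r
# Hypothetical toy data; the values are made up, not taken from the GDP file
toy <- data.frame(
  CountryCode = c("AAA", "BBB", "CCC", "DDD"),
  rankingGDP  = factor(c("10", "2", "190", "1"))
)

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(toy$rankingGDP)

# Going through character first recovers the numbers that are printed
toy$rankingGDP <- as.numeric(as.character(toy$rankingGDP))

# Sort whole rows in descending order of rankingGDP so the columns stay aligned
toy_sorted <- toy[order(toy$rankingGDP, decreasing = TRUE), ]

print(codes)
print(toy_sorted)
```

If this is the cause, the same two-step conversion applied to `dt1[,2]` might keep the values matched to their countries, though I have not confirmed that against the real data.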