I imported some simple data from the web to learn about the fread() function.
The import worked, and I now have a small, clean dataset on the populations of continents:
> continent_populations
   Rank     Continent Population_2010 Growth_Rate_Percent World_Pop_Percent
1:    1          Asia   4.581.757.408               1.04%            59.69%
2:    2        Africa   1.216.130.000               2.57%            16.36%
3:    3        Europe     738.849.000               0.08%             9.94%
4:    4 North America     579.024.000               0.96%             7.79%
5:    5 South America     422.535.000               1.04%             5.68%
6:    6       Oceania      38.304.000               1.47%             0.54%
7:    7    Antarctica           1.106                   0            <0.01%
All of these variables are chars, but I want to convert the Population_2010, Growth_Rate_Percent, and World_Pop_Percent variables to numerics. I started simply by using transform():
transform(continent_populations, Population_2010 = as.numeric(Population_2010))
However, I get the warning that NA values have been introduced; all of the values are now NA.
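A minimal reproduction of that warning, assuming the imported strings originally used commas as thousands separators (the values in raw_pop are typed by hand for illustration, not read with fread()):

```r
# Assumed raw values, with commas as thousands separators (hand-typed sample)
raw_pop <- c("4,581,757,408", "738,849,000", "1,106")

# as.numeric() does not accept commas inside a number, so each element
# becomes NA; suppressWarnings() only hides the "NAs introduced by coercion"
# warning so the result can be inspected quietly
with_commas <- suppressWarnings(as.numeric(raw_pop))

print(with_commas)
```

The same call on a string with no separators at all, such as "1106", converts without a warning.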
I read in this previous thread that, for my Population_2010 variable at least, having comma separators rather than periods might cause an error, so I swapped them for periods:
continent_populations$Population_2010 <- gsub(",", ".", continent_populations$Population_2010)
However, as.numeric() still converts all the values to NA. For the other two variables, I assume that the percent signs will need to be removed. First and foremost, I'm just confused as to why the Population_2010 variable won't convert. I also tried the suggested as.numeric(as.character(var)) workaround, but this didn't work (and seemed pointless anyway, since it is already character type).
I want to know how to properly convert between types (not just here, but for use in proper datasets), so I need to know what is going wrong here. Thanks for any help.
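For reference, here is a rough base R sketch of the kind of conversion described above. It rests on several assumptions: the periods in Population_2010 are thousands separators that can be dropped entirely, the "<" in "<0.01%" can be discarded (which turns "less than 0.01" into exactly 0.01), and the data frame is a hand-typed subset of the table. The helper names strip_separators and strip_percent are hypothetical, not from any package.

```r
# Hand-typed subset of the table above, for illustration only
continent_populations <- data.frame(
  Continent           = c("Asia", "Oceania", "Antarctica"),
  Population_2010     = c("4.581.757.408", "38.304.000", "1.106"),
  Growth_Rate_Percent = c("1.04%", "1.47%", "0"),
  World_Pop_Percent   = c("59.69%", "0.54%", "<0.01%"),
  stringsAsFactors    = FALSE
)

# Hypothetical helper: delete every period or comma, then convert.
# Only valid when the column holds whole numbers with thousands separators.
strip_separators <- function(x) as.numeric(gsub("[.,]", "", x))

# Hypothetical helper: delete "%" and "<", then convert.
# Note that "<0.01%" becomes 0.01, so the "less than" information is lost.
strip_percent <- function(x) as.numeric(gsub("[<%]", "", x))

converted <- transform(
  continent_populations,
  Population_2010     = strip_separators(Population_2010),
  Growth_Rate_Percent = strip_percent(Growth_Rate_Percent),
  World_Pop_Percent   = strip_percent(World_Pop_Percent)
)

str(converted)
```

The point of the sketch is that as.numeric() only accepts strings that already look like a single number, so separators and symbols have to be removed rather than swapped for another character.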