I downloaded two datasets from the World Bank database: GDP and literacy rates. Intuitively, I expect the two to be correlated, so the problem is to find the correlation between GDP and literacy rates over 60 years for about 200 countries.
Here are the links:
http://data.worldbank.org/indicator/NY.GDP.PCAP.CD?view=chart [FOR GDP]
http://data.worldbank.org/indicator/SE.ADT.LITR.ZS?view=chart [FOR LIT]
I downloaded the data in .csv format and read it in, skipping a few lines at the top.
This is the code I started writing:
Lit = read.csv("C:/DIRECTORY/API_SE.ADT.LITR.ZS_DS2_en_csv_v2.csv", skip = 3, header = TRUE, dec = ".")
Gdp = read.csv("C:/DIRECTORY/API_NY.GDP.MKTP.CD_DS2_en_csv_v2.csv", skip = 3, header = TRUE, dec = ".")
#creating a list of variables for each different year
#Without initializing the variables here, the code below did not work
for (i in 5:62)
{
assign(paste0("year", i), 0*i)
}
#running a loop for all the values of each dataset
#The desired result of this is 55 vectors (1 for each year) of some length
#(as there are many missing values), each holding the gdp and lit values
#of the same country in the same row
for (y in 5:62){
for (c in 1:264){
#checking if values are available as many values are missing
q = is.na(Gdp[c,y])
r = is.na(Lit[c,y])
#now we will assign the values to the specific year
assign(paste0("year", y), c(Gdp[c,y], Lit[c,y]))
}}
What I get from this is 55 vectors (titled year1 to year55) with 2 values in each.
I understand what is happening: for each vector, only the last coexisting values are kept (each pair is replaced by the next, and so on until the last).
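The overwriting can be reproduced in isolation. This is only a minimal sketch with made-up numbers; the names `overwritten` and `grown` are hypothetical and not part of my script. It contrasts `assign()` in a loop with appending to the existing value:

```r
# Each assign() replaces the previous value, so only the last one survives.
for (v in c(10, 20, 30)) {
  assign("overwritten", v)
}

# Appending to the existing value keeps every element instead.
grown <- c()
for (v in c(10, 20, 30)) {
  grown <- c(grown, v)
}
```

After running this, `overwritten` holds a single value while `grown` holds all three.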
What would be ideal is a way to grow each year vector so that it contains all the coexisting values for a given year (i.e. every case where a country has both gdp and lit values for that year).
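One possible shape for this, as a rough sketch rather than a definitive solution: keep one data frame per year in a list instead of separate `yearN` variables, and filter with `is.na()` on whole columns. The toy `Gdp` and `Lit` data frames below are made-up stand-ins for the real files, and the sketch assumes that both files list the countries in the same row order and that the year columns start at column 5. The name `pairs_by_year` is hypothetical.

```r
# Toy stand-ins for the real Gdp and Lit data frames (values are invented).
Gdp <- data.frame(
  Country.Name = c("A", "B", "C"),
  Country.Code = c("AAA", "BBB", "CCC"),
  Indicator.Name = "GDP",
  Indicator.Code = "GDP.CODE",
  X1960 = c(100, NA, 300),
  X1961 = c(110, 210, NA)
)
Lit <- data.frame(
  Country.Name = c("A", "B", "C"),
  Country.Code = c("AAA", "BBB", "CCC"),
  Indicator.Name = "Literacy",
  Indicator.Code = "LIT.CODE",
  X1960 = c(50, 60, NA),
  X1961 = c(NA, 65, 70)
)

# Assumption: year columns start at column 5 in both data frames.
year_cols <- 5:ncol(Gdp)

# One data frame per year, keeping only countries where both values exist.
# Assumption: row i is the same country in Gdp and in Lit.
pairs_by_year <- lapply(year_cols, function(y) {
  both <- !is.na(Gdp[[y]]) & !is.na(Lit[[y]])
  data.frame(
    country = as.character(Gdp$Country.Code[both]),
    gdp = Gdp[[y]][both],
    lit = Lit[[y]][both]
  )
})
names(pairs_by_year) <- names(Gdp)[year_cols]
```

Each element, for example `pairs_by_year$X1960`, then holds every coexisting pair for that year, and something like `cor(p$gdp, p$lit)` could be applied to each element `p` once a year has enough pairs.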