
Hi, I am new to R and have a question. I have a data.frame (df) containing about 30 different types of statistics for the years 1960-2012 for about 100 different countries. Here is an example of what it looks like:

     Country      Statistic.Type     1960      1961      1962      1963 ...  2012 
__________________________________________________________________________________
1    Albania      Death Rate          10        21        13        24        25  
2    Albania      Birth Rate          7         15        6         10        9  
3    Albania      Life Expectancy     8         12        10        7         20  
4    Albania      Population          10        30        27        18        13
5    Brazil       Death Rate          14        20        22        13        18
6    Brazil       Birth Rate          ...  
7    Brazil       Life Expectancy     ...  
8    Brazil       Population          ...  
9    Cambodia     Death Rate          ...  
10   Cambodia     Birth Rate          ...                  etc...

Note that there are 55 columns in total and the values in each of the 53 year columns are made up for the purposes of this question.
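A minimal reproducible stand-in for df might look like the sketch below. It assumes the same wide layout (Country, Statistic.Type, then one column per year); the three countries, the four years, and the random values are placeholders only, not the real data.

```r
# Hypothetical toy data in the same wide layout as df:
# one row per Country x Statistic.Type, one column per year.
# All values are random placeholders, not real statistics.
set.seed(1)
years <- 1960:1963  # the real data runs 1960-2012

# expand.grid varies its first argument fastest, so the rows come out
# grouped by country, as in the table above
df <- expand.grid(
  Statistic.Type = c("Death Rate", "Birth Rate", "Life Expectancy", "Population"),
  Country        = c("Albania", "Brazil", "Cambodia"),
  stringsAsFactors = FALSE
)[, c("Country", "Statistic.Type")]

for (y in years) {
  # [[<- keeps purely numeric column names such as "1960" unchanged
  df[[as.character(y)]] <- sample(5:30, nrow(df), replace = TRUE)
}

str(df)
```

Running dput(df) on a stand-in like this prints code that others can paste to recreate it exactly.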

I need help writing a function that takes a country and a statistic type as inputs and returns a new data.frame with two columns: the year, and the value in that year for the given country and statistic type. For example, if I input country=Brazil and statistic.type=Death Rate, the new data.frame should look like:

     Year    Value 
_____________________
1    1960     14
2    1961     20
3    1962     22
...
53   2012     18

I have no idea how to do this. Any ideas, code, or packages to install would be very helpful.

Thank you so much!

  • Please read this and provide a minimal reproducible example of your data http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Dason May 18 '13 at 18:09
  • Hi @user2397274, could you please consider accepting one of the answers below if you believe your question has been satisfactorily answered? :-) – Ferdinand.kraft Jun 30 '13 at 23:03

3 Answers


If df is your data.frame, all you need is this:

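f <- function(country, statistic.type, data = df)
{
  # keep the row matching both criteria, dropping the two identifier columns
  values <- data[data$Country == country & data$Statistic.Type == statistic.type, -(1:2)]

  # the remaining column names are the years; unlist() turns the
  # one-row data.frame into a plain vector so Value is a single column
  data.frame(Year = names(data)[-(1:2)], Value = unlist(values, use.names = FALSE))
}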

Use it as

f(country="Brazil", statistic.type="Death Rate")
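A self-contained sketch of the same subsetting idea on a tiny hypothetical data frame (the values are the made-up ones from the question). It takes the year names from names(data) rather than the global df, uses unlist() so the matching one-row data.frame becomes a plain vector, and assumes the year columns are named with bare years such as "1960" so they can be converted to integers.

```r
# Tiny hypothetical data frame in the question's wide layout
df <- data.frame(
  Country        = c("Albania", "Brazil"),
  Statistic.Type = c("Death Rate", "Death Rate"),
  `1960` = c(10, 14), `1961` = c(21, 20), `1962` = c(13, 22),
  check.names = FALSE, stringsAsFactors = FALSE
)

f <- function(country, statistic.type, data = df) {
  # logical index for the row matching both criteria
  keep <- data$Country == country & data$Statistic.Type == statistic.type
  # drop the two identifier columns; what is left are the year columns
  values <- data[keep, -(1:2)]
  data.frame(
    Year  = as.integer(names(data)[-(1:2)]),  # assumes names like "1960"
    Value = unlist(values, use.names = FALSE) # one-row data.frame -> vector
  )
}

res <- f(country = "Brazil", statistic.type = "Death Rate")
print(res)
```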
Ferdinand.kraft

You will probably have to split the full data set into one data set per country first: https://stat.ethz.ch/pipermail/r-help/2008-February/155328.html
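In base R, split() does this grouping and returns a named list with one data frame per country. A minimal sketch on a hypothetical two-country data frame whose values are placeholders:

```r
# Hypothetical wide data: one row per Country x Statistic.Type, one column per year
df <- data.frame(
  Country        = c("Albania", "Albania", "Brazil", "Brazil"),
  Statistic.Type = c("Death Rate", "Birth Rate", "Death Rate", "Birth Rate"),
  `1960` = c(10, 7, 14, 5), `1961` = c(21, 15, 20, 8),
  check.names = FALSE, stringsAsFactors = FALSE
)

# split() groups the rows by Country and returns a named list of data frames
by_country <- split(df, df$Country)

print(names(by_country))
print(by_country[["Brazil"]])
```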

Then use the melt function on each subset. In your case, adapted from http://www.statmethods.net/management/reshape.html, where mydata is one of the already split data sets:

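    # example of the melt function: Country and Statistic.Type identify
    # each row; every year column is stacked into variable/value pairs
    library(reshape)
    mdata <- melt(mydata, id.vars = c("Country", "Statistic.Type"))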

That is it.
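If installing a package is not an option, base R's reshape() can produce the same long format. This is a swapped-in base-R alternative, not the melt() call above; it assumes the year columns are named with bare years such as "1960", and the data frame is a hypothetical toy example.

```r
# Hypothetical wide data in the question's layout
df <- data.frame(
  Country        = c("Albania", "Brazil"),
  Statistic.Type = c("Death Rate", "Death Rate"),
  `1960` = c(10, 14), `1961` = c(21, 20), `1962` = c(13, 22),
  check.names = FALSE, stringsAsFactors = FALSE
)

year_cols <- names(df)[-(1:2)]

long <- reshape(
  df,
  direction = "long",
  varying   = year_cols,              # wide year columns to stack
  v.names   = "Value",                # name of the stacked value column
  timevar   = "Year",                 # name of the new year column
  times     = as.integer(year_cols),  # assumes names like "1960"
  idvar     = c("Country", "Statistic.Type")
)

# pick one country/statistic and keep just Year and Value
brazil <- subset(long, Country == "Brazil" & Statistic.Type == "Death Rate",
                 select = c(Year, Value))
rownames(brazil) <- NULL
print(brazil)
```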

Peter Lustig

You could combine subset with stack, plus a gsub to keep only the digits in the column of years:

df <- expand.grid(
  "country" = c("A", "B"),
  "statistic" =  c("c", "d", "e", "f"),
  stringsAsFactors = FALSE)

df$year1980 <- rnorm(8)
df$year1990 <- rnorm(8)
df$year2000 <- rnorm(8)


getYears <- function(input, cntry, stat) {
  x <- subset(input, country == cntry & stat == statistic,
    select = -c(country, statistic))
  x <- stack(x)[,c("ind", "values")]
  x$ind <- gsub("\\D", "", x$ind)
  x
}


getYears(df, "A", "c")

   ind     values
1 1980  1.1421309
2 1990  1.0777974
3 2000 -0.2010913
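To match the two-column Year/Value layout the question asks for, the stacked columns can be renamed and the years converted to integers. A sketch of a hypothetical variant, getYears2 (the name is made up here), on the same kind of random toy data:

```r
set.seed(42)  # toy data as in the answer; values are random
df <- expand.grid(
  "country"   = c("A", "B"),
  "statistic" = c("c", "d", "e", "f"),
  stringsAsFactors = FALSE)
df$year1980 <- rnorm(8)
df$year1990 <- rnorm(8)
df$year2000 <- rnorm(8)

# Hypothetical variant of getYears() returning columns named Year and Value
getYears2 <- function(input, cntry, stat) {
  x <- subset(input, country == cntry & statistic == stat,
              select = -c(country, statistic))
  x <- stack(x)                                  # columns: values, ind
  data.frame(
    Year  = as.integer(gsub("\\D", "", x$ind)),  # "year1980" -> 1980
    Value = x$values
  )
}

res <- getYears2(df, "A", "c")
print(res)
```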
SchaunW