2

this is my first post on this website so - sorry for any mistakes.

Given a data frame

xyz <- data.frame(GEO=c("Belgium", "Belgium", "Belgium", "Germany", "Germany", "Germany", "Italy", "Italy", "Italy"),
                 year=c(2008, 2009, 2010, 2008, 2009, 2010, 2008, 2009, 2010),
                 median=c(10.3, 45.2, 34.2, 67.8, 68.7, 69.9, 53.2, 43.2, 33.3),
                 imigration=c(10.2, 45.2, 47.4, 33.3, 44.4, 55.5, 21.5, 76.5, 78.4))

All I want to do is to count avg index of change for median and imigration.

index=(x[n]/x[1])^(1/(n-1)) 

I tried a lot of solutions but I couldn't find the good one. This is my code - maybe you will use this to help me.

First:

groupColumns = c("GEO","year")
dataColumns = c("median", "imigration")
index_median = ddply(mediana2, groupColumns, function(x) x[c((nrow(x)/x[1])^(1/nrow(x)-1)), ])
head(index_median)

Second:

xyz_index <- aggregate(xyz$median, xyz$imigration by=list(Category=xyz$"GEO/TIME"), FUN=median$index_median[nrow(median)]/median[1])^(1/(nrow(median)-1))

I want to get avg index of change for median and imigration (index_median, index_imigration) for every country (Belgium, Germany, Italy). Something like that (numbers are fake):

   GEO      index_median   index_imigration
1 Germany      2.3            1.3
2 Italy        1.6            1.6
3 Belgium      1.2            2.0

Thank you very much!

  • 1
    Can you edit your question and include your expected output: i.e. what should your result(s) look like? – Phil Mar 27 '19 at 11:06
  • Thank you for your advice, done! –  Mar 27 '19 at 11:10
  • @wloczykijek Welcome to SO I think you should read [this](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Meanwhile, do you want something like `library(dplyr);xyz %>% group_by(GEO) %>% mutate(Ind=((median/first(median))^(1/(row_number()-1))))` – A. Suliman Mar 27 '19 at 11:15
  • 1
    @A.Suliman I edited my post - you can see what exactly I want to get –  Mar 27 '19 at 11:29
  • 1
    Now is better, one last question what `n` stand for in `index=(x[n]/x[1])^(1/(n-1))`, I think it's the last obs e.g. for Belgium's median it's 34.2 – A. Suliman Mar 27 '19 at 11:32
  • @A.Suliman n - number of rows (years) for every variable (median, imigration) for every country. In this task n is equal to 3. It's hard for me to explain it, that's why I'll do it, for example. Let's take median for Belgium: index_median=(x[n]/x[1])^(1/(n-1))=(34.2/10.3)^(1/2) –  Mar 27 '19 at 11:44

2 Answers2

0

We can dplyr::group_by GEO then take the first and last observation for median and immigration using dplyr::first and last in each group. To get number of observations in each group we will use dplyr::n()

library(dplyr)
xyz %>% group_by(GEO) %>% 
        #medain[n()] similar to last(median)
        summarise(Ind_median=((last(median)/first(median))^(1/(n()-1))), 
                  Ind_imigration=((last(imigration)/first(imigration))^(1/(n()-1))))
A. Suliman
  • 12,923
  • 5
  • 24
  • 37
0

You can also use base R only like this:

avgchange <- function(z, fld) { x <- z[,fld]; n<- length(x);(x[n]/x[1])^(1/(n-1))  }

Argument fld is the field name you want to summarise

Use base R's by

z <- by(xyz, list(Category=xyz$GEO), fld="median",FUN=avgchange, simplify=FALSE)
z

And this to get a named vector only

unlist(z)

And similarly for the imigration column

z <- by(xyz, list(Category=xyz$GEO), fld="imigration",FUN=avgchange, simplify=FALSE)
z
unlist(z)
Bhas
  • 1,844
  • 1
  • 11
  • 9