1

i have a data frame, which looks like this, but huge so I can't do anything manually:

   Bank  Country  KeyItem    Year    Value 
    A      AU     Income     2010     1000
    A      AU     Income     2011     1130
    A      AU     Income     2012     1160
    B      USA    Depth      2010     10000

What I want to do is create a function where I can select the Bank, the Keyitem and from which year onwards and it returns a dataframe with the values as percentage of the first value . Like this:

   Bank  Country  KeyItem    Year    Value
    A      AU     Income     2010     100
    A      AU     Income     2011     113
    A      AU     Income     2012     116

Thank you in advance!

MarMarko
  • 35
  • 2
  • 3
  • 9

3 Answers3

3

Here's a data.table solution which should be fast and memory efficient.

DF <- read.table(text="Bank  Country  KeyItem    Year    Value 
A      AU     Income     2010     1000
A      AU     Income     2011     1130
A      AU     Income     2012     1160
B      USA    Depth      2010     10000", header=TRUE, stringsAsFactors=FALSE)

library(data.table)
DT <- as.data.table(DF)
setkey(DT, Bank, KeyItem, Year)

DT[J("A", "Income")] #all entries where Bank is "A", and KeyItem is "Income"
DT[J("A", "Income")][Year >= 2010] #only those with year >= your year

DT[J("A", "Income")][Year >= 2010, Value/Value[1]] # result as vector
DT[J("A", "Income")][Year >= 2010, list(Value/Value[1])] # result as data.table

> DT[J("A", "Income")][Year >= 2010, pct:=list(Value/Value[1])] #result as data.table with all columns
   Bank KeyItem Country Year Value  pct
1:    A  Income      AU 2010  1000 1.00
2:    A  Income      AU 2011  1130 1.13
3:    A  Income      AU 2012  1160 1.16
GSee
  • 48,880
  • 13
  • 125
  • 145
2

I turned to use the plyr package solely for such tasks:

library( "plyr" )

ddply( df, c("Bank", "KeyItem"), function(x) {
  base <- x[ min( x$Year ) == x$Year, "Value" ]
  x$Value <- 100 * x$Value / base
  return( x[ , c("Country", "Year", "Value") ] )
})
Beasterfield
  • 7,023
  • 2
  • 38
  • 47
  • +1, but, the OP said the `data.frame` is "huge", so **plyr** may be unacceptably slow. – GSee Nov 05 '12 at 18:13
  • Happend now one or two more times to me that I posted a straight-forward solution involving plyr and being corrected, that plyr is *so* slow. Well, if it really tends to be *too* slow, the OP may use any solution offered here. But from my experience, most of these solutions are less intuitive to use, involving pitfalls, which to fix a beginner will spend a multiple amount of time, than just waiting until `ddply`will have done the job. – Beasterfield Nov 05 '12 at 18:25
  • Thank you, the function is really slow. Someone might come up with a faster way, but it was very helpful. – MarMarko Nov 05 '12 at 18:25
2

Try the following approach: (df is your data frame)

Choose the criteria:

bank <- "A"
keyItem <- "Income"
year <- 2011

Create a subset:

dat <- subset(df, Bank == bank & KeyItem == keyItem & Year >= year)

Calculate percentages:

dat$Value <- dat$Value / dat$Value[1] * 100

As a function:

myfun <- function(df, bank, keyItem, year) {
   dat <- df[df$Bank == bank & df$KeyItem == keyItem & df$Year >= year, ]
   "[[<-"(dat, "Value", value = dat$Value / dat$Value[1] * 100)
}

myfun(df, "A", "Income", 2011)
Sven Hohenstein
  • 80,497
  • 17
  • 145
  • 168