Change values in Dataframe with function in R

Question

I have a data frame that looks like this, but it is huge, so I can't do anything manually:

   Bank  Country  KeyItem    Year    Value 
    A      AU     Income     2010     1000
    A      AU     Income     2011     1130
    A      AU     Income     2012     1160
    B      USA    Depth      2010     10000

What I want to do is create a function where I can select the Bank, the KeyItem, and the year to start from, and which returns a data frame with the values as a percentage of the first value. Like this:

   Bank  Country  KeyItem    Year    Value
    A      AU     Income     2010     100
    A      AU     Income     2011     113
    A      AU     Income     2012     116

Thank you in advance!

When you say "from which year onwards", do you mean that if you specify 2010, you want values from 2010, 2011, 2012, or just 2010? — GSee, Nov 05 '12 at 18:34

GSee · Answer 1 · 2012-11-06T01:55:03.593

Here's a data.table solution which should be fast and memory efficient.

DF <- read.table(text="Bank  Country  KeyItem    Year    Value 
A      AU     Income     2010     1000
A      AU     Income     2011     1130
A      AU     Income     2012     1160
B      USA    Depth      2010     10000", header=TRUE, stringsAsFactors=FALSE)

library(data.table)
DT <- as.data.table(DF)
setkey(DT, Bank, KeyItem, Year)

DT[J("A", "Income")] #all entries where Bank is "A", and KeyItem is "Income"
DT[J("A", "Income")][Year >= 2010] #only those with year >= your year

DT[J("A", "Income")][Year >= 2010, Value/Value[1]] # result as vector
DT[J("A", "Income")][Year >= 2010, list(Value/Value[1])] # result as data.table

> DT[J("A", "Income")][Year >= 2010, pct:=list(Value/Value[1])] #result as data.table with all columns (pct is a ratio; multiply by 100 for percent)
   Bank KeyItem Country Year Value  pct
1:    A  Income      AU 2010  1000 1.00
2:    A  Income      AU 2011  1130 1.13
3:    A  Income      AU 2012  1160 1.16
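
The steps above can be wrapped into a function of the kind the question asks for. This is only a sketch: the name `pct_of_first` and its arguments are made up for illustration, it filters with plain comparisons rather than the keyed `J()` join, and it multiplies by 100 to match the output in the question.

```r
library(data.table)

# Example data from the question
DT <- data.table(
  Bank    = c("A", "A", "A", "B"),
  Country = c("AU", "AU", "AU", "USA"),
  KeyItem = c("Income", "Income", "Income", "Depth"),
  Year    = c(2010, 2011, 2012, 2010),
  Value   = c(1000, 1130, 1160, 10000)
)

# Hypothetical helper: name and arguments are illustrative, not from the answer
pct_of_first <- function(DT, bank, key_item, from_year) {
  # Filter the rows, then sort by Year so the first row is the earliest year kept
  out <- DT[Bank == bank & KeyItem == key_item & Year >= from_year][order(Year)]
  # Add the percentage column by reference to the filtered copy (DT is untouched)
  out[, pct := 100 * Value / Value[1]]
  out[]
}

res <- pct_of_first(DT, "A", "Income", 2010)
res$pct
# 100 113 116
```

Because the filter returns a new data.table, the original `DT` does not gain a `pct` column.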

Beasterfield · Answer 2 · 2012-11-05T18:51:35.487

I have come to rely on the plyr package for tasks like this:

library( "plyr" )

ddply( df, c("Bank", "KeyItem"), function(x) {
  base <- x[ min( x$Year ) == x$Year, "Value" ]
  x$Value <- 100 * x$Value / base
  return( x[ , c("Country", "Year", "Value") ] )
})
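
If plyr turns out to be too slow, the same per-group calculation can probably be done in base R with `ave()`. The following is a sketch under the assumption that the data has the columns shown in the question; the `Pct` column name is made up for illustration.

```r
# Example data from the question
df <- data.frame(
  Bank    = c("A", "A", "A", "B"),
  Country = c("AU", "AU", "AU", "USA"),
  KeyItem = c("Income", "Income", "Income", "Depth"),
  Year    = c(2010, 2011, 2012, 2010),
  Value   = c(1000, 1130, 1160, 10000),
  stringsAsFactors = FALSE
)

# Sort so that the first row within each Bank/KeyItem group is its earliest year
df <- df[order(df$Bank, df$KeyItem, df$Year), ]

# ave() applies FUN within each group and returns a vector aligned with the rows
df$Pct <- ave(df$Value, df$Bank, df$KeyItem,
              FUN = function(v) 100 * v / v[1])

df$Pct
# 100 113 116 100
```

Unlike the `ddply` call, this keeps all original columns and adds the result alongside `Value` instead of overwriting it.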

edited Nov 05 '12 at 18:51

answered Nov 05 '12 at 18:11

+1, but, the OP said the `data.frame` is "huge", so **plyr** may be unacceptably slow. – GSee Nov 05 '12 at 18:13
It has now happened to me once or twice that I posted a straightforward solution involving plyr and was told that plyr is *so* slow. Well, if it really is *too* slow, the OP can use any of the other solutions offered here. But in my experience, most of those solutions are less intuitive and come with pitfalls that take a beginner far longer to fix than simply waiting for `ddply` to finish. – Beasterfield Nov 05 '12 at 18:25
Thank you, the function is really slow. Someone might come up with a faster way, but it was very helpful. – MarMarko Nov 05 '12 at 18:25

Sven Hohenstein · Accepted Answer · 2012-11-05T18:25:22.967

Try the following approach (`df` is your data frame):

Choose the criteria:

bank <- "A"
keyItem <- "Income"
year <- 2011

Create a subset:

dat <- subset(df, Bank == bank & KeyItem == keyItem & Year >= year)

Calculate percentages:

dat$Value <- dat$Value / dat$Value[1] * 100

As a function:

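myfun <- function(df, bank, keyItem, year) {
   # keep only the rows matching the chosen bank, key item, and starting year
   dat <- df[df$Bank == bank & df$KeyItem == keyItem & df$Year >= year, ]
   # rescale so that the first remaining row equals 100
   dat$Value <- dat$Value / dat$Value[1] * 100
   dat
}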

myfun(df, "A", "Income", 2011)
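
Note that `dat$Value[1]` is the first row of the subset, which is the earliest year only if the rows are already sorted by year. Here is a sketch of a variant that sorts first and returns an empty data frame when nothing matches; the name `myfun_sorted` and the empty-result check are additions for illustration, not part of the function above.

```r
# Hypothetical variant of myfun(): sorts by Year before rescaling
myfun_sorted <- function(df, bank, keyItem, year) {
  keep <- df$Bank == bank & df$KeyItem == keyItem & df$Year >= year
  dat <- df[keep, ]
  if (nrow(dat) == 0) return(dat)   # nothing matched: return the empty frame
  dat <- dat[order(dat$Year), ]     # earliest year first
  dat$Value <- 100 * dat$Value / dat$Value[1]
  dat
}

# Rows deliberately supplied in descending year order
df_desc <- data.frame(
  Bank    = "A",
  Country = "AU",
  KeyItem = "Income",
  Year    = c(2012, 2011, 2010),
  Value   = c(1160, 1130, 1000),
  stringsAsFactors = FALSE
)

res <- myfun_sorted(df_desc, "A", "Income", 2010)
res$Value
# 100 113 116
```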

edited Nov 05 '12 at 18:25

answered Nov 05 '12 at 18:14

ahhh. don't use `subset` inside a function. See http://stackoverflow.com/questions/9860090/in-r-why-is-better-than-subset – GSee Nov 05 '12 at 18:26
I just realized. This function uses last year as the 100%. I actually want the first year. Thank you anyway. – MarMarko Nov 05 '12 at 18:46
@MarMarko This function *does* use the first year as reference. – Sven Hohenstein Nov 05 '12 at 19:13
