1

I have the following code:

library(dplyr)
library(quantmod)

# inflation data
getSymbols("CPIAUCSL", src='FRED')
avg.cpi <- apply.yearly(CPIAUCSL, mean)
cf <- avg.cpi/as.numeric(avg.cpi['1991']) # using 1991 as the base year
cf <- as.data.frame(cf)
cf$year <- rownames(cf)
cf <- tail(cf, 25)
rownames(cf) <- NULL
cf$year <- lapply(cf$year, function(x) as.numeric(head(unlist(strsplit(x, "-")), 1)))
rm(CPIAUCSL)
# end of inflation data get

tmp <- data.frame(year=c(rep(1991,2), rep(1992,2)), price=c(12.03, 12.98, 14.05, 14.58))
tmp %>% mutate(infl.price = price / cf[cf$year == year, ]$CPIAUCSL)

I want to get the following result:

year price
1991 12.03
1991 12.98
1992 13.64
1992 14.16

But I'm getting a warning:

Warning message:
In cf$year == tmp$year :
  longer object length is not a multiple of shorter object length

And with %in% it produces an incorrect result.

m0nhawk
  • Wondering if you are looking for `%in%` instead of `==` (haven't tested) – David Arenburg Jun 11 '15 at 18:38
  • Make sure you include all relevant packages (`apply.yearly` isn't in base R). Also please share your data in a [reproducible format](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). It's best if we can just copy/paste your code into R and run it. Ideally include the desired output for the sample input. – MrFlick Jun 11 '15 at 18:40
  • @MrFlick I have updated. – m0nhawk Jun 11 '15 at 18:56
  • Your code `lapply(` was not necessary and if you check the `str(cf)` `year` is a `list`. It would have been easier with `cf$year <- year(as.Date(cf$year))` – akrun Jun 11 '15 at 19:05
  • @akrun I was wondering why ``cf`` was formatted strangely. I had to coerce it into numeric before I could ``dplyr::inner_join`` – divide_by_zero Jun 11 '15 at 19:10
  • @divide_by_zero It was because the `lapply` output will be a list and OP assigned the list as a column – akrun Jun 11 '15 at 19:11
  • Thanks for pointing out some more improvements! – m0nhawk Jun 11 '15 at 19:18

2 Answers

4

I think it might be easier to join the CPIAUCSL column in cf into tmp before you try to mutate:

cf$year = as.numeric(cf$year)
tmp = tmp %>% inner_join(cf, by = "year") %>% mutate(infl.price = price / CPIAUCSL)
divide_by_zero
3

Your cf data frame has a list column (year), which is unfriendly. It would have been nicer to have

cf$year <- sapply(cf$year, function(x) as.numeric(head(unlist(strsplit(x, "-")), 1)))

which at least returns a simple vector.
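To see the difference, here is a minimal sketch. The `dates` vector is made up and only stands in for the row names of cf; `first_part` is a hypothetical helper name wrapping the same expression as above.

```r
# Hypothetical date strings, shaped like the row names of cf
dates <- c("1991-12-01", "1992-12-01")

# Same extraction as in the question: take the part before the first "-"
first_part <- function(x) as.numeric(head(unlist(strsplit(x, "-")), 1))

as_list <- lapply(dates, first_part)                     # a list of length-1 numerics
as_vec  <- sapply(dates, first_part, USE.NAMES = FALSE)  # a plain numeric vector

is.list(as_list)    # the lapply() result is a list
is.numeric(as_vec)  # the sapply() result is an atomic numeric vector
```

Assigning `as_list` to a data frame column is what produces the list column; `as_vec` gives an ordinary numeric column that `==` and joins can work with.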

Additionally, the subsetting operator [] is not properly vectorized for this type of operation. The mutate() function does not iterate over rows; it operates on entire columns at a time. When you do

cf[cf$year == year, ]$CPIAUCSL

there is not just one year value; mutate is trying to process them all at once.
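A small sketch of what goes wrong, using made-up vectors (`lookup_years` stands in for cf$year and `data_years` for the year column of tmp; both names are assumptions for illustration):

```r
lookup_years <- c(1991, 1992, 1993)        # stand-in for cf$year (3 rows)
data_years   <- c(1991, 1991, 1992, 1992)  # stand-in for tmp$year (4 rows)

# `==` compares position by position and recycles the shorter vector.
# It does not look up each data year in the lookup table, and it warns
# because 4 is not a multiple of 3 (warning suppressed here on purpose).
cmp <- suppressWarnings(lookup_years == data_years)

# match() does the per-element lookup: for each data year, the index
# of that year in lookup_years.
idx <- match(data_years, lookup_years)
```

Here `cmp` only reflects which positions happen to line up after recycling, while `idx` says which lookup row belongs to each data row.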

You'd be better off doing a proper merge with your data and then doing the mutate. This does essentially what the pseudo-merge in your version was attempting.

You can do

tmp %>% left_join(cf) %>% 
    mutate(infl.price = price / CPIAUCSL) %>% 
    select(-CPIAUCSL)

to get

  year price infl.price
1 1991 12.03   12.03000
2 1991 12.98   12.98000
3 1992 14.05   13.63527
4 1992 14.58   14.14962
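If you would rather not add and then drop a column, a `match()` lookup is one possible alternative to the join. This is only a sketch in base R: `cf_small` is a hypothetical lookup table and its 1992 factor is made up, not real CPI data.

```r
# Illustrative lookup table; the 1992 factor is invented for the example
cf_small <- data.frame(year = c(1991, 1992), CPIAUCSL = c(1.00, 1.03))

tmp <- data.frame(year  = c(1991, 1991, 1992, 1992),
                  price = c(12.03, 12.98, 14.05, 14.58))

# For every tmp$year, find the row index of that year in cf_small
idx <- match(tmp$year, cf_small$year)

# Divide each price by the factor belonging to its own year
tmp$infl.price <- tmp$price / cf_small$CPIAUCSL[idx]
```

The same indexing expression can be used inside mutate(), provided the year column of the lookup table is an ordinary vector rather than a list. Years with no match yield NA, which is comparable to what left_join produces.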
MrFlick