3

I have a rather large dataframe where variables are denominated in annual local currency (in the example below, Australian and Austrian currency):

 Country   Var  _1995       _1996         _1997      _1998
     AUS    GO  1 014 828   1 059 326     1 119 101  1 194 995
     AUS    L   36 873      38 895        39 502     40 425
     AUS    K   41 498      45 008        48 683     47 252
     AUT    GO  289 923     299 487       309 734    323 273
     AUT    GO  8 032       7 849         8 049      7 815
     AUT    L   1 094       1 151         1 163      1 152
     AUT    K   12 032      11 760        11 743     11 611

I want to convert the values in this dataframe into 1995 dollars, using these multipliers:

Country   _1995     _1996     _1997  _1998 
AUS      0,7415   0,78295   0,74406  0,6294
AUT     1,36646   1,30031   1,12904 1,11319

So that for each row which contains the variable AUS in table 1, the value for each year is multiplied by the appropriate $1995 multiplier from the row containing AUS in table 2. The same should apply for each row containing AUT, as well as the 38 other country codes in my dataframe.

So, in the first line I would want R to perform this calculation:

 Country Var    _1995            _1996            _1997           _1998        
    AUS  GO  1014828*0,7415 1059326*0,78295   1119101*0,74406 1194995*0,6294 

And so on. Is this doable? Help much appreciated!

Thomas
  • 43,637
  • 12
  • 109
  • 140
EconGradKI
  • 67
  • 4

4 Answers4

2

I'd suggest reshaping from wide to long format, which will simplify doing this considerably. The reshape is the most complicated part. I show it here using example data and the reshape command but you could also use dplyr or reshape2 or whatever.

Basically, reshape both of your datasets to long, then merge them, perform the multiplication (which is, in long format, just simple vector multiplication), and then reshape back to wide.

Here's the example data (similar to yours):

set.seed(1)
dat <- data.frame(Country = rep(c("AUS", "AUT"), each = 3), 
                  Var = rep(c("GO", "L", "K"), times = 2), 
                  v_1996 = rnorm(6), v_1997 = rnorm(6), v_1998 = rnorm(6),  
                  stringsAsFactors = FALSE)

multipliers <- data.frame(Country = c("AUS", "AUT"), 
                          v_1995 = c(0.7415, 1.36646),
                          v_1996 = c(0.78295, 1.30031),
                          v_1997 = c(0.74406, 1.12904),
                          v_1998 = c(0.6294, 1.11319), stringsAsFactors = FALSE)

And here's the code to do the conversion:

long <- reshape(dat, times = 1996:1998, v.names = "Value", 
                varying = c("v_1996", "v_1997", "v_1998"), 
                direction = "long")
head(long, 3)
#        Country Var time      Value id
# 1.1996     AUS  GO 1996 -0.6264538  1
# 2.1996     AUS   L 1996  0.1836433  2
# 3.1996     AUS   K 1996 -0.8356286  3
# 4.1996     AUT  GO 1996  1.5952808  4

mlong <- reshape(multipliers, times = 1995:1998, v.names = "mult", 
                 varying = c("v_1995","v_1996", "v_1997", "v_1998"), 
                 direction = "long")
head(mlong, 3)
#        Country time    mult id
# 1.1995     AUS 1995 0.74150  1
# 2.1995     AUT 1995 1.36646  2
# 1.1996     AUS 1996 0.78295  1

merged <- merge(long, mlong, by = c("Country", "time"))
merged$converted <- merged$Value * merged$mult    
head(merged, 3)
#   Country time Var      Value id.x    mult id.y  converted
# 1     AUS 1996  GO -0.6264538    1 0.78295    1 -0.4904820
# 2     AUS 1996   L  0.1836433    2 0.78295    1  0.1437835
# 3     AUS 1996   K -0.8356286    3 0.78295    1 -0.6542554

reshape(merged, idvar = c("Country", "Var"), direction = "wide", 
        drop = c("id.x", "id.y","mult"))
#    Country Var Value.1996 converted.1996 Value.1997 converted.1997  Value.1998 converted.1998
# 1      AUS  GO -0.6264538     -0.4904820  0.4874291      0.3626765 -0.62124058    -0.39100882
# 2      AUS   L  0.1836433      0.1437835  0.7383247      0.5493579 -2.21469989    -1.39393211
# 3      AUS   K -0.8356286     -0.6542554  0.5757814      0.4284159  1.12493092     0.70803152
# 10     AUT  GO  1.5952808      2.0743596 -0.3053884     -0.3447957 -0.04493361    -0.05001964
# 11     AUT   L  0.3295078      0.4284623  1.5117812      1.7068614 -0.01619026    -0.01802284
# 12     AUT   K -0.8204684     -1.0668632  0.3898432      0.4401486  0.94383621     1.05066903
Thomas
  • 43,637
  • 12
  • 109
  • 140
  • Thanks. Tried your solution, though I used the melt function in the reshape package rather than the way you did it. – EconGradKI Mar 24 '15 at 12:03
  • Thanks. Tried your solution, though I used the melt function from "reshape" rather than the way you did it. It works fine until I get to the "merged$converted <- merged$Value * merged$mult"-line, when I get the error message: In Ops.factor(merged$Value, merged$mult) : ‘*’ not meaningful for factors Perhaps the problem is obvious, but I'm quite new to this R thing. – EconGradKI Mar 24 '15 at 12:08
  • Check the `class()` of the variables you're trying to multiply. That error means one of them is of class "factor". You'll likely need to modify the variable class to numeric via something like `as.numeric(as.character(variable))`. – Thomas Mar 24 '15 at 16:03
  • Yup. Seems to be some problem with my data (i.e. non-numerical characters in some cells). Your solution is both simple and obvious to the non-initiated, and I've already used some of the code in modified form for other manipulations. Thanks again! – EconGradKI Mar 25 '15 at 11:34
1

Something like this:

(Assuming that your local currency dataframe is called 'local' and the one with the multipliers is named 'conv'.)

#unfactorise Country or you'll get very strange results
local$Country <- as.character(local$Country); conv$Country <- as.character(conv$Country)
countries <- unique(local$Country)
for(i in 1:length(countries)) {
        cy <- countries[i]
        rates <- matrix(conv[conv$Country==cy, -1])
        local[local$Country==cy, -c(1,2)] <- local[local$Country==cy, -c(1,2)] * rates
}
Pixelkracht
  • 274
  • 1
  • 13
1

It would probably be easiest to create a little helper function and then pipe your data. To make it cleaner, setting the row.names of conversions to the Country and removing that column.

df <- read.table(header = TRUE, text = '
                 Country Var  _1995       _1996         _1997      _1998
     AUS   GO  1014828   1059326     1119101  1194995
     AUS   L   36873      38895        39502     40425
     AUS   K    41498     45008        48683     47252
     AUT   GO  289923     299487       309734    323273
     AUT   GO  8032       7849         8049      7815
     AUT   L   1094       1151         1163      1152
     AUT   K   12032      11760        11743     11611
                 ')

conversions <- read.table(header = TRUE, text='
                          Country _1995     _1996   _1997   _1998 
 AUS     0.7415    0.78295 0.74406 0.6294
 AUT     1.36646   1.30031 1.12904 1.11319
                          ')

# the primary code to use
# set row.names, makes indexing cleaner below
row.names(conversions) <- conversions$Country
conversions <- conversions[,-1]

# helper function for conversions
myfun <- function(df1, df2) {
    df1[,3:6] <- df1[,3:6] * df2[df1$Country,]
    df1
}

library(dplyr)
df %>% 
   group_by(Country) %>% 
   do(myfun(., conversions))

Source: local data frame [7 x 6]
Groups: Country

  Country Var     X_1995     X_1996     X_1997     X_1998
1     AUS  GO 752494.962 829399.292 832678.290 752129.853
2     AUS   L  27341.330  30452.840  29391.858  25443.495
3     AUS   K  30770.767  35239.014  36223.073  29740.409
4     AUT  GO 396168.183 389425.941 349702.075 359864.271
5     AUT  GO  10975.407  10206.133   9087.643   8699.580
6     AUT   L   1494.907   1496.657   1313.074   1282.395
7     AUT   K  16441.247  15291.646  13258.317  12925.249
cdeterman
  • 19,630
  • 7
  • 76
  • 100
1

Here is my attempt using dplyr. I was experimenting in various ways and came up with this. I first split the data (i.e., mydf) by Country. For each data frame in the list, I wanted to apply appropriate exchange rates. So, I subsetted the exchange rate data (i.e., rate) using Country and created new data. (When the code is running, R is picking up exchange rates for each country.) I applied my answer in this question in order to calculate multiple columns using mutate_each(). Finally, I used bind_rows() to combine all data frames.

lapply(split(mydf, mydf$Country), function(i) {

        foo <- rate[rate$Country == unique(i$Country),]

        mutate_each(i, funs(. * foo$.), y_1995:y_1998)

    }) %>%
bind_rows

#  Country Var     y_1995     y_1996     y_1997     y_1998
#1     AUS  GO 752494.962 829399.292 832678.290 752129.853
#2     AUS   L  27341.330  30452.840  29391.858  25443.495
#3     AUS   K  30770.767  35239.014  36223.073  29740.409
#4     AUT  GO 396168.183 389425.941 349702.075 359864.271
#5     AUT  GO  10975.407  10206.133   9087.643   8699.580
#6     AUT   L   1494.907   1496.657   1313.074   1282.395
#7     AUT   K  16441.247  15291.646  13258.317  12925.249

DATA

mydf <- structure(list(Country = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
2L), .Label = c("AUS", "AUT"), class = "factor"), Var = structure(c(1L, 
3L, 2L, 1L, 1L, 3L, 2L), .Label = c("GO", "K", "L"), class = "factor"), 
y_1995 = c(1014828, 36873, 41498, 289923, 8032, 1094, 12032
), y_1996 = c(1059326, 38895, 45008, 299487, 7849, 1151, 
11760), y_1997 = c(1119101, 39502, 48683, 309734, 8049, 1163, 
11743), y_1998 = c(1194995, 40425, 47252, 323273, 7815, 1152, 
11611)), .Names = c("Country", "Var", "y_1995", "y_1996", 
"y_1997", "y_1998"), row.names = c(NA, -7L), class = "data.frame")

#  Country Var  y_1995  y_1996  y_1997  y_1998
#1     AUS  GO 1014828 1059326 1119101 1194995
#2     AUS   L   36873   38895   39502   40425
#3     AUS   K   41498   45008   48683   47252
#4     AUT  GO  289923  299487  309734  323273
#5     AUT  GO    8032    7849    8049    7815
#6     AUT   L    1094    1151    1163    1152
#7     AUT   K   12032   11760   11743   11611

rate <- structure(list(Country = structure(1:2, .Label = c("AUS", "AUT"
), class = "factor"), y_1995 = c(0.7415, 1.36646), y_1996 = c(0.78295, 
1.30031), y_1997 = c(0.74406, 1.12904), y_1998 = c(0.6294, 1.11319
)), .Names = c("Country", "y_1995", "y_1996", "y_1997", "y_1998"
), row.names = c(NA, -2L), class = "data.frame")

#  Country  y_1995  y_1996  y_1997  y_1998
#1     AUS 0.74150 0.78295 0.74406 0.62940
#2     AUT 1.36646 1.30031 1.12904 1.11319
Community
  • 1
  • 1
jazzurro
  • 23,179
  • 35
  • 66
  • 76