How can I add more columns to a data frame with a for loop

Question

I am a beginner in R and need to port some EViews code to R. In EViews, I use loops to add 10 or more columns (variables) to the data by applying a function.

Here is example EViews code that estimates deflators:

for %x exp con gov inv cap ex im
frml def_{%x} = gdp_{%x}/gdp_{%x}_r*100
next

I used the dplyr package and its mutate function, but it is tedious to add many variables this way.

library(dplyr)
nominal_gdp<-rnorm(4)
nominal_inv<-rnorm(4)
nominal_gov<-rnorm(4)
nominal_exp<-rnorm(4)

real_gdp<-rnorm(4)
real_inv<-rnorm(4)
real_gov<-rnorm(4)
real_exp<-rnorm(4)   

df<-data.frame(nominal_gdp,nominal_inv,
nominal_gov,nominal_exp,real_gdp,real_inv,real_gov,real_exp)

 df<-df %>% mutate(deflator_gdp=nominal_gdp/real_gdp*100,
 deflator_inv=nominal_inv/real_inv*100,
 deflator_gov=nominal_gov/real_gov*100,
 deflator_exp=nominal_exp/real_exp*100)

 print(df)

Please help me do this in R with a loop.

Please provide [reproducible example data](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — zx8754, Aug 21 '19 at 08:55
Good edit. In your current question, it is not obvious what you are trying to automate. For clarity, you should add that you have many such `deflatorX` to compute from `nominalX` and `realX`, so you want to automate the creation of the many deflators. — asachet, Aug 21 '19 at 09:13
If we know that the columns are ordered as in your example, then maybe try? `df[ 1:2 ] / df[ 3:4 ] * 100` — zx8754, Aug 21 '19 at 09:24
I select columns from my data frame by name, because my variables are not ordered. — Lkhagvasuren, Aug 21 '19 at 09:29

score 2 · Accepted Answer · answered Aug 21 '19 at 09:36

The answer is that your data is not as "tidy" as it could be.

This is what you have (with an added observation ID for clarity):

library(dplyr)

df <- data.frame(nominal_gdp = rnorm(4),
                 nominal_inv = rnorm(4),
                 nominal_gov = rnorm(4),
                 real_gdp = rnorm(4),
                 real_inv = rnorm(4),
                 real_gov = rnorm(4))
df <- df %>%
  mutate(obs_id = 1:n()) %>%
  select(obs_id, everything())

which gives:

   obs_id nominal_gdp nominal_inv nominal_gov    real_gdp   real_inv  real_gov
 1      1  -0.9692060  -1.5223055 -0.26966202  0.49057546  2.3253066 0.8761837
 2      2   1.2696927   1.2591910  0.04238958 -1.51398652 -0.7209661 0.3021453
 3      3   0.8415725  -0.1728212  0.98846942 -0.58743294 -0.7256786 0.5649908
 4      4  -0.8235101   1.0500614 -0.49308092  0.04820723 -2.0697008 1.2478635

Consider if you had instead, in df2:

   obs_id variable        real     nominal
1       1      gdp  0.49057546 -0.96920602
2       2      gdp -1.51398652  1.26969267
3       3      gdp -0.58743294  0.84157254
4       4      gdp  0.04820723 -0.82351006
5       1      inv  2.32530662 -1.52230550
6       2      inv -0.72096614  1.25919100
7       3      inv -0.72567857 -0.17282123
8       4      inv -2.06970078  1.05006136
9       1      gov  0.87618366 -0.26966202
10      2      gov  0.30214534  0.04238958
11      3      gov  0.56499079  0.98846942
12      4      gov  1.24786355 -0.49308092

Then what you want to do is trivial:

df2 %>% mutate(deflator = real / nominal)

   obs_id variable        real     nominal    deflator
1       1      gdp  0.49057546 -0.96920602 -0.50616221
2       2      gdp -1.51398652  1.26969267 -1.19240392
3       3      gdp -0.58743294  0.84157254 -0.69801819
4       4      gdp  0.04820723 -0.82351006 -0.05853872
5       1      inv  2.32530662 -1.52230550 -1.52749012
6       2      inv -0.72096614  1.25919100 -0.57256297
7       3      inv -0.72567857 -0.17282123  4.19901294
8       4      inv -2.06970078  1.05006136 -1.97102841
9       1      gov  0.87618366 -0.26966202 -3.24919196
10      2      gov  0.30214534  0.04238958  7.12782060
11      3      gov  0.56499079  0.98846942  0.57158146
12      4      gov  1.24786355 -0.49308092 -2.53074800

So the question becomes: how do we get to the nice dplyr-compatible data.frame?

You need to gather your data using tidyr::gather. However, because you have two sets of variables to gather (the real and nominal values), it is not straightforward. I have done it in two steps; there may be a better way, though.

real_vals <- df %>%
  select(obs_id, starts_with("real")) %>%
  # the line below is where the magic happens
  tidyr::gather(variable, real, starts_with("real")) %>%
  # extracting the variable name (by erasing up to the underscore)
  mutate(variable = gsub(variable, pattern = ".*_", replacement = ""))

# Same thing for nominal values
nominal_vals <- df %>%
  select(obs_id, starts_with("nominal")) %>%
  tidyr::gather(variable, nominal, starts_with("nominal")) %>%
  mutate(variable = gsub(variable, pattern = ".*_", replacement = ""))

# Merging them... Now we have something we can work with!
df2 <-
  full_join(real_vals, nominal_vals, by = c("obs_id", "variable"))

Note the importance of the observation id when merging.
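As a possible one-step alternative to the two gathers and the join, here is a minimal sketch using tidyr::pivot_longer, which is available from tidyr 1.0.0 onwards. It assumes every column other than obs_id is named `<type>_<variable>` with exactly one underscore; the seed and the reduced set of columns are only there to keep the example small and reproducible.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(obs_id = 1:4,
                 nominal_gdp = rnorm(4), nominal_inv = rnorm(4),
                 real_gdp = rnorm(4), real_inv = rnorm(4))

# Split each column name at the underscore. The special name ".value" turns
# the first part ("nominal" / "real") into separate value columns, while the
# second part ("gdp" / "inv") is stored in the new `variable` column.
df2 <- df %>%
  pivot_longer(cols = -obs_id,
               names_to = c(".value", "variable"),
               names_sep = "_")
```

The result has one row per obs_id and variable, with `nominal` and `real` side by side, so no join is needed and the observation id only serves to identify rows.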

Thank you so much. But my data have many observations. What about looping that operation? — Lkhagvasuren, Aug 21 '19 at 10:40
I am not sure what you mean because this code can process thousands of real_XXX and nominal_XXX in one go, and the number of rows is not a problem (as long as it stays below ~ 100 million). What do you mean by "many observations"? — asachet, Aug 21 '19 at 10:44
I just mean that I use this EViews loop many times and create many columns. It is so easy to add columns with loops in EViews. — Lkhagvasuren, Aug 21 '19 at 12:48
In that case the [answer from Suliman](https://stackoverflow.com/a/57589833/3498910) is what you need. You can use `deflator_fun` in a loop. I would use `Reduce` instead of a loop though. — asachet, Aug 21 '19 at 13:14

score 0 · Answer 2 · answered Aug 21 '19 at 09:39

We can grep the matching names, and sort:

x <- colnames(df)
df[ sort(x[ (grepl("^nominal", x)) ]) ] /
  df[ sort(x[ (grepl("^real", x)) ]) ] * 100

Similarly, if the columns were sorted, then we could just:

df[ 1:4 ] / df[ 5:8 ] * 100
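The division above returns a data frame that still carries the nominal_ column names. A minimal sketch of how the result might be renamed and attached to df follows; the `deflator_` prefix, the helper variable names, and the pairing check are illustrative assumptions, and the example data is deliberately unordered.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(nominal_gdp = rnorm(4), nominal_inv = rnorm(4),
                 real_inv = rnorm(4), real_gdp = rnorm(4))  # unordered on purpose

x <- colnames(df)
nominal_cols <- sort(x[grepl("^nominal_", x)])
real_cols    <- sort(x[grepl("^real_", x)])

# Sorting pairs the columns correctly only if the suffixes match one-to-one
stopifnot(identical(sub("^nominal_", "", nominal_cols),
                    sub("^real_", "", real_cols)))

# Element-wise division of the two column sets, then rename and append
deflators <- df[nominal_cols] / df[real_cols] * 100
names(deflators) <- sub("^nominal_", "deflator_", nominal_cols)
df <- cbind(df, deflators)
```

The stopifnot guard is a design choice: without it, a missing real_ or nominal_ column would silently pair the wrong variables.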

zx8754


A. Suliman · Answer 3 · 2019-08-21T11:39:31.497

We can loop over column names using purrr::map_dfc then apply a custom function over the selected columns (i.e. the columns that matched the current name from nms)

library(dplyr)
library(purrr)
#Replace anything before _ with empty string
nms <- unique(sub('.*_','',names(df)))
#Use map if you need the output as a list not a dataframe
#(deflator_fun is the custom function defined below; define it first)
map_dfc(nms, ~deflator_fun(df, .x))

Custom function

deflator_fun <- function(df, x){
  #browser()
  nx <- paste0('nominal_',x)
  rx <- paste0('real_',x)  
  select(df, matches(x)) %>% 
    mutate(!!paste0('deflator_',quo_name(x)) := !!ensym(nx) / !!ensym(rx)*100)
}
#Test
deflator_fun(df, 'gdp')
      nominal_gdp     real_gdp deflator_gdp
1  -0.3332074  0.181303480   -183.78433
2  -1.0185754 -0.138891362    733.36121
3  -1.0717912  0.005764186 -18593.97398
4   0.3035286  0.385280401     78.78123

Note: quo_name, !!, and ensym are tidy evaluation tools for programming with dplyr; see the rlang and dplyr documentation to learn more.
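For readers who want the closest analogue of the EViews loop, a plain base R for loop may be enough. This is a minimal sketch, assuming the columns follow the nominal_X / real_X naming from the question; the seed and the list of suffixes are illustrative.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(nominal_gdp = rnorm(4), nominal_inv = rnorm(4),
                 real_gdp = rnorm(4), real_inv = rnorm(4))

# One pass per suffix, mirroring `for %x ... next` in EViews.
# df[[name]] selects or creates a column from a name built as a string.
for (x in c("gdp", "inv")) {
  df[[paste0("deflator_", x)]] <-
    df[[paste0("nominal_", x)]] / df[[paste0("real_", x)]] * 100
}

print(df)
```

Because the column names are built with paste0, adding another variable only means adding its suffix to the vector in the loop header.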
