Loop over multiple data frames

Question

I've got 4 data frames with this structure:

data1:

1.8064 2.2016 2.4506 2.1828 2.1171 1.9308 2.1707 2.1885
2.2310 2.2400 1.9115 2.1527 2.0934 1.7989 2.2144 2.0091
1.9248 2.2038 1.9676 1.9224 1.9502 1.7990 2.0824 2.1300
2.0095 2.0341 1.8433 1.8361 1.9958 1.8243 2.0397 2.0482
2.1143 2.2627 1.7620 1.7561 1.9490 1.9803 1.9336 2.2511
2.2377 2.5414 1.7867 1.6618 2.5090 1.8325 2.0212 2.1616
2.3476 2.1878 2.0469 1.7508 2.2969 1.7939 2.0291 2.0721
2.3534 2.0932 2.3502 1.9960 2.0710 1.9923 1.7787 1.9772
2.2607 2.1504 2.3685 2.1148 2.1961 1.7738 1.8405 2.0135
2.2411 1.9916 2.4726 2.0347 2.0751 1.7570 1.8874 1.9385

data2:

2.1913 1.8981 2.2441 2.3068 2.1198 2.1484 1.8056 1.7747
2.0842 1.8750 2.3023 2.1204 1.8972 2.1534 1.8028 1.9401
2.2105 1.9618 2.2472 1.9656 2.3098 1.9771 1.9520 1.8627
2.2863 1.9959 2.1781 1.9544 1.9281 1.9286 1.9699 2.0330
2.1987 2.0583 2.0953 2.0206 2.1148 2.3789 1.7052 1.9145
2.0513 2.0850 1.9810 2.4943 1.9120 2.2209 1.9461 2.0882
2.0049 2.0416 1.9303 2.3681 1.8974 2.0054 1.9261 1.9097
1.6882 2.1196 1.8641 2.3600 2.0931 1.7641 2.1131 1.7748
1.8840 1.7604 1.7664 2.2000 2.0055 1.8229 1.9871 1.9168
1.7340 1.9656 1.8480 2.0523 1.9950 1.8716 1.9206 1.7786
1.9604 1.9804 1.9601 2.0599 1.8969 1.8087 2.1845 1.8598

data3:

1.8064 2.2016 2.4506 2.1828 2.1171 1.9308 2.1707 2.1885
2.2310 2.2400 1.9115 2.1527 2.0934 1.7989 2.2144 2.0091
1.9248 2.2038 1.9676 1.9224 1.9502 1.7990 2.0824 2.1300
2.0095 2.0341 1.8433 1.8361 1.9958 1.8243 2.0397 2.0482
2.1143 2.2627 1.7620 1.7561 1.9490 1.9803 1.9336 2.2511
2.2377 2.5414 1.7867 1.6618 2.5090 1.8325 2.0212 2.1616
2.3476 2.1878 2.0469 1.7508 2.2969 1.7939 2.0291 2.0721
2.3534 2.0932 2.3502 1.9960 2.0710 1.9923 1.7787 1.9772
2.2607 2.1504 2.3685 2.1148 2.1961 1.7738 1.8405 2.0135
2.2411 1.9916 2.4726 2.0347 2.0751 1.7570 1.8874 1.9385

data4:

2.1913 1.8981 2.2441 2.3068 2.1198 2.1484 1.8056 1.7747
2.0842 1.8750 2.3023 2.1204 1.8972 2.1534 1.8028 1.9401
2.2105 1.9618 2.2472 1.9656 2.3098 1.9771 1.9520 1.8627
2.2863 1.9959 2.1781 1.9544 1.9281 1.9286 1.9699 2.0330
2.1987 2.0583 2.0953 2.0206 2.1148 2.3789 1.7052 1.9145
2.0513 2.0850 1.9810 2.4943 1.9120 2.2209 1.9461 2.0882
2.0049 2.0416 1.9303 2.3681 1.8974 2.0054 1.9261 1.9097
1.6882 2.1196 1.8641 2.3600 2.0931 1.7641 2.1131 1.7748
1.8840 1.7604 1.7664 2.2000 2.0055 1.8229 1.9871 1.9168
1.7340 1.9656 1.8480 2.0523 1.9950 1.8716 1.9206 1.7786
1.9604 1.9804 1.9601 2.0599 1.8969 1.8087 2.1845 1.8598

I need to take column 1 from each of data1, data2, data3 and data4 and combine them side by side into a single data frame, then do the same for each of the other columns.

I was using this method, but it is a little bit rudimentary:

dat1 <- data.frame(data1$V1)
dat2 <- data.frame(data2$V1)
dat3 <- data.frame(data3$V1)
dat4 <- data.frame(data4$V1)

final_data1 <- cbind(dat1,dat2,dat3,dat4)
. 
.
.


dat1 <- data.frame(data1$V8)
dat2 <- data.frame(data2$V8)
dat3 <- data.frame(data3$V8)
dat4 <- data.frame(data4$V8)

final_data8 <- cbind(dat1,dat2,dat3,dat4)

Is there any way to do this with a loop?

score 2 · Accepted Answer · answered Aug 11 '16 at 09:15

We can loop through columns, bind them, and keep the resulting 8 dataframes in a list:

res <- lapply(1:8, function(i){ cbind(data1[i], data2[i], data3[i], data4[i]) })
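
A minimal sketch of the same idea on toy data, written so that the number of columns is not hard-coded. The toy frames and the names `dfs` and `final_data1`, `final_data2` are assumptions for illustration, not part of the original answer.

```r
# Toy stand-ins for the four data frames (3 rows x 2 columns each)
set.seed(1)
data1 <- data.frame(V1 = rnorm(3), V2 = rnorm(3))
data2 <- data.frame(V1 = rnorm(3), V2 = rnorm(3))
data3 <- data.frame(V1 = rnorm(3), V2 = rnorm(3))
data4 <- data.frame(V1 = rnorm(3), V2 = rnorm(3))

# Keep the inputs in a named list so the loop does not depend on how many there are
dfs <- list(data1 = data1, data2 = data2, data3 = data3, data4 = data4)

# For each column position i, take column i from every data frame and bind side by side
res <- lapply(seq_len(ncol(data1)), function(i) {
  out <- do.call(cbind, lapply(dfs, `[`, i))
  names(out) <- names(dfs)  # label each column by its source data frame
  out
})
names(res) <- paste0("final_data", seq_along(res))

res$final_data1  # same element as res[[1]]
```

Each element of `res` is an ordinary data frame, so it can be passed on to other functions one at a time or inside another `lapply`.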

zx8754

Yes, that works, but how can I extract the 8 data frames created from that list? I want to use them to make histograms. – Enrique Aug 11 '16 at 09:22
@Enrique `res[[1]]` should give you the first data.frame. – zx8754 Aug 11 '16 at 09:37
Another question: what if I want to join four data frames that each have a different size? How can I join them with your method? For example, dat1 with 100 objects, dat2 with 50 objects, etc. – Enrique Aug 11 '16 at 14:47
@Enrique [see this post](http://stackoverflow.com/questions/19074163) for *cbind*ing unequal data.frames. – zx8754 Aug 11 '16 at 20:09
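
For the unequal-size case raised in the comments, one common approach is to pad the shorter columns with `NA` before binding. This is a sketch under that assumption, not necessarily what the linked post recommends, and the helper name `cbind_pad` is hypothetical.

```r
# Hypothetical helper: bind vectors of different lengths, padding with NA
cbind_pad <- function(cols) {
  n <- max(lengths(cols))
  # `length<-` extends a vector to length n; the new slots are filled with NA
  padded <- lapply(cols, `length<-`, n)
  as.data.frame(padded)
}

# Toy data frames with different numbers of rows
a <- data.frame(V1 = 1:5)
b <- data.frame(V1 = 1:3)

out <- cbind_pad(list(a = a$V1, b = b$V1))
out
```

The padded rows carry no data, so functions applied afterwards may need `na.rm = TRUE` or similar handling.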

akrun · Answer 2 · 2016-08-11T09:37:12.423

We can place all the data frames in a list, extract the first column of each, and cbind them together.

do.call(cbind, lapply(mget(paste0("data", 1:4)), `[`, 1))
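
A small sketch of what that one-liner does, using two toy data frames (assumed here for illustration). `mget()` looks objects up by name and returns them as a named list. Because every extracted piece is called `V1`, the sketch also renames the result by source, which is an addition to the original answer.

```r
data1 <- data.frame(V1 = 1:3, V2 = 4:6)
data2 <- data.frame(V1 = 7:9, V2 = 10:12)

# mget() fetches objects by name and returns them as a named list
dfs <- mget(paste0("data", 1:2))

# `[` with a single index keeps the data frame structure (one column each)
first_cols <- do.call(cbind, lapply(dfs, `[`, 1))

# Every piece is called V1, so rename by source to tell the columns apart
names(first_cols) <- names(dfs)
first_cols
```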

It may be better to keep everything in a single dataset, with an id column recording which dataset each row came from:

library(data.table)
dt <- rbindlist(mget(paste0("data", 1:4)), idcol = TRUE)

Also, for plotting purposes, it may be better to keep the data in 'long' format:

dL <- melt(dt, id.vars = ".id")
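
To make the shape of the long format concrete, here is a minimal sketch on two small toy data frames (assumed values, not the question's data). When the list passed to `rbindlist()` is named, the `.id` column holds those names.

```r
library(data.table)

data1 <- data.frame(V1 = c(1.8, 2.2), V2 = c(2.2, 2.4))
data2 <- data.frame(V1 = c(2.1, 2.0), V2 = c(1.9, 1.8))

# Stack the data frames; with a named list, .id holds the source name
dt <- rbindlist(list(data1 = data1, data2 = data2), idcol = TRUE)

# Long format: one row per (source data frame, column, value)
dL <- melt(dt, id.vars = ".id")

# Values of column V1, split by the data frame they came from
v1 <- dL[variable == "V1"]
split(v1$value, v1$.id)
```

In this layout, "column 1 from every data frame" is simply the rows where `variable == "V1"`, with `.id` distinguishing the sources.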

and use ggplot to plot

library(ggplot2)
# after_stat() replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
ggplot(dL, aes(value, after_stat(density), colour = variable)) +
        geom_freqpoly()

Or use geom_histogram with facet_wrap (for individual plots for each column)

ggplot(dL, aes(value)) +
        geom_histogram() +
        facet_wrap(~variable)

score -1 · Answer 3 · answered Aug 11 '16 at 09:10

You can use an eval(parse()) construction:

df1 = data.frame(V1 = 1:10)
df2 = data.frame(V1 = 1:10)
df3 = data.frame(V1 = 1:10)
df4 = data.frame(V1 = 1:10)

final = matrix(NA, nrow = nrow(df1), ncol = 4)

for (i in 1:4) {
  final[, i] = eval(parse(text = paste0('df', i, '$V1')))
}
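
As a hedged aside, `get()` can look an object up by name without parsing a code string, which avoids `eval(parse())` for this particular task. A minimal sketch with two toy data frames (assumed for illustration):

```r
df1 <- data.frame(V1 = 1:10)
df2 <- data.frame(V1 = 11:20)

final <- matrix(NA, nrow = nrow(df1), ncol = 2)

for (i in 1:2) {
  # get() returns the object whose name is the given string
  final[, i] <- get(paste0("df", i))$V1
}
```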

Another way is to put all dfs in a list and use lapply:

dfList = list(df1, df2, df3, df4)
do.call(cbind, lapply(dfList, `[[`, 'V1'))

Above, the lapply loops over all data frames and returns a list where each element is the V1 column as a vector. The do.call(cbind, ...) part then binds all those elements together into one matrix.
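
The result is a matrix because `[[` extracts bare vectors. A short sketch of the difference between `[[` and `[`, with toy data frames and list names assumed for illustration:

```r
df1 <- data.frame(V1 = 1:3)
df2 <- data.frame(V1 = 4:6)
dfList <- list(df1 = df1, df2 = df2)

# `[[` extracts bare vectors, so cbind() builds a matrix
m <- do.call(cbind, lapply(dfList, `[[`, "V1"))

# `[` keeps one-column data frames, so cbind() builds a data frame
d <- do.call(cbind, lapply(dfList, `[`, "V1"))

is.matrix(m)      # TRUE
is.data.frame(d)  # TRUE
```

If a data frame is needed from the matrix version, `as.data.frame(m)` converts it.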
