3

I have the following question and I have not been able to find an answer that works I have multiple dataframes (35 to be exact) and I want to add another dataframe containing demographics to each one of the 35 dataframes.

To make it simple, I have the following example:

df1 <- data.frame(ID = c(1:3), b = c('x', 'y', 'z'), c = c('gh', 'fg', 'xv'), df = c('z', 'x', 'y'))

df2 <- data.frame(ID = c(1:3), v = c('a', 'mm', 'xc'), hg = c('yty', 'zc', 'cx'), fd = c('z', 'x', 'y'))

df3 <- data.frame(ID = c(1:3, t = c('ae', 'yw', 'zs'), j = c('ewr', 'zd', 'x'), sd = c('z', 'x', 'y'))

df4 <- data.frame(ID = c(1:3), u = c('df', 'y', 'z'), k = c('df', 'zs', 'xf'), f = c('z', 'x', 'y'))
.  
.  
.  
df(n) <- ...  


demo <- data.frame(sex = c('m', 'm', 'f'), age = c('30', '50', '62'), vital_sts = c('a', 'a', 'd'))

What I would like to do is to paste the demo dataframe to each one of the other frames. So I have tried:

dfList <- list(df1, df2,df3,df4...)  

for (i in 1:length(dfList) {  
     i <- merge(demo,i)  
}

However, when I check the dataframes they are not merged. Any help will be greatly appreciated. Thank you!

Phil
  • 7,287
  • 3
  • 36
  • 66
EduardoRod
  • 105
  • 1
  • 8
  • 1
    How did you get 35 *separate* data frames in your global environment? You should be using a [list of data frames](https://stackoverflow.com/a/24376207/1422451). Then iterate to `cbind` the demographics. – Parfait Feb 10 '22 at 01:17
  • In your code, you need to use `i` to index into `dfList` rather then as the object itself. So inside the loop it should read `dfList[[i]] <- merge(demo, dfList[[i]])` – Dan Adams Feb 10 '22 at 03:01
  • And that will only modify the copy of each df inside the list not the original one in the global environment. – Dan Adams Feb 10 '22 at 03:02
  • Oh and to use `merge` you need some column in common so you should add the `ID` column to `demo` – Dan Adams Feb 10 '22 at 03:16

2 Answers2

2

A possible solution, with previous creation of a list with all dataframes to be merged with demo:

df1 <- data.frame(ID = c(1:3), b = c('x', 'y', 'z'), c = c('gh', 'fg', 'xv'), df = c('z', 'x', 'y'))

df2 <- data.frame(ID = c(1:3), v = c('a', 'mm', 'xc'), hg = c('yty', 'zc', 'cx'), fd = c('z', 'x', 'y'))

demo <- data.frame(sex = c('m', 'm', 'f'), age = c('30', '50', '62'), vital_sts = c('a', 'a', 'd'))

dfs <- list(df1, df2)

l <- lapply(dfs, cbind, demo)
names(l) <-  c("df1", "df2")
list2env(l, .GlobalEnv)

df1

#>   ID b  c df sex age vital_sts
#> 1  1 x gh  z   m  30         a
#> 2  2 y fg  x   m  50         a
#> 3  3 z xv  y   f  62         d

df2

#>   ID  v  hg fd sex age vital_sts
#> 1  1  a yty  z   m  30         a
#> 2  2 mm  zc  x   m  50         a
#> 3  3 xc  cx  y   f  62         d
PaulS
  • 21,159
  • 2
  • 9
  • 26
  • Thank you for this answer. It is definitely helpful and it works if I just print the dataframes. However, it does not change the actual dataframe when I look at the columns after doing lapply – EduardoRod Feb 11 '22 at 02:26
  • I consider this a dangerous solution, since `cbind` does not guarantee that the "demo" observations are assigned in the correct order. – jay.sf Feb 11 '22 at 04:46
  • 1
    After my code, @EduardoRod, just run the two following lines of code: `names(l) <- c("df1", "df2")` and `list2env(l, .GlobalEnv)`. After that, you can access to the modified `df1` and `df2`. But before all, assign the result of `lapply` to `l`: `l <- lapply(dfs, cbind, demo)`. – PaulS Feb 11 '22 at 11:45
1

You should give your demo data frame definitely an "ID" column as well! Then you do not have to hope that the demographics are correctly assigned to the observations, especially if the script is still changing during the work process. That may easily be done using transform (I simply use the consecutive ID's 1:3 here in the example).

res <- lapply(list(df1, df2, df3, df4), merge, transform(demo, ID=1:3))
res
# [[1]]
#   ID b  c df sex age vital_sts
# 1  1 x gh  z   m  30         a
# 2  2 y fg  x   m  50         a
# 3  3 z xv  y   f  62         d
# 
# [[2]]
#   ID  v  hg fd sex age vital_sts
# 1  1  a yty  z   m  30         a
# 2  2 mm  zc  x   m  50         a
# 3  3 xc  cx  y   f  62         d
# 
# [[3]]
#   ID  t   j sd sex age vital_sts
# 1  1 ae ewr  z   m  30         a
# 2  2 yw  zd  x   m  50         a
# 3  3 zs   x  y   f  62         d
# 
# [[4]]
#   ID  u  k f sex age vital_sts
# 1  1 df df z   m  30         a
# 2  2  y zs x   m  50         a
# 3  3  z xf y   f  62         d

If you have gazillions of data frames in your workspace, as it looks like, you may list by pattern using mget(ls(pattern=)). (Or better yet, change your code to get them in a list in the first place.)

lapply(mget(ls(pat='^df\\d+')), merge, transform(demo, ID=1:3))

Edit

If I understand you correctly, according to your comment you have a large data frame DAT from which you want to assemble smaller data frames of variable groups and merge the demo to them. In this case I would put the variable names of these groups in a named list vgroups. Next, lapply over it to simultaneously subset dat with "ID" concatenated and merge it to demo.

demo still should have an "ID", because you don't want to trust, all rows are sorted in the same order, just consider for example sort(c(3, 10, 1, 100)) vs. sort(as.character(c(3, 10, 1, 100))) or omitted rows for whatever reason etc.

demo <- transform(demo, ID=1:3)  ## identify demo observations

vgroups <- list(g1=c("b", "c", "df"), g2=c("v", "hg", "fd"), g3=c("t", "j", "sd"),
               g4=c("u", "k", "f"))

res1 <- lapply(vgroups, \(x) merge(demo, DAT[, c('ID', x)], by="ID"))  
                          ## saying by ID is even more save --^
res1
# $g1
#   ID sex age vital_sts b  c df
# 1  1   m  30         a x gh  z
# 2  2   m  50         a y fg  x
# 3  3   f  62         d z xv  y
# 
# $g2
#   ID sex age vital_sts  v  hg fd
# 1  1   m  30         a  a yty  z
# 2  2   m  50         a mm  zc  x
# 3  3   f  62         d xc  cx  y
# 
# $g3
#   ID sex age vital_sts  t   j sd
# 1  1   m  30         a ae ewr  z
# 2  2   m  50         a yw  zd  x
# 3  3   f  62         d zs   x  y
# 
# $g4
#   ID sex age vital_sts  u  k f
# 1  1   m  30         a df df z
# 2  2   m  50         a  y zs x
# 3  3   f  62         d  z xf y

Access individual data frames:

res1$g1
#   ID sex age vital_sts b  c df
# 1  1   m  30         a x gh  z
# 2  2   m  50         a y fg  x
# 3  3   f  62         d z xv  y

If you still want the individual data frames in your environment, use list2env:

list2env(res1)
ls()
# [1] "DAT"     "demo"    "res1"    "vgroups"

Data:

DAT <- structure(list(ID = 1:3, b = c("x", "y", "z"), c = c("gh", "fg", 
"xv"), df = c("z", "x", "y"), f = c("z", "x", "y"), fd = c("z", 
"x", "y"), hg = c("yty", "zc", "cx"), j = c("ewr", "zd", "x"), 
    k = c("df", "zs", "xf"), sd = c("z", "x", "y"), t = c("ae", 
    "yw", "zs"), u = c("df", "y", "z"), v = c("a", "mm", "xc"
    ), x1 = c("gs", "gs", "gs"), x2 = c("cs", "cs", "cs"), x3 = c("tv", 
    "tv", "tv"), x4 = c("fb", "fb", "fb")), row.names = c(NA, 
-3L), class = "data.frame")

demo <- data.frame(sex = c('m', 'm', 'f'), age = c('30', '50', '62'), vital_sts = c('a', 'a', 'd'))
jay.sf
  • 60,139
  • 8
  • 53
  • 110
  • Hi. This is really helpful and honestly this is the first time that I work with these many dataframes so I am not sure if I am following. Do you mean putting the daframes on a list instead of an environment and then parse through that list? The actual issue of this dataframe is that is a giant one that has a ton of variables (>500) but there are 'groups' of variables that belong together. This is why they broke it down in many small dataframes. Wold you suggest me filtering from the 'main' dataframe instead of breaking it down and putting back together? – EduardoRod Feb 11 '22 at 02:31
  • @EduardoRod I try to keep the environment as clean as possible when working, that's why I would list the data frames. Please see my edited answer for a way to directly put groups into individual data frames. – jay.sf Feb 11 '22 at 04:47