I want to join several data.frames into one. All the data.frames share an identical column.

There are different ways to merge several datasets. Since I am using the approach Reduce(function(...) merge(..., all=TRUE), list( )), I need a list of the data.frames that I have in the environment. However, every time I try to build that list, the objects stop being data.frames and only their names are stored.

These are my dataframes:

file_1 <- women
file_2 <- women
colnames(file_2) <- c("height_2", "weight_2")
file_3 <- women
colnames(file_3) <- c("height_3", "weight_3")
file_4 <- women
colnames(file_4) <- c("height_4", "weight_4")
file_5 <- women
colnames(file_5) <- c("height_5", "weight_5")

Since I want to merge them, I need to add the same column to all of them. With the first line of code, I collect the names of the variables in the environment (I only want the data.frames whose names start with "file").

list_files <- grep("^file_", names(.GlobalEnv), value = TRUE)

for (file in list_files) {
  temp <- get(file)
  # Add the shared column
  temp$ID <- "col"
  # Write the modified data.frame back under its original name
  assign(file, temp)
}
rm(temp) # temp is no longer needed

However, when I try to use list_files (which holds the names of the data.frames) to merge them, I don't get a merged data.frame.

DF_complete <- Reduce(function(...) merge(..., all=TRUE), list(list_files))

> class(DF_complete)
[1] "character"

On the other hand, when I try this code (typing out all the data.frames myself), I get the data.frame that I want.

DF_2 <- Reduce(function(...) merge(..., all=TRUE), list(file_1, file_2, file_3, file_4, file_5))

> class(DF_2)
[1] "data.frame"

I want to avoid typing out all the data.frames. Right now I have 5 of them, but with more than 10 this will become tedious, so I want to find another way.

I saw this post and tried the following, but the result holds only the names, not the data.frames.

list_df <- list(list_files)
> list_df
[[1]]
[1] "file_1" "file_2" "file_3" "file_4" "file_5"
> class(list_df)
[1] "list"

Does anyone know how to do it?

Thanks very much in advance

emr2
  • Can you save all your dataframes into an object (list) instead of the global environment? – user2974951 Jan 14 '22 at 09:19
  • @user2974951 How do you suggest doing it? My original problem is that I load "RData" files, so I take them from a path, remove the pattern "RData" and then I load them into the environment... For that reason I was thinking about how to merge them from the environment... – emr2 Jan 14 '22 at 10:07
  • Do you create those RData files? Do you save the dataframes into the global environment there already? Can you save them into a list before saving the RData file? – user2974951 Jan 14 '22 at 10:11
  • @user2974951 First of all I load individual .tsv files, then I save each of them as an .RData object (just in case the original file gets corrupted), and I work with those RData files – emr2 Jan 14 '22 at 10:14
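
The workflow described in these comments (one data.frame per .RData file) can also skip the global environment entirely. The following is only a sketch under stated assumptions: the folder `rdata_dir`, the file names, and the example data are hypothetical and created here just so the code runs on its own. Each file is loaded into a fresh environment, and the data.frames found there are collected into a list that `Reduce` can consume.

```r
# Hypothetical setup: write two small .RData files to a temporary folder
rdata_dir <- file.path(tempdir(), "rdata_demo")
dir.create(rdata_dir, showWarnings = FALSE)
file_1 <- data.frame(id = 1:2, a = c(1, 2))
file_2 <- data.frame(id = 1:2, b = c(3, 4))
save(file_1, file = file.path(rdata_dir, "file_1.RData"))
save(file_2, file = file.path(rdata_dir, "file_2.RData"))

# Load each file into its own environment and keep only the data.frames
paths <- list.files(rdata_dir, pattern = "\\.RData$", full.names = TRUE)
list_df <- unlist(lapply(paths, function(p) {
  e <- new.env()
  load(p, envir = e)
  Filter(is.data.frame, as.list(e))
}), recursive = FALSE)

# Merge the collected data.frames pairwise
DF_complete <- Reduce(function(x, y) merge(x, y, all = TRUE), list_df)
```

Because the objects never touch the global environment, there is no need to search it by name afterwards.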

1 Answer

If we have multiple data.frames in the global environment that we want to merge, we can use mget and ls:

file_1 = data.frame(id = c(1,2), a = c(1,2))
file_2 = data.frame(id = c(1,2), b = c(3,4))
file_3 = data.frame(id = c(3,4), a = c(5,6))

Reduce(\(...) merge(..., all = TRUE), mget(ls(pattern = "file")))
  id a  b
1  1 1  3
2  2 2  4
3  3 5 NA
4  4 6 NA
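
Note that `pattern = "file"` matches every object whose name contains "file", which would include a character vector such as `list_files` from the question. A slightly more defensive variant, offered as a sketch rather than a definitive solution, anchors the pattern and keeps only objects that really are data.frames. The helper name `merge_all` and its arguments are hypothetical, not part of base R.

```r
# Example data, as in the answer, plus a character vector that must be ignored
file_1 <- data.frame(id = c(1, 2), a = c(1, 2))
file_2 <- data.frame(id = c(1, 2), b = c(3, 4))
file_3 <- data.frame(id = c(3, 4), a = c(5, 6))
list_files <- c("file_1", "file_2", "file_3")

# merge_all is a hypothetical helper: it looks up objects by name pattern
# in the caller's environment, drops anything that is not a data.frame,
# and merges the rest pairwise with a full (outer) join.
merge_all <- function(pattern = "^file_", envir = parent.frame()) {
  objs <- mget(ls(envir = envir, pattern = pattern), envir = envir)
  dfs <- Filter(is.data.frame, objs)
  Reduce(function(x, y) merge(x, y, all = TRUE), dfs)
}

DF_complete <- merge_all()
class(DF_complete)
```

The key difference from `list(list_files)` in the question is that `mget` returns the objects themselves, while `list_files` only holds their names.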
Donald Seinen
  • @Eva the example I used is to show how `merge` works - it fills `NA` where no value is found when merging. When the `id` column in `file_3` is made into `1:2`, the `NA` would just shift to another position in the resulting data.frame. To see how different joining operations work, I suggest running the following code: `purrr::reduce(mget(ls(pattern = "file")), dplyr::full_join)`, refer to `?dplyr::full_join`, and try replacing `full_*` with `inner, right, left, anti`. – Donald Seinen Jan 14 '22 at 10:19
  • Thanks very much for your help! I didn't know about `mget` and `ls`! – emr2 Jan 14 '22 at 10:19
  • And sorry for the question that I made before, I deleted it because I finally understood later. Sorry for the mess. And thanks very much again for your answer! – emr2 Jan 14 '22 at 10:20
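
The join types mentioned in the comment above can also be explored with base R's `merge`, through its `all`, `all.x`, and `all.y` arguments. This is a small illustration on made-up data (`x` and `y` are hypothetical, not from the question); the anti join uses plain subsetting because `merge` has no direct equivalent.

```r
x <- data.frame(id = c(1, 2, 3), a = c(10, 20, 30))
y <- data.frame(id = c(2, 3, 4), b = c("u", "v", "w"))

inner <- merge(x, y, by = "id")                # ids present in both: 2, 3
left  <- merge(x, y, by = "id", all.x = TRUE)  # every id of x: 1, 2, 3
right <- merge(x, y, by = "id", all.y = TRUE)  # every id of y: 2, 3, 4
full  <- merge(x, y, by = "id", all = TRUE)    # ids in either: 1, 2, 3, 4

# Anti join: rows of x whose id does not appear in y
anti <- x[!x$id %in% y$id, , drop = FALSE]
```

Rows without a match on the other side are filled with `NA`, which is what produces the `NA` values in the answer's output.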