
Suppose I have 5 data frames in the global environment: a, b, c, d, and e.

I want data frame a to be compared with e and, if R finds any elements common to a and e, to delete those elements from a. Then I want data frame b to be compared with e and its common elements deleted, and so on.

Actually, I have 20 tables that need to be compared with e.

Can anyone suggest an elegant way to handle this problem? I'm thinking of a loop or a function but can't work out the details.

Thanks everybody and have a nice day!

    Can you provide us with a small subset of the data, maybe a subset of a and b, and then one of e to compare with? You can use `dput` to share the abbreviated data structures once you have created them. A reproducible example will make it much more likely that you get an answer here. Thanks :) – mysteRious Jun 21 '18 at 11:24
  • OK, I'll try. It's my first time using Stack Overflow. Thanks for the reminder. – user9972698 Jun 21 '18 at 11:25
  • Hi, actually I don't really know how to share the data, but I can describe the structure here. All the data frames have only one column, and the column names are the same. All the elements are numbers of the same length, like 123456. – user9972698 Jun 21 '18 at 11:27
    This link will show you how to share the data: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – mysteRious Jun 21 '18 at 11:28
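A minimal sketch of what the suggested `dput()` step could look like for data shaped as described in the comments above (a single numeric column of six-digit values). The column name `X1` and the values are hypothetical; `X1` is simply the name `readr::read_csv(col_names = FALSE)` gives a first column.

```r
# Hypothetical small subsets of two of the tables (made-up values)
a <- data.frame(X1 = c(123456, 234567, 345678))
e <- data.frame(X1 = c(234567, 999999))

# dput() prints R code that recreates each object exactly;
# that printed code is what gets pasted into the question
dput(a)
dput(e)
```

Anyone reading the question can then paste the printed `structure(...)` code into their own session and get identical objects.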

1 Answer

The easiest way would be to put all the data frames you want to compare in a list, then use `lapply()` to loop over this list:

```r
# create a list of data.frames
dlist <- list(df1 = data.frame(var1 = 1:10), df2 = data.frame(var1 = 11:20),
              df3 = data.frame(var1 = 21:30), df4 = data.frame(var1 = 31:40))

# create the master data.frame
set.seed(1)
df <- data.frame(var1 = sample(1:100, 30))

# use lapply() to loop over the list and drop all elements that are in the master data.frame
dlist <- lapply(dlist, function(x){
  x[!x$var1 %in% df$var1, , drop = FALSE]
})
```

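Result (the exact rows depend on the random sample; R 3.6.0 changed the default algorithm used by `sample()`, so newer R versions will print different values for the same seed):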

```
> dlist
$df1
  var1
2    2
3    3
4    4
5    5
7    7
8    8
9    9

$df2
  var1
1   11
2   12
3   13
4   14
5   15
8   18

$df3
   var1
2    22
3    23
4    24
6    26
10   30

$df4
   var1
1    31
3    33
5    35
6    36
8    38
9    39
10   40
```
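Because each table in the question has a single column, the `drop = FALSE` in the code above matters: without it, subsetting the rows of a one-column data frame returns a plain vector instead of a data frame. A small illustration with made-up values:

```r
# A one-column data frame, like each table in the question (made-up values)
x <- data.frame(var1 = c(1, 2, 3, 4))
master <- data.frame(var1 = c(2, 4))

# %in% binds tighter than !, so this is TRUE for rows NOT found in master
keep <- !x$var1 %in% master$var1

without_drop <- x[keep, ]                 # simplified to a numeric vector
with_drop    <- x[keep, , drop = FALSE]   # still a data frame

class(without_drop)  # "numeric"
class(with_drop)     # "data.frame"
```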

If you absolutely need the data frames in your global environment, you could use `list2env()`:

```r
list2env(dlist, envir = .GlobalEnv)
```
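Since the question's tables already live in the global environment, one way to build the list without typing it by hand is `mget()`, which returns a *named* list, so the names that `list2env()` needs come along automatically. A sketch assuming the tables are `a` to `d` with `e` as the reference table, and that the column is called `V1` (both are assumptions; the values are made up):

```r
# Stand-ins for the question's tables (hypothetical values)
a <- data.frame(V1 = c(1, 2, 3))
b <- data.frame(V1 = c(3, 4, 5, 6))
c <- data.frame(V1 = c(7, 8))
d <- data.frame(V1 = c(2, 9))
e <- data.frame(V1 = c(2, 3, 7))   # reference table

# Names of the tables to clean. With 20 tables this vector could come from
# ls(), e.g. setdiff(ls(), "e"), assuming nothing else is in the workspace.
tables <- c("a", "b", "c", "d")

# mget() returns a list named "a", "b", "c", "d"
dlist <- mget(tables, envir = .GlobalEnv)

# Drop every element that also appears in e, keeping data frames
dlist <- lapply(dlist, function(x) x[!x$V1 %in% e$V1, , drop = FALSE])

# Overwrite the original tables in the global environment
invisible(list2env(dlist, envir = .GlobalEnv))
```

Keeping the tables in a list and only writing them back at the end means the same `lapply()` call works unchanged whether there are 4 tables or 20.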
  • Oh my god, it seems to work. I'll try it. Thanks so much! – user9972698 Jun 21 '18 at 11:42
  • You are welcome. If the answer satisfies all your needs regarding your question, feel free to check the 'accept' button. – LAP Jun 21 '18 at 12:04
  • Hey, I tried it on my datasets and it doesn't remove the common elements. I'm afraid I didn't make myself clear: the data frames all have different numbers of elements, whereas in your example all four have length 10. I don't know if it matters. Also, in the last step, `list2env`, there is an error like: names(x) must be a vector of the same length as x. (I'm using the Chinese version of R, so I translated the error.) – user9972698 Jun 21 '18 at 12:08
  • I think the problem could be that all your data frames have names. I read my files with `a = list.files(); dir_a = paste(path_new, a, sep = "/"); n_a = length(dir_a); read.file <- function(File){ read_csv(File, col_names = FALSE) }; datalist <- lapply(a, read.file)`, so the data frames in my list don't actually have names, and I don't know why. – user9972698 Jun 21 '18 at 12:27
  • I think the problem is the names. Do I have to define the names in the list? – user9972698 Jun 21 '18 at 12:35
  • I used `names(x) <- c()` to name the list, and it worked. – user9972698 Jun 21 '18 at 12:42
  • Yeah, for `list2env` the list elements need to be named. The length of the elements in the dataframes should not matter at all. Did the removal of the common elements work, or did it not? – LAP Jun 21 '18 at 13:22
  • After I added the names to the list, everything works perfectly. – user9972698 Jun 21 '18 at 13:23
  • Great! To answer the other question, the 'accept' checkmark is right below the upvote/downvote arrows and allows you to accept an answer for your question, therefore indicating to other users that your issue has been solved. – LAP Jun 21 '18 at 13:32
  • Hey, I have accepted your answer. One last question, hope you don't mind :) I use `make.names(gsub("*.txt$", "", a))` to show the names of the data frames, where `a` is the `list.files()` output for my working directory. Then I write `names(datalist) <- c()`, copying the names in and adding the commas by hand. It's not really tedious work, but with around 25 data frames I need to add 24 commas between the names, which isn't very smart lol. Is there a function to do this job? Thank you. – user9972698 Jun 22 '18 at 02:31
  • Is it a string? If so, just use `paste(x, collapse = ",")`, with `x` being the object that holds all the parts. – LAP Jun 24 '18 at 16:19
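Instead of copying the names and adding commas by hand, the names can be assigned directly from the file list. A sketch under these assumptions: base `read.csv()` stands in for `readr::read_csv()` from the thread, the files end in `.txt`, and two made-up files are written to a temporary directory so the example runs on its own. The pattern `"\\.txt$"` escapes the dot so it matches a literal period.

```r
# Create two small example files (hypothetical data) in a temp directory
path_new <- tempfile("tables_")
dir.create(path_new)
writeLines(c("123456", "234567"), file.path(path_new, "table_a.txt"))
writeLines("345678",              file.path(path_new, "table_b.txt"))

files <- list.files(path_new, pattern = "\\.txt$")   # file names only
read.file <- function(File) read.csv(File, header = FALSE)

# Read every file, then name each list element after its file
datalist <- lapply(file.path(path_new, files), read.file)
names(datalist) <- make.names(sub("\\.txt$", "", files))

names(datalist)                          # "table_a" "table_b"
paste(names(datalist), collapse = ",")   # "table_a,table_b"
```

Assigning the character vector to `names()` directly replaces the copy-and-paste step entirely; `paste(..., collapse = ",")` is only needed if a comma-separated string is wanted for display.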