How to combine vectors from multiple data.frames into one data.frame without duplication?

Question

I have position index vectors stored in several data.frame objects, but the column order differs from one data.frame to the next. I want to merge these data.frame objects into one common data.frame with a specific row order and no duplicated rows. Is there a trick for doing this more easily? Can anyone propose a possible approach to accomplish this task?

data

v1 <- data.frame(
  foo=c(1,2,3),
  bar=c(1,2,2),
  bleh=c(1,3,0))

v2 <-  data.frame(
  bar=c(1,2,3),
  foo=c(1,2,0),
  bleh=c(3,3,4))

v3 <-  data.frame(
  bleh=c(1,2,3,4),
  foo=c(1,1,2,0),
  bar=c(0,1,2,3))

initial output after integrating them:

initial_output <- data.frame(
  foo=c(1,2,3,1,2,0,1,1,2,0),
  bar=c(1,2,2,1,2,3,0,1,2,3),
  bleh=c(1,3,0,3,3,4,1,2,3,4)
)

output after removing duplicates:

rmDuplicate_output <- data.frame(
  foo=c(1,2,3,1,0,1,1),
  bar=c(1,2,2,1,3,0,1),
  bleh=c(1,3,0,3,4,1,2)
)

final desired output:

final_output <- data.frame(
  foo=c(1,1,1,1,2,3,0),
  bar=c(0,1,1,1,2,2,3),
  bleh=c(1,1,2,3,3,0,4)
)

How can I get my final desired output easily? Is there an efficient way to do this sort of manipulation on data.frame objects? Thanks

Also, `library(data.table) ; unique(rbindlist(mget(ls()), use.names = TRUE))` — David Arenburg, Aug 12 '16 at 10:57

score 4 · Answer 1 · edited May 23 '17 at 11:48

You could also use the mget/ls combo to get your data frames programmatically (without typing individual names) and then use data.table's rbindlist function and unique method for a great efficiency gain (see here and here)

library(data.table)
unique(rbindlist(mget(ls(pattern = "v\\d+")), use.names = TRUE))
#    foo bar bleh
# 1:   1   1    1
# 2:   2   2    3
# 3:   3   2    0
# 4:   1   1    3
# 5:   0   3    4
# 6:   1   0    1
# 7:   1   1    2

As a side note, it is usually better to keep multiple data.frames in a single list so that you have better control over them.
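A minimal sketch of the list-based approach from the side note, using base R only. The names `dfs`, `combined` and `deduped` are chosen here for illustration and do not come from the answer. `rbind` on data frames matches columns by name, so the differing column order in v1, v2 and v3 is handled.

```r
# Same example data as in the question
v1 <- data.frame(foo = c(1, 2, 3), bar = c(1, 2, 2), bleh = c(1, 3, 0))
v2 <- data.frame(bar = c(1, 2, 3), foo = c(1, 2, 0), bleh = c(3, 3, 4))
v3 <- data.frame(bleh = c(1, 2, 3, 4), foo = c(1, 1, 2, 0), bar = c(0, 1, 2, 3))

# Keep the data frames together in one list (the name `dfs` is illustrative)
dfs <- list(v1, v2, v3)

# rbind's data frame method matches columns by name, not by position
combined <- do.call(rbind, dfs)

# Drop duplicated rows and reset the row names
deduped <- unique(combined)
rownames(deduped) <- NULL
```

With data.table loaded, the same list can be passed directly as `rbindlist(dfs, use.names = TRUE)`, which avoids the `mget(ls(...))` lookup.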

user1981275 · Answer 2 · 2016-08-19T18:04:07.420

3

Here is a solution:

# combine dataframes
df = rbind(v1, v2, v3)

# remove duplicated
df = df[! duplicated(df),]

# sort by 'bar' column
df[order(df$bar),]
#     foo bar bleh
# 7   1   0    1
# 1   1   1    1
# 4   1   1    3
# 8   1   1    2
# 2   2   2    3
# 3   3   2    0
# 6   0   3    4
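
Sorting by 'bar' alone leaves tied rows in their input order, so the rows with bar equal to 1 come out with bleh as 1, 3, 2, whereas final_output in the question lists them as 1, 2, 3. A sketch that adds tie-breakers, assuming the intended order is by bar, then foo, then bleh. This rule is inferred from the example data; the question does not state it. The name `result` is illustrative.

```r
# Same example data as in the question
v1 <- data.frame(foo = c(1, 2, 3), bar = c(1, 2, 2), bleh = c(1, 3, 0))
v2 <- data.frame(bar = c(1, 2, 3), foo = c(1, 2, 0), bleh = c(3, 3, 4))
v3 <- data.frame(bleh = c(1, 2, 3, 4), foo = c(1, 1, 2, 0), bar = c(0, 1, 2, 3))

# Combine (columns are matched by name) and drop duplicated rows
df <- unique(rbind(v1, v2, v3))

# Sort by bar, breaking ties by foo and then bleh (assumed ordering rule)
result <- df[order(df$bar, df$foo, df$bleh), ]
rownames(result) <- NULL

# result now has the same rows, in the same order, as final_output
result
```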

edited Aug 19 '16 at 18:04

answered Aug 12 '16 at 10:52


score 3 · Accepted Answer · answered Aug 12 '16 at 10:52

We can use bind_rows from dplyr, remove the duplicates with distinct, and arrange by 'bar'.

library(dplyr)
bind_rows(v1, v2, v3) %>%
  distinct() %>%
  arrange(bar)
#    foo bar bleh
#1   1   0    1
#2   1   1    1
#3   1   1    3
#4   1   1    2
#5   2   2    3
#6   3   2    0
#7   0   3    4
