I have several data.frame objects that hold position index vectors, but the column order differs from one data.frame to the next. I want to merge these data.frame objects into one common data.frame with a specific row order and no duplicate rows. Does anyone know a trick or an approach that makes this easier?

Data:

v1 <- data.frame(
  foo=c(1,2,3),
  bar=c(1,2,2),
  bleh=c(1,3,0))

v2 <-  data.frame(
  bar=c(1,2,3),
  foo=c(1,2,0),
  bleh=c(3,3,4))

v3 <-  data.frame(
  bleh=c(1,2,3,4),
  foo=c(1,1,2,0),
  bar=c(0,1,2,3))

Initial output after combining them:

initial_output <- data.frame(
  foo=c(1,2,3,1,2,0,1,1,2,0),
  bar=c(1,2,2,1,2,3,0,1,2,3),
  bleh=c(1,3,0,3,3,4,1,2,3,4)
)

After removing duplicate rows:

rmDuplicate_output <- data.frame(
  foo=c(1,2,3,1,0,1,1),
  bar=c(1,2,2,1,3,0,1),
  bleh=c(1,3,0,3,4,1,2)
)

Final desired output:

final_output <- data.frame(
  foo=c(1,1,1,1,2,3,0),
  bar=c(0,1,1,1,2,2,3),
  bleh=c(1,1,2,3,3,0,4)
)

How can I get my final desired output easily? Is there an efficient way to do this sort of manipulation on data.frame objects? Thanks.

jyson

3 Answers

You could also use the mget/ls combo to get your data frames programmatically (without typing individual names), and then use data.table's rbindlist and unique functions for a large efficiency gain.

library(data.table)
unique(rbindlist(mget(ls(pattern = "v\\d+")), use.names = TRUE))
#    foo bar bleh
# 1:   1   1    1
# 2:   2   2    3
# 3:   3   2    0
# 4:   1   1    3
# 5:   0   3    4
# 6:   1   0    1
# 7:   1   1    2

As a side note, it is usually better to keep multiple data.frames in a single list so you have better control over them.
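
A minimal base R sketch of that list-based approach follows. The list name `dfs` is a placeholder and not something from the question. It relies on `rbind` matching data frame columns by name, so the differing column order across `v1`, `v2`, and `v3` needs no manual reordering.

```r
# Example data from the question
v1 <- data.frame(foo = c(1, 2, 3), bar = c(1, 2, 2), bleh = c(1, 3, 0))
v2 <- data.frame(bar = c(1, 2, 3), foo = c(1, 2, 0), bleh = c(3, 3, 4))
v3 <- data.frame(bleh = c(1, 2, 3, 4), foo = c(1, 1, 2, 0), bar = c(0, 1, 2, 3))

# Keep the data frames together in one list (hypothetical name `dfs`)
dfs <- list(v1, v2, v3)

# rbind() on data frames aligns columns by name; unique() drops duplicate rows
combined <- unique(do.call(rbind, dfs))

# Reset the row names left over from the original row positions
rownames(combined) <- NULL
combined
```

With the data frames in a list, adding a fourth one only means appending it to `dfs`; nothing depends on the pattern of object names in the workspace.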

David Arenburg

Here is a solution:

# combine data frames
df <- rbind(v1, v2, v3)

# remove duplicated rows
df <- df[!duplicated(df), ]

# sort by 'bar' column
df[order(df$bar), ]
#   foo bar bleh
# 7   1   0    1
# 1   1   1    1
# 4   1   1    3
# 8   1   1    2
# 2   2   2    3
# 3   3   2    0
# 6   0   3    4
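
Sorting by `bar` alone leaves tied rows in the order they were found, so the rows with `bar == 1` come out with `bleh` as 1, 3, 2 rather than 1, 2, 3 as in `final_output`. The sketch below breaks ties with `foo` and then `bleh`. That tie-break order is an assumption inferred from `final_output`, not something the question states.

```r
# Example data from the question
v1 <- data.frame(foo = c(1, 2, 3), bar = c(1, 2, 2), bleh = c(1, 3, 0))
v2 <- data.frame(bar = c(1, 2, 3), foo = c(1, 2, 0), bleh = c(3, 3, 4))
v3 <- data.frame(bleh = c(1, 2, 3, 4), foo = c(1, 1, 2, 0), bar = c(0, 1, 2, 3))

# Combine (columns matched by name) and drop duplicate rows
df <- unique(rbind(v1, v2, v3))

# Sort by bar, breaking ties with foo and then bleh (assumed tie-break order)
df <- df[order(df$bar, df$foo, df$bleh), ]

# Reset row names so they run 1..n like final_output
rownames(df) <- NULL
df
```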
user1981275

We can use bind_rows from dplyr, remove the duplicates with distinct, and arrange by 'bar'.

library(dplyr)
bind_rows(v1, v2, v3) %>%
  distinct() %>%
  arrange(bar)
#    foo bar bleh
#1   1   0    1
#2   1   1    1
#3   1   1    3
#4   1   1    2
#5   2   2    3
#6   3   2    0
#7   0   3    4
akrun