Merging two data.frames by two columns each

Question

I have a huge data.frame that I want to reorder. The idea was to split it in half (as the first half contains different information than the second half) and create a third data frame which would be the combination of the two. As I always need the first two columns of the first data frame followed by the first two columns of the second data frame, I need help.

new1<-all_cont_video_algo[,1:826]
new2<-all_cont_video_algo[,827:length(all_cont_video_algo)]
df3<-data.frame()

The new data frame should look like the following:

new3[new1[1],new1[2],new2[1],new2[2],new1[3],new1[4],new2[3],new2[4],new1[5],new1[6],new2[5],new2[6], etc.].

Pseudoalgorithmically, cbind 2 columns from data frame new1 then cbind 2 columns from data frame new2 etc.

I tried the following now (thanks to Akrun):

new1<-all_cont_video_algo[,1:826]
new2<-all_cont_video_algo[,827:length(all_cont_video_algo)]

new1<-as.data.frame(new1, stringsAsFactors =FALSE)
new2<-as.data.frame(new2, stringsAsFactors =FALSE)

df3<-data.frame()
f1 <- function(Ncol, n) {
  as.integer(gl(Ncol, n, Ncol))
}
lst1 <- split.default(new1, f1(ncol(new1), 2))
lst2 <- split.default(new2, f1(ncol(new2), 2))

lst3 <- Map(function(x, y) df3[unlist(cbind(x, y))], lst1, lst2)

However, this gives me an "undefined columns selected" error.

When you create a dataset `df1<-data.frame(c(1,2,3,4,5,6,7,8))` it is a single column dataset and not multiple columns. Please check the output — akrun, Jun 16 '19 at 16:39
The example might have been misleading. df1 and df2 were just added for the understanding of how the columns of the two data frames should be represented in df3. Sorry that it did confuse more than it did help. — Claudio, Jun 16 '19 at 16:45
If that is the case, have you tried my solution? It should work, assuming that the character columns are not `factor` — akrun, Jun 16 '19 at 16:47
It works. Now I have a long single-column vector and no more column names ;) — Claudio, Jun 16 '19 at 16:52
Do you need multiple datasets? In that case don't do `unlist`, use `Map(function(x, y) df3[c(cbind(x, y))], lst1, lst2)` — akrun, Jun 16 '19 at 16:57
Sorry, no idea how to describe it better or giving a reproducible example with a data.frame consisting of a 22x1652 dimension :( — Claudio, Jun 16 '19 at 16:58
You don't need to show those dimensions. I meant `df1 <- as.data.frame(matrix(letters[1:10], 2, 5), stringsAsFactors = FALSE); df2 <- as.data.frame(matrix(1:10, 2, 5))` — akrun, Jun 16 '19 at 17:00
I modified the code in the solution. Please check if that is what you wanted — akrun, Jun 16 '19 at 17:04
unfortunately returns "undefined columns selected" on lst3. Am I allowed to paste my source in the comments? — Claudio, Jun 16 '19 at 17:21

akrun · Answer 1 · 2019-06-16T17:03:10.660

The question is not clear without a reproducible example. Based on the description, we can split the columns of each dataset into a list of datasets, use Map to cbind the corresponding list elements, unlist the result, and use that to order the third dataset.

1) Create a function to return a grouping column for splitting the dataset

f1 <- function(Ncol, n) {
  as.integer(gl(Ncol, n, Ncol))
}

2) Split each dataset into a list of two-column datasets

lst1 <- split.default(df1, f1(ncol(df1), 2))
lst2 <- split.default(df2, f1(ncol(df2), 2))

3) Map through the corresponding list elements, cbind and unlist and use that to subset the columns of 'df3'

lst3 <- Map(function(x, y) df3[unlist(cbind(x, y))], lst1, lst2)

data

df1 <- as.data.frame(matrix(letters[1:10], 2, 5), stringsAsFactors = FALSE)
df2 <- as.data.frame(matrix(1:10, 2, 5))
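If the goal is a single combined data frame rather than a subset of an existing `df3`, the split-and-Map idea above can be sketched as follows. Note that `df3[unlist(cbind(x, y))]` indexes `df3` by cell values rather than by column names, which is likely what triggers "undefined columns selected" when `df3` is empty. This is only a sketch under assumptions: the example data and the column names `a1..a6` / `b1..b6` are hypothetical, and both data frames are assumed to have the same number of rows and an even, equal number of columns.

```r
# Hypothetical example data: two data frames with 6 columns each
df1 <- as.data.frame(matrix(letters[1:12], 2, 6), stringsAsFactors = FALSE)
df2 <- as.data.frame(matrix(1:12, 2, 6))
names(df1) <- paste0("a", seq_len(ncol(df1)))  # hypothetical names
names(df2) <- paste0("b", seq_len(ncol(df2)))  # hypothetical names

# Grouping index 1 1 2 2 3 3 ...: one group per block of n columns
f1 <- function(Ncol, n) {
  as.integer(gl(Ncol, n, Ncol))
}

# Split each data frame into a list of two-column data frames
lst1 <- split.default(df1, f1(ncol(df1), 2))
lst2 <- split.default(df2, f1(ncol(df2), 2))

# cbind corresponding blocks, then cbind all blocks into one data frame.
# unname() drops the list names so they are not prefixed to the column names.
df3 <- do.call(cbind, unname(Map(cbind, lst1, lst2)))

names(df3)
```

Here `df3` is built directly from the blocks, so no pre-existing `df3` is needed and the original column names are kept.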

Thank you for your suggestion. I will try to add a reproducible example. — Claudio, Jun 16 '19 at 16:26

score 0 · Accepted Answer · answered Jun 16 '19 at 18:12

See whether the below code helps

library(tidyverse)

# Two sample data frames of equal number of columns and rows
df1 = mtcars %>% select(-1)
df2 = diamonds %>% slice(1:32) 

# get the column names
dn1 = names(df1)
dn2 = names(df2)

# create new ordered list
neworder = map(seq(1,length(dn1),2), # sequence with interval 2
               ~c(dn1[.x:(.x+1)], dn2[.x:(.x+1)])) %>% # a vector of two columns each
  unlist %>% # flatten the list
  na.omit # remove NAs arising from odd number of columns

# Get the data frame ordered
df3 = bind_cols(df1, df2) %>% 
  select(neworder)
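The same name-interleaving idea can also be sketched in base R, which sidesteps any `select()` clash between packages. This is a minimal sketch, not the answer's own method: the data frames and their column names are hypothetical, and the number of columns in each is assumed to be equal and a multiple of the block size.

```r
# Hypothetical stand-ins with the same number of rows and columns
df1 <- data.frame(a1 = 1:3, a2 = 4:6, a3 = 7:9, a4 = 10:12)
df2 <- data.frame(b1 = letters[1:3], b2 = letters[4:6],
                  b3 = letters[7:9], b4 = letters[10:12],
                  stringsAsFactors = FALSE)

block <- 2  # number of columns taken from each data frame per turn

# Lay the names out with one block per matrix column,
# stack the two matrices, and read the result column by column
m1 <- matrix(names(df1), nrow = block)
m2 <- matrix(names(df2), nrow = block)
neworder <- as.vector(rbind(m1, m2))

# Combine both data frames and reorder the columns by name
df3 <- cbind(df1, df2)[, neworder]

names(df3)
```

Reordering by name rather than by position means the result does not depend on how the two halves were bound together, as long as the column names are unique.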

answered Jun 16 '19 at 18:12

Theo

Thank you so much for helping. Unfortunately I am getting a "Error in select(., neworder) : unused argument (neworder)" error :( – Claudio Jun 16 '19 at 18:29
Checked it again and no issues. What is the output of neworder? The below should be your neworder. ` "cyl" "disp" "carat" "cut" "hp" "drat" "color" "clarity" "wt" "qsec" "depth" "table" "vs" "am" "price" "x" "gear" "carb" "y" "z" ` – Theo Jun 16 '19 at 18:40
The output of neworder are the column names. chr [1:1652]. The order would be correct though ;) – Claudio Jun 16 '19 at 18:43
Looks like select collides with a function of the same name from some other package you have loaded. Use dplyr::select(neworder). https://stackoverflow.com/questions/24202120/dplyrselect-function-clashes-with-massselect. – Theo Jun 16 '19 at 18:51
%>% dplyr::select(neworder).... Theo... you were just 1 min faster.. I found that one out by myself. Now I am carefully checking my data. Thank you very much. I struggled more than 3 hours trying for loops etc. Can I send over some Swiss chocolate? ;) – Claudio Jun 16 '19 at 18:52
Wow that's sweet. If this worked, mark this answer as correct. That should do. – Theo Jun 16 '19 at 18:57
