I am trying to join dataframe bar
to foo
based on a common column and then keep the original column order in foo
:
> head(foo)
city course1
1 Aalborg JEMES
2 Aarhus EM-SANF
3 Aix-en-Provence EMLE
4 Almaty IMRCEES
5 Alnarp SUFONAMA
6 Amsterdam ATOSIM
> colnames(foo)
[1] "city" "course1"
> head(bar)
code website
1 4CITIES http://www.4cities.eu/
2 ACES http://www.sams.ac.uk/aces-erasmus
3 ADVANCES http://www.socialworkadvances.org/
4 AMASE http://www.amase-master.net/
5 ARCHMAT http://www.erasmusmundus-archmat.uevora.pt/
6 ASC
http://www.master-asc.org/
> colnames(bar)
[1] "code" "website"
The join column is course
in foo
and code
in bar
. I have used the following formula:
test <- merge(x = foo, y = bar, by.x = "course1", by.y = "code", all.x=TRUE)[, union(names(foo), names(bar))]
This fails and produces the following error message:
Error in `[.data.frame`(merge(x = foo, y = bar, by.x = "course1", by.y = "code", :
undefined columns selected
I have found this solution here but it is not working, even though none of the column names are duplicated. What could be the problem?
A simple join works (without reordering) but brings the join column to the front:
> head(test)
course1 city website
1 JEMES Aalborg http://www.jemes-cisu.eu/
2 JEMES Aveiro http://www.jemes-cisu.eu/
3 JEMES Hamburg http://www.jemes-cisu.eu/
4 EM-SANF Aarhus http://www.emsanf.eu/UK/
5 EM-SANF Wageningen http://www.emsanf.eu/UK/
6 EM-SANF Debrecen http://www.emsanf.eu/UK/
I have tried adding sort = F
and removing all.x = TRUE
, but this does not work. The problem is that my actual dataframes have many more columns and will be going through multiple joins so I would like to preserve column order in all in one function. Is there a known effective workaround or a package that preserves column order in joins?
> names(test)
[1] "course1" "city" "website"
> names(foo)
[1] "city" "course1"
> names(bar)
[1] "code" "website"