0

I have 2 csv datasets, each one with about 10k columns. The datasets are extracted from the same source, but the column sequence of these datasets is different (there are some new columns on the 2nd ds). So, I want to merge data of the 2nd dataset into the first one, keeping the column sequence of the first dataset. How can I do this?

Here follows an example:

Dataset 1:

Brand   Year    Model   Price
Ford    2010    Taurus  5K
Toyota  2015    Yaris   4K

Dataset 2:

Brand       Year    Model   Color       Location    Price
Chevrolet   2013    Spark   Dark Gray   PHI         2K

I would like to ignore the new columns (color, location) on the 2nd dataset and add the data with the same columns (brand, year, model, price) of the 2nd dataset into the first one.

Thanks in advance.

1 Answers1

1

If you want to append the two datasets, try using bind_rows from the dplyr library. Use the first dataset as the first argument.

Here's a reproducible example you can modify if this result doesn't get you what you are looking for. Remember, a reproducible example means that you provide code that others can run when they are testing solutions for you. Your example doesn't allow users to copy data into R and test a solution currently. Try using dput on a small dataset to get some data for folks on stack overflow to use.

library(dplyr)

# Make up data
df <- data.frame(a = c(1, 2), b = c(3, 4))
df2 <- data.frame(a = c(5,6), b = c(2, 3), c = c(7, 8), d = c(1, 5))

# determine columns to remove from df2: 
remove.these <- setdiff(colnames(df2), colnames(df))
# remove them before binding to save time
df2 <- select(df2, -remove.these)

# bind two dataframes together
finaldf <- bind_rows(df, df2) 
Nova
  • 5,423
  • 2
  • 42
  • 62