14

I have ten datasets that have been read from Excel files, using the xlsx library, and stored in tibbles. I want to merge them.

Here are example datasets. The number of variables differs between datasets, and some variables appear in only one dataset. The values of the person variable never overlap.

library(tibble)

data1 <- tibble(person = c("A","B","C"),
    test1 = as.factor(c(1,4,5)), 
    test2 = c(14,25,10),
    test3 = c(12.5,16.0,4),
    test4 = c(16,23,21),
    test5 = as.factor(c(49,36,52)))

data2 <- tibble(person = c("D","E","F"),
    test1 = c(8,7,2), 
    test3 = c(6.5,12.0,19.5),
    test4 = as.factor(c(15,21,29)),
    test5 = as.factor(c(54,51,36)),
    test6 = c(32,32,29),
    test7 = c(13,11,10))

The actual datasets usually have ~50 rows and ~200 variables in them. I have tried

    all_data <- dplyr::bind_rows(data1,data2)

hoping to get this outcome

# A tibble: 6 x 8
  person test1 test2 test3 test4 test5 test6 test7
   <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1      A     1    14  12.5    16    49    NA    NA
2      B     4    25  16.0    23    36    NA    NA
3      C     5    10   4.0    21    52    NA    NA
4      D     8    NA   6.5    15    54    32    13
5      E     7    NA  12.0    21    51    32    11
6      F     2    NA  19.5    29    36    29    10

but instead I get this error

Error in bind_rows_(x, .id) : Column `test1` can't be converted from factor to numeric

I have searched Stack Overflow and found questions about this error; most answers center on converting the variables to another class. But I don't care which classes my variables have, because I will just write the merged dataset to a CSV file or an Excel file.

Isn't there some kind of simple workaround?

asterdroid
  • For this situation, `rbindlist` seems to work fine, i.e. `library(data.table); list(data1, data2) %>% rbindlist(., fill = TRUE)` – akrun Oct 17 '17 at 11:33

3 Answers

14

I think that this should work:

library(plyr)
all_data <- rbind.fill(data1,data2)
raquela
  • Unfortunately, some SO users like to downvote answers without explaining why. In my experience, `rbind.fill` sometimes gives unexpected results (for me, a single wrong number popped up in the data frame). More experienced R users may be able to explain why. – cibr Mar 29 '19 at 18:41
9

As the files are usually small (several hundred rows) and you simply want to combine them and write the result to a new file, I think we can convert all columns to character, so that the common columns in data1 and data2 have the same type.

library(dplyr)
bind_rows(mutate_all(data1, as.character), mutate_all(data2, as.character))
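
With dplyr 1.0 or later, `mutate_all()` is superseded by `across()`. The following is a minimal sketch of the same idea for any number of tibbles, assuming they are collected in a list. The name `tibble_list`, the shortened example data, and the output file name are illustrative and not taken from the question.

```r
library(dplyr)
library(tibble)

# Two small stand-ins for the question's data: test1 is a factor in one
# tibble and numeric in the other, which is the mismatch behind the error.
data1 <- tibble(person = c("A", "B"),
                test1 = as.factor(c(1, 4)),
                test2 = c(14, 25))
data2 <- tibble(person = c("D", "E"),
                test1 = c(8, 7),
                test6 = c(32, 32))

tibble_list <- list(data1, data2)  # hypothetical name for the list of tibbles

# Convert every column of every tibble to character, then stack them.
# Columns that are missing from a tibble are filled with NA.
all_data <- tibble_list %>%
  lapply(function(d) mutate(d, across(everything(), as.character))) %>%
  bind_rows()

# Column classes no longer matter once the result is written out as text.
# write.csv(all_data, "all_data.csv", row.names = FALSE)  # illustrative file name
```

If numeric columns are needed again inside R, `type.convert(all_data, as.is = TRUE)` is one way to re-guess the column types afterwards.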
mt1022
  • What if I get the same error, but in my case I have a list of tibbles (around 10 tibbles that altogether have more than 2k different columns)? Some tibbles don't include the problematic column. – mihagazvoda Oct 06 '19 at 07:57
  • @Miha, I don't know exactly what your data looks like. Does `bind_rows(lapply(tibble_list, function(dtt){mutate_all(dtt, as.character)}))` work? – mt1022 Oct 07 '19 at 02:18
0

test1 is of class factor in data1 but of class numeric in data2. Combining a factor column with a numeric column causes this error. (test4 has the same mismatch, the other way round.) To fix it, either convert the mismatched columns to the same class (for example, factor) in both data1 and data2 and then use all_data <- dplyr::bind_rows(data1,data2)

or

data.table::rbindlist(list(data1, data2), fill = TRUE)
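
If the goal is the all-numeric result shown in the question, one option is to turn the factor columns into numbers before binding. Calling `as.numeric()` directly on a factor returns the internal level codes rather than the labels, so the conversion below goes through `as.character()` first. This is a minimal sketch, assuming dplyr 1.0 or later and that every factor column holds numeric labels. The helper name `factors_to_numeric` is hypothetical, and the data are a shortened version of the question's example.

```r
library(dplyr)
library(tibble)

# test1 and test4 have mismatched classes, as in the question.
data1 <- tibble(person = c("A", "B", "C"),
                test1 = as.factor(c(1, 4, 5)),
                test4 = c(16, 23, 21))
data2 <- tibble(person = c("D", "E", "F"),
                test1 = c(8, 7, 2),
                test4 = as.factor(c(15, 21, 29)))

# Hypothetical helper: convert factor columns to numeric via their labels,
# leaving all other columns untouched.
factors_to_numeric <- function(d) {
  mutate(d, across(where(is.factor), ~ as.numeric(as.character(.x))))
}

all_data <- bind_rows(factors_to_numeric(data1), factors_to_numeric(data2))
```

A factor label that is not a number would become NA with a warning under this approach, so it only suits data like the example, where factors are numbers in disguise.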

Gucci148