
I'd like to stack a list of data.frames, but sometimes the columns have different data types. I'd like the operation to coerce to the lowest common denominator (which is usually character in my case).

This stacking occurs inside a package function that accepts almost any list of data.frames. The function can't realistically know in advance that ds_a$x needs to be coerced to character before calling bind_rows().

ds_a <- data.frame(
  x = 1:6,
  stringsAsFactors = FALSE
)
ds_b <- data.frame(
  x = c("z1", "z2"),
  stringsAsFactors = FALSE
)

# These four implementations throw:
# Error: Can not automatically convert from integer to character in column "x".
ds_1 <- dplyr::bind_rows(ds_a, ds_b)
ds_2 <- dplyr::bind_rows(ds_b, ds_a)
ds_3 <- dplyr::bind_rows(list(ds_a, ds_b))
ds_4 <- dplyr::union_all(ds_a, ds_b)

I'd like the output to be a data.frame with a single character vector:

   x
1  1
2  2
3  3
4  4
5  5
6  6
7 z1
8 z2

I have some long-term plans to use meta-data from the (REDCap) database to influence the coercion, but I'm hoping there's a short-term general solution for the stacking operation.

wibeasley
  • It works with `rbind` as well. I presume you are wanting to bind together the batched API calls, so they should all have the same names. – Benjamin Sep 07 '16 at 19:05
  • I usually convert them to factors in this situation, but I'm not sure how much it will affect speed if you convert every column to a factor and then convert them back. – Hao Sep 07 '16 at 19:21
    fwiw, I do something like `ds_5<-bind_rows(ds_a%>%mutate_all(as.character),ds_b)` in cases where ds_a would be full of integers and ds_b is character. – Pake Jun 14 '21 at 20:47

2 Answers


We can use rbindlist from data.table

library(data.table)
rbindlist(list(ds_a, ds_b))
#    x
#1:  1
#2:  2
#3:  3
#4:  4
#5:  5
#6:  6
#7: z1
#8: z2
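
If the rest of the pipeline expects a plain data.frame, the result can probably be converted back afterwards. A minimal sketch, assuming the `ds_a` and `ds_b` from the question and a reasonably recent data.table; `ds_stacked` is just an illustrative name:

```r
library(data.table)

ds_a <- data.frame(x = 1:6, stringsAsFactors = FALSE)
ds_b <- data.frame(x = c("z1", "z2"), stringsAsFactors = FALSE)

# rbindlist() promotes the integer column to character while stacking.
# use.names = TRUE matches columns by name rather than by position.
ds_stacked <- rbindlist(list(ds_a, ds_b), use.names = TRUE)

# Convert back to a plain data.frame (illustrative; skip if a data.table is fine).
ds_stacked <- as.data.frame(ds_stacked)

class(ds_stacked)    # "data.frame"
class(ds_stacked$x)  # "character"
```

Adding `fill = TRUE` to the `rbindlist()` call should also cover the case where some data.frames are missing columns that others have.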
akrun
  • So this requires turning it into a `data.table` - is there any loss in this transformation from (and possibly back to) a `tibble`? – Scransom May 03 '17 at 05:45
  • @geryan Not to my knowledge. When you are converting to `tibble`, some attributes are added and some other attributes are removed – akrun May 03 '17 at 05:57

Recently I switched to an approach that keeps all columns as strings initially (when converting from plain text to a data.frame), then stacks them, and finally converts each column to an appropriate data type once all the rows are available to inform the decision (using readr::type_convert()).

It mimics this example. I haven't done any performance comparisons but there wasn't a noticeable difference (the internet is the real bottleneck). Also, I kinda like the idea of reducing the number of data type conversions.

library(magrittr)
col_types <- readr::cols(.default = readr::col_character())
raw_a <- "x,y\n1,21\n2,22\n3,23\n4,24\n5,25\n6,26"
raw_b <- "x,y\nz1,31\nz2,32"
ds_a <- readr::read_csv(raw_a, col_types=col_types)
ds_b <- readr::read_csv(raw_b, col_types=col_types)

list(ds_a, ds_b) %>% 
  dplyr::bind_rows() %>% 
  readr::type_convert()
#> Parsed with column specification:
#> cols(
#>   x = col_character(),
#>   y = col_double()
#> )
#> # A tibble: 8 x 2
#>   x         y
#>   <chr> <dbl>
#> 1 1        21
#> 2 2        22
#> 3 3        23
#> 4 4        24
#> 5 5        25
#> 6 6        26
#> 7 z1       31
#> 8 z2       32

Created on 2019-12-03 by the reprex package (v0.3.0)
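
When the inputs arrive as already-typed data.frames (as in the question) rather than as plain text, the same idea can probably be applied by coercing every column to character just before stacking. This is a rough sketch rather than part of the approach above; `stack_as_character()` is a hypothetical helper name, and the `y` column is made up for illustration:

```r
# Hypothetical helper: coerce every column of every data.frame to character,
# stack them, then let readr guess the final column types from all rows.
stack_as_character <- function(ds_list) {
  ds_list <- lapply(ds_list, function(d) {
    d[] <- lapply(d, as.character)  # `d[] <-` keeps the data.frame structure
    d
  })
  stacked <- dplyr::bind_rows(ds_list)
  readr::type_convert(stacked)
}

ds_a <- data.frame(x = 1:6, y = 21:26, stringsAsFactors = FALSE)
ds_b <- data.frame(x = c("z1", "z2"), y = 31:32, stringsAsFactors = FALSE)

ds <- stack_as_character(list(ds_a, ds_b))
str(ds)
```

Columns that mix numbers and text (like `x`) should stay character, while purely numeric columns (like `y`) should come back as a numeric type.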

wibeasley
  • This did not work for me, granted versions could have changed over time – obewanjacobi Apr 09 '21 at 19:31
  • @obewanjacobi, what's the error message? Today it worked for me on two OSes with updated packages. Did you paste it exactly? – wibeasley Apr 16 '21 at 23:39
  • Ran it again today and worked this time, though I will say I tried this for 2 different sets of data and it did not work. Running your code above does give some strange warnings though: `Warning message: ... is not empty. We detected these problematic arguments: * needs_dots These dots only exist to allow future extensions and should be empty. Did you misspecify an argument?` – obewanjacobi Apr 20 '21 at 13:53