
I want to bind the rows of two data.frames, but a few of their columns have different attributes. For example, `df1$caseid` and `df1$v001` have different classes and attributes than `df2$caseid` and `df2$v001`. How can I bind these data.frames?

library(tidyverse)
library(tidytable)
#> 
#> Attaching package: 'tidytable'
#> The following object is masked from 'package:stats':
#> 
#>     dt

df1 <- 
  structure(list(caseid = structure(c("   11 1  1 1  2", "   11 1  1 1  2", 
"   11 1  1 1  2", "   11 1  1 1  2", "   11 1  1 1  2", "   11 1  1 2  2"
), label = "case identification", class = c("labelled", "character"
), format = "%15s"), bidx = structure(c(1L, 2L, 3L, 4L, 5L, 1L
), label = "birth column number", class = c("labelled", "integer"
), format = "%8.0g"), v000 = structure(c("PK2", "PK2", "PK2", 
"PK2", "PK2", "PK2"), label = "country code and phase", class = c("labelled", 
"character"), format = "%3s"), v001 = structure(c(1101001L, 1101001L, 
1101001L, 1101001L, 1101001L, 1101001L), label = "cluster number", class = c("labelled", 
"integer"), format = "%12.0g"), v002 = structure(c(1L, 1L, 1L, 
1L, 1L, 2L), label = "household number", class = c("labelled", 
"integer"), format = "%8.0g")), row.names = c(NA, -6L), class = "data.frame")

df2 <- 
  structure(list(caseid = structure(c(1L, 1L, 1L, 1L, 1L, 2L), .Label = c("       1   1  2", 
"       1   4  1"), class = "factor"), bidx = structure(c(1L, 
2L, 3L, 4L, 5L, 1L), label = c(BIDX = "Birth column number"), class = c("labelled", 
"numeric")), v000 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "PK7", class = "factor"), 
    v001 = structure(c(1L, 1L, 1L, 1L, 1L, 1L), label = c(V001 = "Cluster number"), class = c("labelled", 
    "numeric")), v002 = structure(c(1L, 1L, 1L, 1L, 1L, 4L), label = c(V002 = "Household number"), class = c("labelled", 
    "numeric"))), row.names = c(NA, -6L), class = "data.frame")

rbind(df1, df2)
#>             caseid bidx v000    v001 v002
#> 1     11 1  1 1  2    1  PK2 1101001    1
#> 2     11 1  1 1  2    2  PK2 1101001    1
#> 3     11 1  1 1  2    3  PK2 1101001    1
#> 4     11 1  1 1  2    4  PK2 1101001    1
#> 5     11 1  1 1  2    5  PK2 1101001    1
#> 6     11 1  1 2  2    1  PK2 1101001    2
#> 7         1   1  2    1  PK7       1    1
#> 8         1   1  2    2  PK7       1    1
#> 9         1   1  2    3  PK7       1    1
#> 10        1   1  2    4  PK7       1    1
#> 11        1   1  2    5  PK7       1    1
#> 12        1   4  1    1  PK7       1    4

bind_rows(df1, df2)
#> Error: Can't combine `..1$caseid` <labelled> and `..2$caseid` <factor<da793>>.

bind_rows.(df1, df2)
#> Error in rbindlist(dots, idcol = .id, use.names = .use_names, fill = .fill): Class attribute on column 2 of item 2 does not match with column 2 of item 1.
MYaseen208
  • `rbind` does work for you, right? What is the question? – Ronak Shah Aug 07 '20 at 11:56
  • @RonakShah yes `rbind` works with these small `data.frames` but not for large data sets. Also `bind_rows` from `tidyverse` and `bind_rows.` from `tidytable` do not work even for these small `data.frames`. So looking for an efficient approach. – MYaseen208 Aug 07 '20 at 12:06
  • What happens with `rbind` ? Does it give an error (what is it?) or is very slow? `bind_rows` would not work when classes are different. It will always give an error. – Ronak Shah Aug 07 '20 at 12:07
  • @RonakShah: With my actual data sets `rbind` throws the following error message `Error in rbindlist(l, use.names, fill, idcol) : Class attribute on column 2 of item 2 does not match with column 2 of item 1`. – MYaseen208 Aug 07 '20 at 12:11
  • Try with `base::rbind` – Ronak Shah Aug 07 '20 at 12:12
  • `base::rbind` throws the same error. – MYaseen208 Aug 07 '20 at 12:15

1 Answer

Whichever binding function you use, it sounds like the column classes have to match first. If the numeric columns always hold whole numbers, convert them in `df2` to integer.

i <- sapply(df2, is.numeric)
df2[i] <- lapply(df2[i], as.integer)

Convert factors to character vectors. Even when both columns are factors, mismatched levels can cause trouble: older versions of dplyr, for example, coerce such columns to character with a warning.

i <- sapply(df2, is.factor)
df2[i] <- lapply(df2[i], as.character)

If you need these columns to be factors, convert them back with `factor()` after you bind the rows.
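Putting the steps together, here is a minimal sketch in base R. It assumes it is acceptable to discard the `label` and `format` attributes, and that the numeric columns hold whole numbers. The helpers `strip_column()` and `clean_frame()` and the toy data frames `a` and `b` are hypothetical names made up for illustration; they are not from the question. Because `df1` also carries the `labelled` class, the sketch cleans both data frames rather than only the second one.

```r
# Hypothetical helper (not part of the original answer): reduce a column to a
# plain base vector. Labels and formats are discarded.
strip_column <- function(x) {
  if (is.factor(x)) {
    return(as.character(x))   # keep the level text, not the integer codes
  }
  attributes(x) <- NULL       # drops class, label, format and names
  if (is.numeric(x)) as.integer(x) else x   # assumes whole-number values
}

# Apply the helper to every column while keeping the data.frame structure.
clean_frame <- function(df) {
  df[] <- lapply(df, strip_column)
  df
}

# Toy data imitating the mismatch in the question.
a <- data.frame(id = c("x", "y"), n = 1:2, stringsAsFactors = FALSE)
a$id <- structure(c("x", "y"), label = "case id",
                  class = c("labelled", "character"))

b <- data.frame(id = factor(c("z", "z")), n = c(3, 4))
b$n <- structure(c(3, 4), label = c(N = "count"),
                 class = c("labelled", "numeric"))

# After cleaning, both frames contain only plain character and integer columns.
combined <- rbind(clean_frame(a), clean_frame(b))

# Re-create the factor only after binding, so the levels cover all rows.
combined$id <- factor(combined$id)
```

Once the columns are plain vectors, `bind_rows()` should accept the cleaned frames as well, although that is not shown here. If the labels matter, store them separately before stripping and reattach them afterwards.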

Ben Norris