3

This may be a bad question because I am not posting a reproducible example. My main goal is to identify columns that have different types in two data frames that share the same column names.

For example

df1

 Id      Col1      Col2     Col3
 Numeric Factor    Integer  Date

df2

 Id      Col1      Col2     Col3
 Numeric Numeric   Integer  Date

Here both data frames (df1, df2) have the same column names, but the type of Col1 differs, and I am interested in identifying such columns. Expected output:

Col1  Factor    Numeric

Any suggestions or tips on achieving this? Thanks.

www
Science11

5 Answers

6

Try compare_df_cols() from the janitor package:

library(janitor)
mtcars2 <- mtcars
mtcars2$cyl <- as.character(mtcars2$cyl)
compare_df_cols(mtcars, mtcars2, return = "mismatch")

#>   column_name  mtcars   mtcars2
#> 1         cyl numeric character

Self-promotion alert: I authored this package. I am posting this function because it exists to solve precisely this problem.

Sam Firke
3

Try this:

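compareColumns <- function(df1, df2) {
  # Compare only the columns present in both data frames
  commonNames <- names(df1)[names(df1) %in% names(df2)]
  # Index with df[commonNames] rather than df[, commonNames] so that a
  # single common column is not dropped to a plain vector
  data.frame(Column = commonNames,
             df1 = sapply(df1[commonNames], class),
             df2 = sapply(df2[commonNames], class))
}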
Alex P
  • Thanks Alex, excellent solution. I added a few lines like this: `df3 <- compareColumns(df1, df2); df3$Diff <- ifelse(df3$df1 == df3$df2, "Same", "Different"); df3[df3$Diff == "Different", ]`. – Science11 Aug 17 '17 at 20:12
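
The follow-up from the comment above can be sketched end to end. This is only an illustration: the data frames `dfA` and `dfB` are hypothetical example data, not from the original post, and the helper is written along the lines of the answer rather than copied from it.

```r
# Helper along the lines of the answer above; df[commonNames] keeps a
# single shared column as a data frame instead of dropping it to a vector
compareColumns <- function(df1, df2) {
  commonNames <- names(df1)[names(df1) %in% names(df2)]
  data.frame(Column = commonNames,
             df1 = sapply(df1[commonNames], class),
             df2 = sapply(df2[commonNames], class),
             stringsAsFactors = FALSE)
}

# Hypothetical example data (an assumption, not from the original post)
dfA <- data.frame(Id = c(1, 2), Col1 = factor(c("a", "b")), Col2 = 1:2)
dfB <- data.frame(Id = c(1, 2), Col1 = c(1.5, 2.5), Col2 = 1:2)

cmp <- compareColumns(dfA, dfB)

# Keep only the rows where the two classes disagree
mismatch <- cmp[cmp$df1 != cmp$df2, ]
mismatch
```

Filtering with `!=` directly avoids the intermediate "Same"/"Different" column when only the mismatches are needed.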
3

For a more compact method, you could use a list with sapply(). Efficiency shouldn't be a problem here, since all we're doing is grabbing the class of each column. Here I name the list elements after the data frames to make the output clearer.

m <- sapply(list(df1 = df1, df2 = df2), sapply, class)
m[m[, "df1"] != m[, "df2"], , drop = FALSE]
#      df1      df2        
# Col1 "factor" "character"

where df1 and df2 are the data from @ycw's answer.

Rich Scriven
2

If the two data frames have the same column names, the code below will give you the columns with different classes.

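library(dplyr)
m1 = mtcars
# Convert two columns to factors so their classes differ from m1
m2 = mtcars %>% mutate(cyl = factor(cyl), vs = factor(vs))
# One row per column: class in m1, class in m2
out = cbind(sapply(m1, class), sapply(m2, class))
# Keep the rows where the two classes differ
out[apply(out, 1, function(x) !identical(x[1], x[2])), ]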
Sean Lin
0

We can use sapply with class to loop through all columns in df1 and df2. After that, we can compare the results.

# Create example data frames
df1 <- data.frame(ID = 1:3,
                  Col1 = as.character(2:4),
                  Col2 = 2:4,
                  Col3 = as.Date(paste0("2017-01-0", 2:4)),
                  stringsAsFactors = TRUE)  # Col1 becomes a factor (the default before R 4.0.0)

df2 <- data.frame(ID = 1:3,
                  Col1 = as.character(2:4),
                  Col2 = 2:4,
                  Col3 = as.Date(paste0("2017-01-0", 2:4)),
                  stringsAsFactors = FALSE)

# Use sapply and class to find the class of each column
class1 <- sapply(df1, class)
class2 <- sapply(df2, class)

# Combine the results, then filter for rows that are different
result <- data.frame(class1, class2, stringsAsFactors = FALSE)
result[!(result$class1 == result$class2), ]
#      class1    class2
# Col1 factor character
www
  • @Sagar Thanks! But I just realized that I made a mistake. `%in%` is not suitable to filter the rows because the comparison is not restricted to the same position. For example, `c("factor", "character", "numeric") %in% c("factor", "numeric", "Date")` results in `TRUE FALSE TRUE`, but what we need is `TRUE, FALSE, FALSE`. I just replaced `%in%` with `==`, which makes the comparison one to one. – www Aug 17 '17 at 20:25
  • Good catch. I ran previously and worked fine, but good to bring that up. I am not able to upvote anything. Keeps giving me an error. – Sagar Aug 17 '17 at 20:30
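
The difference between `%in%` and `==` described in the comments above can be checked directly. A minimal sketch using the two vectors from the comment (the names `x` and `y` are only for illustration):

```r
x <- c("factor", "character", "numeric")
y <- c("factor", "numeric", "Date")

# %in% tests whether each element of x appears anywhere in y,
# regardless of position
in_result <- x %in% y
in_result
#> [1]  TRUE FALSE  TRUE

# == compares the vectors element by element, position by position
eq_result <- x == y
eq_result
#> [1]  TRUE FALSE FALSE
```

Since the class vectors from two data frames are meant to be matched column by column, the positional comparison with `==` is the one that fits here.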