I am struggling to make use of the result of setdiff() on two data frames (it makes sense to me for vectors, but less so for data frames).

REPREX:

m1 <- mtcars
m2 <- mtcars
m2[m2$cyl == 4, "cyl"] <- 3.99

setdiff(m1,m2)
                  cyl
Mazda RX4           6
Datsun 710          4
Hornet Sportabout   8

# I assume the 6 and 8 show up because the 3.99 turns cyl into decimals,
# but this isn't even all of the values that actually changed.

I had expected the result to be some kind of output detailing the rows that differ. Not this exact output, but something that would lead me to identify the values that are different:

m1
               cyl
Datsun 710       4
Merc 240D        4
Merc 230         4
Fiat 128         4
Honda Civic      4
Toyota Corolla   4
Toyota Corona    4
Fiat X1-9        4
Porsche 914-2    4
Lotus Europa     4
Volvo 142E       4

m2
                cyl
Datsun 710     3.99
Merc 240D      3.99
Merc 230       3.99
Fiat 128       3.99
Honda Civic    3.99
Toyota Corolla 3.99
Toyota Corona  3.99
Fiat X1-9      3.99
Porsche 914-2  3.99
Lotus Europa   3.99
Volvo 142E     3.99

I know you can do m1 != m2 and get a logical matrix showing where the values differ:

                      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
Mazda RX4           FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Mazda RX4 Wag       FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Datsun 710          FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Hornet 4 Drive      FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Hornet Sportabout   FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Valiant             FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Duster 360          FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Merc 240D           FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Merc 230            FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Merc 280            FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Merc 280C           FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Merc 450SE          FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Merc 450SL          FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Merc 450SLC         FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Cadillac Fleetwood  FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Lincoln Continental FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Chrysler Imperial   FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Fiat 128            FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Honda Civic         FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Toyota Corolla      FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Toyota Corona       FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Dodge Challenger    FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
AMC Javelin         FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Camaro Z28          FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Pontiac Firebird    FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Fiat X1-9           FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Porsche 914-2       FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Lotus Europa        FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Ford Pantera L      FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Ferrari Dino        FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Maserati Bora       FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Volvo 142E          FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

But I'm not sure what the follow-up step is to quickly see what the differing values actually are. Maybe this is a bad question; I just expected a more actionable output.

dplyr::setdiff does show the entire row for any row that has at least one change. But is there something I'm missing about using base setdiff to see all the differences together?

dplyr::setdiff(m1,m2)
                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
Carlos M.

1 Answer

I think the output you are looking for is closer to what dplyr::anti_join provides:

dplyr::anti_join(m1, m2, by = 'cyl')

#               mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
#Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
#Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
#Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

dplyr::anti_join(m2, m1, by = 'cyl')

#               mpg  cyl  disp  hp drat    wt  qsec vs am gear carb
#Datsun 710     22.8 3.99 108.0  93 3.85 2.320 18.61  1  1    4    1
#Merc 240D      24.4 3.99 146.7  62 3.69 3.190 20.00  1  0    4    2
#Merc 230       22.8 3.99 140.8  95 3.92 3.150 22.90  1  0    4    2
#Fiat 128       32.4 3.99  78.7  66 4.08 2.200 19.47  1  1    4    1
#Honda Civic    30.4 3.99  75.7  52 4.93 1.615 18.52  1  1    4    2
#Toyota Corolla 33.9 3.99  71.1  65 4.22 1.835 19.90  1  1    4    1
#Toyota Corona  21.5 3.99 120.1  97 3.70 2.465 20.01  1  0    3    1
#Fiat X1-9      27.3 3.99  79.0  66 4.08 1.935 18.90  1  1    4    1
#Porsche 914-2  26.0 3.99 120.3  91 4.43 2.140 16.70  0  1    5    2
#Lotus Europa   30.4 3.99  95.1 113 3.77 1.513 16.90  1  1    5    2
#Volvo 142E     21.4 3.99 121.0 109 4.11 2.780 18.60  1  1    4    2
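
One caveat: anti_join matches on the values of the by column, not on row position, so it works here only because the value 4 no longer appears anywhere in m2$cyl. If the two data frames are known to hold the same rows in the same order, a position-by-position comparison in base R may be closer to what was asked. A minimal sketch, assuming identical dimensions, row order, and column order, and no NA values (changed is just a name picked for illustration):

```r
m1 <- mtcars
m2 <- mtcars
m2[m2$cyl == 4, "cyl"] <- 3.99

# TRUE for each row in which at least one cell differs between m1 and m2
changed <- rowSums(m1 != m2) > 0

# The differing rows as they appear in each data frame
m1[changed, ]
m2[changed, ]
```

Because this compares cell by cell, it would also flag a row whose new value happens to exist elsewhere in the other data frame, which a value-based join would miss.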
Ronak Shah
OK, anti_join seems to be a more configurable setdiff, and I do like the by argument. I'll mark this one as accepted, since I think anti_join with by will do what I want. I just had the idea in my head that I'd get some output similar to `m1 != m2` but with the TRUE values more visibly noted. Thank you! – Carlos M. Dec 03 '20 at 12:55
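
For the cell-level view described in the comment above, the logical matrix from m1 != m2 can be turned into a listing of only the cells that differ, with both values side by side. This is a rough base R sketch, assuming both data frames have the same shape and ordering and contain only numeric columns (as mtcars does); idx and cell_diffs are names invented here for illustration:

```r
m1 <- mtcars
m2 <- mtcars
m2[m2$cyl == 4, "cyl"] <- 3.99

# Row/column positions of every cell where the two data frames disagree
idx <- which(m1 != m2, arr.ind = TRUE)

# One line per differing cell: where it is, and the value on each side
cell_diffs <- data.frame(
  row    = rownames(m1)[idx[, "row"]],
  column = colnames(m1)[idx[, "col"]],
  m1     = as.matrix(m1)[idx],
  m2     = as.matrix(m2)[idx],
  stringsAsFactors = FALSE
)
cell_diffs
```

With mixed column types, as.matrix() would coerce everything to character, so the value columns would come back as strings in that case.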