
There are several questions and answers about this topic; however, none seems to answer my question directly, or I have not been able to find the one that does. I appreciate the help in advance!

I have two data frames

df1 <- write.csv("df1.csv")
df2 <- write.csv("df2.csv")

I want to make

df3 <- data.frame([df1$LikeColumn != df2$LikeColumn],)

My goal is to make a data frame (df3) that consists of all observations (rows) where the two "LikeColumn" values are not equal.

Notes: The headers are the same (the df1$x header is the same as the df2$x header). There are the same number of columns. There are not the same number of rows.

Ronak Shah
Matt
  • So you want only the rows that are in one data frame, but not in both? Or only the rows from `df1` that are not in `df2`? A small toy example with desired output would help us understand your goal and give us something to demonstrate code on. – Gregor Thomas Feb 28 '18 at 19:39
  • only rows from df1 that are not in both – Matt Feb 28 '18 at 19:40
  • Can you make a reproducible example? http://reprex.tidyverse.org/articles/reprex.html – tonyk Feb 28 '18 at 19:40
  • df1 has 3 million records and df2 has 12 million records. I need the df1 records that are not in df2. – Matt Feb 28 '18 at 19:40
  • `df1[! df1$LikeColumn %in% df2$LikeColumn, ]`, or with `dplyr::anti_join(df1, df2, by = "LikeColumn")`. – Gregor Thomas Feb 28 '18 at 19:41
  • Possibly anti_join? https://www.rdocumentation.org/packages/dplyr/versions/0.7.3/topics/join – tonyk Feb 28 '18 at 19:42
  • This code makes no sense: `df1 <- write.csv("df1.csv")`. The df1 dataframe if it existed would then be wiped out since the `write.table` and its variants all return NULL. – IRTFM Feb 28 '18 at 19:42
  • Possible duplicate of [Find complement of a data frame (anti - join)](https://stackoverflow.com/questions/28702960/find-complement-of-a-data-frame-anti-join) – Maurits Evers Feb 28 '18 at 22:16

1 Answer


Using base R:

df1[! df1$LikeColumn %in% df2$LikeColumn, ]

With dplyr:

library(dplyr)
anti_join(df1, df2, by = "LikeColumn")
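
As a minimal sketch of the base R approach on toy data: the two small data frames and their values below are made up for illustration and stand in for the real `df1` and `df2`. Note that a direct `df1$LikeColumn != df2$LikeColumn` comparison is elementwise and does not suit data frames with different numbers of rows, which is why `%in%` is used instead.

```r
# Toy stand-ins for the real data (hypothetical values).
df1 <- data.frame(LikeColumn = c("a", "b", "c", "d"),
                  x = c(10, 20, 30, 40),
                  stringsAsFactors = FALSE)
df2 <- data.frame(LikeColumn = c("b", "d", "e"),
                  x = c(21, 41, 51),
                  stringsAsFactors = FALSE)

# Logical vector: TRUE where df1's value also appears somewhere in df2.
in_df2 <- df1$LikeColumn %in% df2$LikeColumn

# Keep the rows of df1 whose LikeColumn value is absent from df2,
# and assign the result so it is stored as a new data frame.
df3 <- df1[!in_df2, ]

print(df3)
```

Here `df3` keeps the `df1` rows whose `LikeColumn` is "a" or "c", since "b" and "d" also occur in `df2`. The `anti_join()` call above should select the same rows.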

This question is closely related: Compare two data.frames to find the rows in data.frame 1 that are not present in data.frame 2, but it focuses on finding full rows, whereas in this case we are only looking at values in a single column.

Also see Find complement of a data frame, which includes a data.table solution. That approach is likely to be the most efficient one if you have large data and convert both data frames to keyed data tables.
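
A rough sketch of what a keyed data.table version could look like, assuming the data.table package is installed. The toy data is hypothetical, and this is one common way to write a "not join" rather than necessarily the exact code from the linked thread.

```r
library(data.table)

# Toy stand-ins for the real data (hypothetical values).
df1 <- data.frame(LikeColumn = c("a", "b", "c", "d"),
                  x = c(10, 20, 30, 40),
                  stringsAsFactors = FALSE)
df2 <- data.frame(LikeColumn = c("b", "d", "e"),
                  x = c(21, 41, 51),
                  stringsAsFactors = FALSE)

# Convert to data.tables and key both on the column being compared.
dt1 <- as.data.table(df1)
dt2 <- as.data.table(df2)
setkey(dt1, LikeColumn)
setkey(dt2, LikeColumn)

# "Not join": rows of dt1 that have no match in dt2 on LikeColumn.
df3 <- dt1[!dt2, on = "LikeColumn"]

print(df3)
```

Note that `setkey()` sorts each table by the key column, so the row order of the result may differ from the original `df1`.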

Gregor Thomas
  • Thanks for your help! I will look at the suggested threads. Please note that df1[! df1$LikeColumn %in% df2$LikeColumn, ] is only giving me the count in the console. It is not creating a data frame @Gregor – Matt Feb 28 '18 at 21:11
  • The code I posted should work fine based on your description. If you provide a reproducible example as tonyk suggested in comments, then we can actually test and demonstrate the code (and see anything that might be weird with your data structures). Or maybe you just need to assign the result? `df3 <- ...`? Not sure what you mean by "giving me the count in the console"... – Gregor Thomas Feb 28 '18 at 21:39
  • I can't thank you enough, first off! I am using RStudio, and I am referring to the console output window. When I run df1[! df1$LikeColumn %in% df2$LikeColumn, ] the console window displays the results but does not create a data frame. When I run df1 <- [! df1$LikeColumn %in% df2$LikeColumn, ] I get this error: Error: unexpected '[' in "df1<-[". I have also tried assigning it to a new data frame with df3[! df1$LikeColumn %in% df2$LikeColumn, ] and this does not work either. I just need a new df with the rows and cols of df1[! df1$LikeColumn %in% df2$LikeColumn, ] – Matt Mar 01 '18 at 15:22
  • `df1[! df1$LikeColumn %in% df2$LikeColumn, ]` creates a data frame, it just doesn't assign it to anything to keep it around. If you want to assign that value to a new name, like `df3`, then do `df3 <- df1[! df1$LikeColumn %in% df2$LikeColumn, ]`. If you want to overwrite `df1` with the new version, then use `df1 <- df1[! df1$LikeColumn %in% df2$LikeColumn, ]`. Just like `1+1` will print `2`, but if you want to save that result in a variable `foo`, you need to assign it: `foo <- 1+1`. – Gregor Thomas Mar 01 '18 at 15:24
  • `foo + 1` will print 3, but not assign it to anything. If you want to change `foo` to increase its value by 1, then you do `foo <- foo + 1`. To assign something you must use `<-` or `=`. The `[` is taking a subset, not assigning. – Gregor Thomas Mar 01 '18 at 15:27
  • In my answer, I just give the expression for the result, I didn't assign the value to anything because I don't know what you want to name the result. You can call it whatever you want. `whatever_you_want <- ...`, where `...` is one of my complete lines of code, otherwise unmodified. – Gregor Thomas Mar 01 '18 at 15:30
  • got it! thanks! i was missing the "<-df1" part: df3 <- df1 [! df1$LikeColumn %in% df2$LikeColumn, ] – Matt Mar 01 '18 at 15:31
  • To be pedantic, you were missing the `df3 <-` part, and incorrectly omitting the `df1` before the bracket. – Gregor Thomas Mar 01 '18 at 15:31
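
The difference between printing a result and assigning it, as discussed in the comments above, can be seen in a short session. This is only an illustrative sketch using the `foo` variable from the comments.

```r
# Evaluating an expression at the top level prints it but stores nothing new.
foo <- 1 + 1   # foo now holds 2
foo + 1        # displays 3 in the console; foo is still 2
unchanged <- foo

# To keep a result, assign it with `<-` (here overwriting foo itself).
foo <- foo + 1
print(foo)
```

Subsetting works the same way: `df1[...]` on its own only displays the subset, while `df3 <- df1[...]` stores it under the name `df3`.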