R - How to filter a column with another column's string set

Question

I'm essentially trying to "background subtract" the data I have. Here are two sample data sets. Please note that the 'm/z' column cannot be used for the subtraction, since the associated numbers will not always be exactly the same. Because the match has to be made on a whole column of strings, I have no idea how this can be done, and I'm not even sure I'm asking the right question since I'm new to this.

Solution: `df_sub <- anti_join(df, df_bkg, by = 'Composition')` (this works with strings too!)
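A minimal sketch of that call, assuming dplyr is installed and with the two tables from this question typed in by hand in place of `read.csv(file)`. The name `bkg_only` is made up here, only to show what reversing the arguments returns.

```r
library(dplyr)

# Sample data retyped from the question; check.names = FALSE keeps the "m/z" name
df <- data.frame(
  `m/z` = c(241, 265, 301, 335, 441),
  Composition = c("C15 H22 O Na", "C15 H15 N5", "C16 H22 O4 Na",
                  "C19 H20 O4 Na", "C26 H42 O4 Na"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
df_bkg <- data.frame(
  `m/z` = c(274, 301, 317, 441, 241),
  Composition = c("C18 H19 O Na", "C16 H22 O4 Na", "C16 H22 O5 Na",
                  "C26 H42 O4 Na", "C15 H22 O Na"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Rows of df whose Composition does not appear in df_bkg (df minus background)
df_sub <- anti_join(df, df_bkg, by = "Composition")

# Hypothetical name: the reverse difference, rows found only in the background
bkg_only <- anti_join(df_bkg, df, by = "Composition")
```

With this sample data, `df_sub` holds the 265 and 335 rows and `bkg_only` holds the 274 and 317 rows, so the order of the first two arguments decides which table is being "subtracted" from which.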

df <- read.csv(file)

m/z             Composition
241             C15 H22 O Na
265             C15 H15 N5
301             C16 H22 O4 Na
335             C19 H20 O4 Na
441             C26 H42 O4 Na

and my "background"

df_bkg <- read.csv(file_2)

m/z             Composition
274             C18 H19 O Na
301             C16 H22 O4 Na
317             C16 H22 O5 Na
441             C26 H42 O4 Na
241             C15 H22 O Na

The background shares three strings in the Composition column with my data. I would like the new "subtracted dataset" to look like this:

df_sub <- (df - df_bkg)

m/z             Composition
274             C18 H19 O Na
317             C16 H22 O5 Na

Thank you for any help you can offer.

Have you looked at [`join`](http://dplyr.tidyverse.org/reference/join.html) — Tung, Feb 28 '18 at 17:27
@Tung Since links break and might point to different versions than a user has installed, when possible people here tend to direct to internal docs like `?dplyr::join`. Fwiw, this is an `anti_join`, right? — Frank, Feb 28 '18 at 17:29
@Frank Yes I believe I'll have to use anti_join but I'm not quite understanding how to use an entire column of strings to define the join by. For this example, would it be ' anti_join(df$Composition, df_bkg$Composition, by= NULL, copy = TRUE ' ?? Not sure what to do with the 'by = ' argument as I am not understanding it rn — Ragstock, Feb 28 '18 at 18:47
`anti_join(df_bkg, df, by="m/z")` doesn't work? (Your example is hard to copy-paste into R, so I'm not testing to see if this works. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 if you're interested in improving that.) — Frank, Feb 28 '18 at 19:02
@Frank Using the m/z column won't work. I simplified the example (i.e. 241 might be 241.00023 in one dataframe but 241.00043 in another, while the composition stays the same). Thank you for that link; I'll look at it and make sure that I use better data presentation for future questions! — Ragstock, Feb 28 '18 at 19:19
Ok, sorry I hadn't noticed you mentioned that in the OP. I guess at this point, I can't help without a more concrete example. You could try `by="Composition"` but whether that works or not might depend on precisely what format that column has. Let me know if you want the question reopened. — Frank, Feb 28 '18 at 19:23
by = 'Composition' works! My problem was determining which method works for strings and not just value. This helps a lot - thank you for finding the duplicate question. — Ragstock, Feb 28 '18 at 19:32
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/165997/discussion-between-ragstock-and-frank). — Ragstock, Feb 28 '18 at 21:13
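The caveat above about the exact format of the column can be made concrete. The sketch below uses made-up values: the m/z decimals echo the ones mentioned in the comments, and the stray spaces are only an assumption about what a CSV export might contain. `clean_comp` is a hypothetical helper, and the column is called `mz` here just to keep the example short.

```r
# Made-up rows: m/z differs in the fourth decimal, Composition carries stray spaces
df <- data.frame(
  mz = c(241.00023, 265.00010),
  Composition = c("C15 H22 O Na   ", "C15 H15 N5 "),
  stringsAsFactors = FALSE
)
df_bkg <- data.frame(
  mz = 241.00043,
  Composition = "C15  H22 O Na",
  stringsAsFactors = FALSE
)

# Exact matching on m/z finds no overlap, which is why it cannot be the key
mz_match <- df$mz %in% df_bkg$mz

# Hypothetical helper: trim both ends and collapse runs of whitespace to one space
clean_comp <- function(x) gsub("\\s+", " ", trimws(x))

df$Composition <- clean_comp(df$Composition)
df_bkg$Composition <- clean_comp(df_bkg$Composition)

# Keep rows of df whose cleaned Composition is absent from the background
df_sub <- df[!df$Composition %in% df_bkg$Composition, , drop = FALSE]
```

String matching is exact, so cleaning both columns the same way before `anti_join()` or `%in%` avoids rows being kept only because of invisible whitespace differences.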

score 0 · Answer 1 · answered Feb 28 '18 at 17:26

Try using:

df_sub <- df_bkg[!df_bkg["Composition"] %in% df_["Composition"]]

The code keeps the rows of df_bkg whose Composition does not occur in df. I hope this answers your question!

answered Feb 28 '18 at 17:26

tobiaspk1


Unfortunately this doesn't seem to be working. Also it seems it should be switched to complete the argument (df minus df_bkg), when i do it in the way you describe - i get the second set of data as my output. Could you explain this a bit more? – Ragstock Feb 28 '18 at 18:37
I guess a couple syntax changes might fix it `df_bkg[!df_bkg[["Composition"]] %in% df[["Composition"]], ]` – Frank Feb 28 '18 at 19:26
@Frank this seems about right, thanks Frank! – tobiaspk1 Feb 28 '18 at 22:49
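For reference, a runnable base R sketch of the corrected indexing, reading `df_` as a typo for `df` (an assumption) and retyping the sample tables by hand. The names `in_df` and `bkg_only` are made up for illustration. It also shows the swapped version that gives df minus df_bkg.

```r
# Sample data retyped from the question; check.names = FALSE keeps the "m/z" name
df <- data.frame(
  `m/z` = c(241, 265, 301, 335, 441),
  Composition = c("C15 H22 O Na", "C15 H15 N5", "C16 H22 O4 Na",
                  "C19 H20 O4 Na", "C26 H42 O4 Na"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
df_bkg <- data.frame(
  `m/z` = c(274, 301, 317, 441, 241),
  Composition = c("C18 H19 O Na", "C16 H22 O4 Na", "C16 H22 O5 Na",
                  "C26 H42 O4 Na", "C15 H22 O Na"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# [[ ]] extracts the column as a character vector, so %in% tests one value per row;
# single brackets would return a one-column data frame instead
in_df <- df_bkg[["Composition"]] %in% df[["Composition"]]

# The trailing comma makes this a row selection: background rows not found in df
bkg_only <- df_bkg[!in_df, , drop = FALSE]

# Swapped version: rows of df not found in the background (df minus df_bkg)
df_sub <- df[!df[["Composition"]] %in% df_bkg[["Composition"]], , drop = FALSE]
```

The two fixes are the double brackets, which hand `%in%` plain vectors, and the comma before the closing bracket, which tells R the logical vector picks rows and not columns.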
