I'm essentially trying to "background subtract" the data I have. Here are two sample data sets. Please note that the 'm/z' column cannot be used for the subtraction, since its values will not always be exactly the same in both data sets. I have no idea how to do this with multiple strings, and since I'm new to this I'm not even sure I'm asking the right question.

Solution: df_sub <- anti_join(df, df_bkg, by = 'Composition') (This works with strings too!)

df <- read.csv(file)

m/z             Composition

241             C15 H22 O Na                
265             C15 H15 N5 
301             C16 H22 O4 Na 
335             C19 H20 O4 Na           
441             C26 H42 O4 Na 

and my "background"

df_bkg <- read.csv(file_2)

m/z             Composition

274             C18 H19 O Na 
301             C16 H22 O4 Na 
317             C16 H22 O5 Na       
441             C26 H42 O4 Na 
241             C15 H22 O Na 

The background contains three strings in the Composition column that also appear in my data. I would like the new "subtracted dataset" to look like this:

df_sub <- (df - df_bkg)

m/z             Composition

265             C15 H15 N5
335             C19 H20 O4 Na
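
For anyone who wants to paste this straight into R, here is a minimal sketch of the `anti_join` approach on the sample data. The data frames are typed in by hand rather than read from a CSV, and the column name `mz` is an assumption standing in for 'm/z'. `anti_join(df, df_bkg, ...)` keeps the rows of `df` whose Composition is absent from the background (265 and 335); swapping the two arguments keeps the background-only rows instead (274 and 317).

```r
library(dplyr)

# Sample data typed in by hand; `mz` is an assumed name for the 'm/z' column.
df <- data.frame(
  mz = c(241, 265, 301, 335, 441),
  Composition = c("C15 H22 O Na", "C15 H15 N5", "C16 H22 O4 Na",
                  "C19 H20 O4 Na", "C26 H42 O4 Na"),
  stringsAsFactors = FALSE
)
df_bkg <- data.frame(
  mz = c(274, 301, 317, 441, 241),
  Composition = c("C18 H19 O Na", "C16 H22 O4 Na", "C16 H22 O5 Na",
                  "C26 H42 O4 Na", "C15 H22 O Na"),
  stringsAsFactors = FALSE
)

# Rows of df whose Composition has no match in df_bkg (df "minus" df_bkg).
df_sub <- anti_join(df, df_bkg, by = "Composition")

# The opposite direction: background rows that never occur in df.
bkg_only <- anti_join(df_bkg, df, by = "Composition")

print(df_sub)
print(bkg_only)
```

Only the Composition column is compared, so small differences in the m/z values between the two data sets do not affect the result.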

Thank you for any help you can offer.

Ragstock
  • Have you looked at [`join`](http://dplyr.tidyverse.org/reference/join.html) – Tung Feb 28 '18 at 17:27
  • @Tung Since links break and might point to different versions than a user has installed, when possible people here tend to direct to internal docs like `?dplyr::join`. Fwiw, this is an `anti_join`, right? – Frank Feb 28 '18 at 17:29
  • @Frank Yes, I believe I'll have to use anti_join, but I don't quite understand how to use an entire column of strings to define the join. For this example, would it be `anti_join(df$Composition, df_bkg$Composition, by = NULL, copy = TRUE)`? I'm not sure what to do with the `by =` argument, as I don't understand it right now. – Ragstock Feb 28 '18 at 18:47
  • `anti_join(df_bkg, df, by="m/z")` doesn't work? (Your example is hard to copy-paste into R, so I'm not testing to see if this works. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 if you're interested in improving that.) – Frank Feb 28 '18 at 19:02
  • @Frank Using the m/z column won't work. I simplified it for simplicity's sake (i.e. 241 might be 241.00023 in one data frame but 241.00043 in another, while the composition stays the same). Thank you for that link. I'll look at it and make sure that I use better data presentation in future questions! – Ragstock Feb 28 '18 at 19:19
  • Ok, sorry, I hadn't noticed you mentioned that in the OP. I guess at this point I can't help without a more concrete example. You could try `by="Composition"`, but whether that works might depend on precisely what format that column has. Let me know if you want the question reopened. – Frank Feb 28 '18 at 19:23
  • `by = 'Composition'` works! My problem was determining which method works for strings and not just values. This helps a lot - thank you for finding the duplicate question. – Ragstock Feb 28 '18 at 19:32
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/165997/discussion-between-ragstock-and-frank). – Ragstock Feb 28 '18 at 21:13
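
On the point that matching by Composition depends on the exact format of that column: the strings have to be identical character for character, so stray or trailing spaces will silently prevent a match. Below is a small sketch of cleaning the column before joining. `normalize_composition` is a hypothetical helper, not part of any package, and the padded strings are made up to illustrate the problem; the two 241 values are the ones quoted in the comments.

```r
# Hypothetical helper: trim both ends and collapse internal runs of whitespace.
normalize_composition <- function(x) {
  gsub("\\s+", " ", trimws(x))
}

# Illustrative data with deliberately inconsistent spacing (an assumption).
df <- data.frame(
  mz = c(241.00023, 265),
  Composition = c("C15 H22 O Na   ", "C15 H15 N5 "),
  stringsAsFactors = FALSE
)
df_bkg <- data.frame(
  mz = c(241.00043, 274),
  Composition = c("C15 H22  O Na", "C18 H19 O Na "),
  stringsAsFactors = FALSE
)

# Before cleaning, the stray spaces keep any string from matching.
raw_match <- df$Composition %in% df_bkg$Composition

# After cleaning, the shared composition is recognised.
df$Composition <- normalize_composition(df$Composition)
df_bkg$Composition <- normalize_composition(df_bkg$Composition)
clean_match <- df$Composition %in% df_bkg$Composition

print(raw_match)
print(clean_match)
```

Once both columns are cleaned the same way, `by = "Composition"` compares like with like, regardless of how the m/z values differ.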

1 Answer

Try using:

df_sub <- df_bkg[!df_bkg["Composition"] %in% df_["Composition"]]

The code is meant to select the rows of df_bkg whose Composition does not occur in df. I hope this answers your question!

tobiaspk1
  • Unfortunately this doesn't seem to be working. It also seems the arguments should be switched to get df minus df_bkg; when I do it the way you describe, I get the second set of data as my output. Could you explain this a bit more? – Ragstock Feb 28 '18 at 18:37
  • I guess a couple syntax changes might fix it `df_bkg[!df_bkg[["Composition"]] %in% df[["Composition"]], ]` – Frank Feb 28 '18 at 19:26
  • @Frank this seems about right, thanks Frank! – tobiaspk1 Feb 28 '18 at 22:49
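
To spell out the syntax point from the comments, here is a base R sketch written in the df minus df_bkg direction the question asks for. The sample data is typed in by hand, and `mz` is an assumed column name. Two details matter: `df["Composition"]` returns a one-column data frame, so `%in%` does not compare row by row, whereas `df[["Composition"]]` returns the character vector; and the trailing comma in `df[keep, ]` is what makes the logical vector select rows rather than columns.

```r
# Sample data typed in by hand; `mz` is an assumed name for the 'm/z' column.
df <- data.frame(
  mz = c(241, 265, 301, 335, 441),
  Composition = c("C15 H22 O Na", "C15 H15 N5", "C16 H22 O4 Na",
                  "C19 H20 O4 Na", "C26 H42 O4 Na"),
  stringsAsFactors = FALSE
)
df_bkg <- data.frame(
  mz = c(274, 301, 317, 441, 241),
  Composition = c("C18 H19 O Na", "C16 H22 O4 Na", "C16 H22 O5 Na",
                  "C26 H42 O4 Na", "C15 H22 O Na"),
  stringsAsFactors = FALSE
)

# Single brackets give a data frame, double brackets give the vector itself.
single <- df["Composition"]
double <- df[["Composition"]]

# TRUE for every row of df whose Composition is absent from the background.
keep <- !(df[["Composition"]] %in% df_bkg[["Composition"]])

# The trailing comma means "these rows, all columns".
df_sub <- df[keep, ]

print(df_sub)
```

Swapping `df` and `df_bkg` throughout gives the background-only rows instead, which is what the expression in the comment above returns.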