I'm quite a new R user and it would be awesome if someone could help me to figure this out!

I have data looking like this:

eDNA.data 

    Sample     Extraction_batch    Species1      Species2     Species3   Species4
    CH15_25    CH_1                0             39           10         0
    CH15_89    CH_2                11            0            55         0
    CH15_56    CH_1                0             0            0          13
    CH15_16    CH_1                27            12           0          7
    CH15_21    CH_2                0             0            2          0
    CH15_negA  CH_1                0             1            0          0
    CH15_negB  CH_2                0             0            2          1
    IQ15_10    IQ_1                8             67           43         0
    IQ15_64    IQ_1                17            0            24         6
    IQ15_17    IQ_2                5             0            0          0
    IQ15_87    IQ_1                0             11           7          0
    IQ15_negA  IQ_1                1             0            0          0
    IQ15_negB  IQ_2                0             0            1          1

I have 218 species in total and many more samples (148 in total), which are DNA extractions. The Extraction_batch column indicates which extraction batch each sample belongs to (because I could only do 24 at a time), and each extraction batch has one corresponding negative control. Negative controls are blank samples used to check whether contamination occurred during the extraction.

I would like to subtract the negative control row (e.g. IQ15_negA) from all the rows that have the same value in the Extraction_batch column, so that I obtain a new, contamination-free data frame.

How can I do this in R?

  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Feb 26 '18 at 17:28
  • Possible duplicate of [Filtering row which contains a certain string using dplyr](https://stackoverflow.com/questions/22850026/filtering-row-which-contains-a-certain-string-using-dplyr) – gavg712 Feb 26 '18 at 17:29
  • some verbs / vocabulary to help you search for related questions & solve your problem: you're trying to group by `Extraction_batch` and normalize your other columns relative to a specific row within each group. It may also be easier/more natural (though not at first glance) to separate the "negative" rows into their own `data.frame`, separate from this table. You can then join the "negative" table to the normal table and subtract. – MichaelChirico Feb 26 '18 at 17:33
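
The split-and-subtract idea from the last comment can be sketched in base R. This is a minimal sketch, not a tested solution for the full 148-sample table: it rebuilds only four rows of the example data, the column spelling `Species1` to `Species4` is an assumption, and it assumes exactly one negative control per extraction batch. `match()` is used here as a lookup in place of an explicit join.

```r
# Four rows of the example table; the Species1..Species4 spelling is assumed
eDNA.data <- data.frame(
  Sample           = c("CH15_25", "CH15_89", "CH15_negA", "CH15_negB"),
  Extraction_batch = c("CH_1", "CH_2", "CH_1", "CH_2"),
  Species1         = c(0, 11, 0, 0),
  Species2         = c(39, 0, 1, 0),
  Species3         = c(10, 55, 0, 2),
  Species4         = c(0, 0, 0, 1),
  stringsAsFactors = FALSE
)

species_cols <- grep("^Species", names(eDNA.data), value = TRUE)

# Separate the negative controls from the real samples
is_neg  <- grepl("neg", eDNA.data$Sample)
neg     <- eDNA.data[is_neg, ]
samples <- eDNA.data[!is_neg, ]

# For each sample, find the negative-control row from the same batch
idx <- match(samples$Extraction_batch, neg$Extraction_batch)

# Subtract the matching negative-control counts, column by column
corrected <- samples
corrected[species_cols] <- samples[species_cols] - neg[idx, species_cols]
corrected
```

If a batch had no negative control, `match()` would return `NA` for its samples and their corrected counts would come out as `NA`, which at least makes the gap visible.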

1 Answer

Consider splitting the data frame into negative-control rows and sample rows, then merging `neg_df` with `clean_df` so that the negative-control counts can be subtracted column by column:

# CREATE SAMPLE STUB NAME
df$Sample_stub <- substr(df$Sample, 1, 4)

# FILTER FOR NEG ROWS, DROP SAMPLE, RENAME SPECIES COLS WITH NEG_ PREFIX
neg_df <- transform(df[grepl("neg", df$Sample),], Sample = NULL)
names(neg_df)[grep("Species", names(neg_df))] <- paste0("Neg_", names(neg_df)[grep("Species", names(neg_df))])

# FILTER FOR CLEAN ROWS
clean_df <- df[!grepl("neg", df$Sample),]

# MERGE BOTH SETS, DROP STUB NAME
final_df <- merge(clean_df, neg_df, by=c("Sample_stub", "Extraction_batch"))[-1]

# BLOCKWISE (MULTI-COLUMN) SUBTRACTION
final_df[, grep("^Species", names(final_df))] <- final_df[, grep("^Species", names(final_df))] - 
                                                 final_df[, grep("^Neg", names(final_df))]
# REMOVE NEG SPECIES COLUMNS
final_df[, grep("^Neg", names(final_df))] <- NULL

final_df
#   Extraction_batch  Sample Species1 Species2 Species3 Species4
# 1             CH_1 CH15_25        0       38       10        0
# 2             CH_1 CH15_56        0       -1        0       13
# 3             CH_1 CH15_16       27       11        0        7
# 4             CH_2 CH15_89       11        0       53       -1
# 5             CH_2 CH15_21        0        0        0       -1
# 6             IQ_1 IQ15_10        7       67       43        0
# 7             IQ_1 IQ15_64       16        0       24        6
# 8             IQ_1 IQ15_87       -1       11        7        0
# 9             IQ_2 IQ15_17        5        0       -1       -1
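
The output above contains a few -1 values, where the negative control had a read that the sample did not. Whether those should stay negative is a judgment call for the analysis. If negative counts are not meaningful, one possible follow-up (an assumption, not part of the answer) is to floor them at zero with `pmax()`. The sketch below rebuilds three rows of the output so that it runs on its own.

```r
# Three rows copied from the output above, so the sketch is self-contained
final_df <- data.frame(
  Extraction_batch = c("CH_1", "CH_2", "IQ_2"),
  Sample           = c("CH15_56", "CH15_21", "IQ15_17"),
  Species1         = c(0, 0, 5),
  Species2         = c(-1, 0, 0),
  Species3         = c(0, 0, -1),
  Species4         = c(13, -1, -1),
  stringsAsFactors = FALSE
)

species_idx <- grep("^Species", names(final_df))

# Replace every negative count with 0, leaving other values unchanged
final_df[species_idx] <- lapply(final_df[species_idx], function(x) pmax(x, 0))
final_df
```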
Parfait
  • Thanks a lot! I've tried your script several times, but there seems to be a problem at the second step, `neg_df <- setNames(transform(df[grepl("neg", df$Sample),], Sample = NULL), c("Sample_stub", "Extraction_batch", paste0("Neg_", names(neg_df)[grep("Species", names(neg_df))])))`: an error message appears, `Error in paste0("Neg_", names(neg_df)[grep("Species", names(neg_df))]) : object 'neg_df' not found` – Noémie Leduc Feb 26 '18 at 22:04
  • See update where subsetting rows and renaming columns are done in two different lines. – Parfait Feb 26 '18 at 22:08
  • It's almost working! After merging the `clean_df` and the `neg_df` the species names are duplicated and appear like **species1.x** and **species1.y** so at the blockwise subtraction the column of the neg_df is not subtracted :/ – Noémie Leduc Feb 27 '18 at 01:41
  • The renaming step did not take effect for you; it is there to avoid exactly this name collision. I see now that your *species* is lowercase, unlike in your post above. R is case-sensitive! – Parfait Feb 27 '18 at 02:29
  • I've changed the species names and it's working perfectly now! Thanks a lot! – Noémie Leduc Feb 27 '18 at 19:06
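
The `.x`/`.y` suffixes discussed in these comments come from `merge()` disambiguating columns that have the same name in both data frames. They appear when the `Neg_` renaming silently matches nothing. A small sketch of why, using a hypothetical two-row data frame with lowercase `species1`/`species2` columns: `grep()` is case-sensitive by default, and `ignore.case = TRUE` is one way to make the pattern match either spelling.

```r
# Hypothetical data frame whose species columns are lowercase
df <- data.frame(
  Sample           = c("CH15_25", "CH15_negA"),
  Extraction_batch = c("CH_1", "CH_1"),
  species1         = c(0, 0),
  species2         = c(39, 1)
)

# Case-sensitive pattern: no columns match, so nothing would be renamed
strict_hits <- grep("Species", names(df))
length(strict_hits)

# Case-insensitive pattern: both species columns are found and renamed
loose_hits <- grep("species", names(df), ignore.case = TRUE)
names(df)[loose_hits] <- paste0("Neg_", names(df)[loose_hits])
names(df)
```

Renaming the columns to match the pattern, as was done in this thread, works just as well. The case-insensitive match only saves touching the data.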