0

I have a data frame with 192 columns. I want to make a 96 column data frame by subtracting matching column based in pairing. The pairing info is found in the data frame Pairing in the column Match . The column Pos matches as substring the column names of my data frame that I want to subtract.

How do I use the pairing info in Pairing to identify which column to subtract.

  > Pairing
Match              Pos
Control_70         001_A01
Control_56         001_A02
   Case_70         001_A03
   Case_56         001_A04
Control_21         001_A05
   Case_21         001_A06


> head(matures.cpm.spike.batch[,1:6])
              001_A01_S1 001_A02_S2 001_A03_S3 001_A04_S4 001_A05_S5 001_A06_S6
hsa-let-7a-5p  16.566813  11.415796  12.400252  22.701457   8.864882  20.442599
hsa-let-7b-5p  15.574190  11.107133  12.196465  17.954547   8.527478  25.788286
hsa-let-7c-5p   5.976763   4.372978   5.984685   9.821348   6.341252   7.480211
hsa-let-7d-3p  16.508818  10.697730  11.001534  18.375286   7.583910  24.974774
hsa-let-7d-5p  13.273824   5.134547   9.456675  11.567230   7.096485  13.294108
hsa-let-7f-5p  13.900711   9.804384  11.481614  20.002110   7.878241  17.295909
user2300940
  • 2,355
  • 1
  • 22
  • 35
  • 1
    It's unclear how to pair columns. Maybe you should remove all the unnecessary information (meaning unnecessary columns in Pairing) and give a practical example. – Iaroslav Domin May 04 '17 at 09:26
  • Updated. Maybe this works as a practical example – user2300940 May 04 '17 at 09:29
  • Your `Pairing$Pos` doesn't match the header you have shown in your data. Also, it would help a lot if you could provide a dput of both, instead of pasting the data. See [How to make a great R reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Adam Quek May 04 '17 at 09:37
  • Is matches it partly – user2300940 May 04 '17 at 09:37
  • So basically you want to subtract `001_A01_S1` column from `001_A01_S2` column? Give better examples and please include the code you have tried and failed – Sotos May 04 '17 at 09:39
  • Maybe I wasn't clear enough, by practical example I meant e.g. how would you decide which column you want to subtract from `001_A03_S3`. I suspect it'd be `001_A01_S1` because `001_A03` is `Case` and `001_A01` is `Control` with the same indices (70), but it's unclear from what you've written. – Iaroslav Domin May 04 '17 at 09:48
  • That is correct. The indeces behind Case/Control in the Match column is desicive. f.ex: Case_70 from Control_70 – user2300940 May 04 '17 at 10:52

2 Answers2

1

I am assuming that for every Case there is exactly one Control and vice versa. It seemed easiest to transform your Pairing data.frame to align the Case and the Control. Once that is done, you can just build the data frame you want.

## First, recreate your data
Pairing = read.table(text="Match              Pos
Control_70         001_A01
Control_56         001_A02
   Case_70         001_A03
   Case_56         001_A04
Control_21         001_A05
   Case_21         001_A06",
header=TRUE)

matures.cpm.spike.batch = read.table(text=" 001_A01_S1 001_A02_S2 001_A03_S3 001_A04_S4 001_A05_S5 001_A06_S6
hsa-let-7a-5p  16.566813  11.415796  12.400252  22.701457   8.864882  20.442599
hsa-let-7b-5p  15.574190  11.107133  12.196465  17.954547   8.527478  25.788286
hsa-let-7c-5p   5.976763   4.372978   5.984685   9.821348   6.341252   7.480211
hsa-let-7d-3p  16.508818  10.697730  11.001534  18.375286   7.583910  24.974774
hsa-let-7d-5p  13.273824   5.134547   9.456675  11.567230   7.096485  13.294108
hsa-let-7f-5p  13.900711   9.804384  11.481614  20.002110   7.878241  17.295909",
header=TRUE)

## Build Matches to replace your Pairing
Control = Pairing[grep("Control", Pairing$Match),]
Control = Control[order(Control$Match),]
Case = Pairing[grep("Case", Pairing$Match),]
Case = Case[order(Case$Match),]
Matches = cbind(Control, Case)

# Uses Matches to build desired data.frame
Diffs = data.frame(matures.cpm.spike.batch[, Matches[1,4]] - 
        matures.cpm.spike.batch[, Matches[1,2]])
colnames(Diffs)[1] = sub("Control", "Diff", Matches[1,1])
for(i in 2:nrow(Matches)) {
    Diffs[,i] = matures.cpm.spike.batch[, Matches[i,4]] - 
        matures.cpm.spike.batch[, Matches[i,2]]
    colnames(Diffs)[i] = sub("Control", "Diff", Matches[i,1])
}

## Result
    Diff_21   Diff_56   Diff_70
1 11.577717 11.285661 -4.166561
2 17.260808  6.847414 -3.377725
3  1.138959  5.448370  0.007922
4 17.390864  7.677556 -5.507284
5  6.197623  6.432683 -3.817149
6  9.417668 10.197726 -2.419097
G5W
  • 36,531
  • 10
  • 47
  • 80
0

Just in case, a different approach:

We are going to need a modified matching data frame, with separate columns for cases and controls:

library(tidyr)
library(reshape2)

P <- Pairing %>% 
    separate(Match, into = c("cc", "ind"), sep = "_") %>% 
    dcast(ind ~ cc, value.var = "Pos")

P:

  ind    Case Control
1  21 001_A06 001_A05
2  56 001_A04 001_A02
3  70 001_A03 001_A01

We also want colnames in matures.cpm.spike.batch to match names in P:

df <- matures.cpm.spike.batch
colnames(df) <- gsub("^X|_S.*", "", colnames(df))

colnames(df):

[1] "001_A01" "001_A02" "001_A03" "001_A04" "001_A05" "001_A06"

Now we can finish it simply as following:

case <- df[, P$Case]
control <- df[, P$Control]
res <- case - control

res:

                001_A06   001_A04   001_A03
hsa-let-7a-5p 11.577717 11.285661 -4.166561
hsa-let-7b-5p 17.260808  6.847414 -3.377725
hsa-let-7c-5p  1.138959  5.448370  0.007922
hsa-let-7d-3p 17.390864  7.677556 -5.507284
hsa-let-7d-5p  6.197623  6.432683 -3.817149
hsa-let-7f-5p  9.417668 10.197726 -2.419097
Iaroslav Domin
  • 2,698
  • 10
  • 19