2

I have a large dataframe where certain column names partially match as shown below

>data
   K55.NSC.     H55.NSC.  K55.TLC.     H55.TLC.
1  27520.09     306.6525  284686.6 8.623333e+00  ... ...
2  57455.33  415244.7340  284693.4 1.319481e+04  ... ...
3  85977.20  814413.8720  284700.1 2.560542e+04  ... ...
4 149511.56 1629331.9228  284713.4 5.103493e+04  ... ...
5 285171.80 3213409.0205  284739.7 1.042913e+05  ... ...
6 510536.16 6233470.3062  284790.7 1.957055e+05  ... ...

I want to subtract x.NSC from x.TLC and create a new column as x.LLC with the outcomes where x refers the partially matched column name. One way to do is to make separate dataframes using 'grep':

a <- data[,grep('K55', colnames(data))]
data$k55.LLC <- a[1]-a[2]
... ...

But this is time-consuming and I find it difficult to configure a loop for this case. Is there any way to easily deal with the problem within the dataframe without creating a list? I found a similar problem here, though I'm not sure if any of the provided solutions are applicable in my case!

Community
  • 1
  • 1
ToNoY
  • 1,358
  • 2
  • 22
  • 43

1 Answers1

3

One approach would be to grab all the prefixes out of the column names of data with strsplit and then use sapply to generate the subtracted values for each:

(n <- unique(sapply(strsplit(colnames(data), "\\."), "[", 1)))
# [1] "K55" "H55"
sapply(n, function(x) data[,paste0(x, ".TLC.")] - data[,paste0(x, ".NSC.")])
#            K55           H55
# [1,]  257166.5     -298.0292
# [2,]  227238.1  -402049.9240
# [3,]  198722.9  -788808.4520
# [4,]  135201.8 -1578296.9928
# [5,]    -432.1 -3109117.7205
# [6,] -225745.5 -6037764.8062

If data has column names that don't end in .TLC. or .NSC., then you could filter out these column names with something like:

(cols <- grep("(\\.TLC\\.$)|(\\.NSC\\.$)", colnames(data), value=TRUE))
# [1] "K55.NSC." "H55.NSC." "K55.TLC." "H55.TLC."

Then, you could use cols instead of colnames(data) in the first snippit above.

josliber
  • 43,891
  • 12
  • 98
  • 133
  • this is of great help but doesn't work when I have some columns within the dataframe with some other extension e.g. .LLC. ! An update possible? – ToNoY Sep 10 '14 at 18:42
  • @ToNoY I've updated with some code that filters down to the column names ending in .TLC. or .NSC. – josliber Sep 10 '14 at 19:34