6

I have a tab-delimited file with 400 columns, and I want to append text to the column names. For example, if there are columns named A and B, I want to change A to A.ovca and B to B.ctrls. Likewise, I want to add one of the two suffixes (ovca or ctrls) to each of the 400 columns: some column names get ovca and some get ctrls. All the column names are unique, and the file contains more than 1000 rows. A sample of the file is given below:

         X             Y         Z               A       B               C  
        2.34          .89       1.4             .92     9.40            .82
        6.45          .04       2.55            .14     1.55            .04
        1.09          .91       4.19            .16     3.19            .56
        5.87          .70       3.47            .80     2.47            .90

And I want the file to look like this:

       X.ovca     Y.ctrls      Z.ctrls       A.ovca     B.ctrls       C.ovca  
        2.34          .89       1.4             .92     9.40            .82
        6.45          .04       2.55            .14     1.55            .04
        1.09          .91       4.19            .16     3.19            .56
        5.87          .70       3.47            .80     2.47            .90
Dinesh

3 Answers

6

If your data.frame is called dat, you can read (and assign to) the column names with colnames(dat).

Therefore:

cn <- colnames(dat)
# Append ".ovca" to names containing A, X or C
cn <- sub("([AXC])","\\1.ovca",cn)
# Append ".ctrls" to names containing Y, Z or B
cn <- sub("([YZB])","\\1.ctrls",cn)
colnames(dat) <- cn

> cn
[1] "X.ovca"  "Y.ctrls" "Z.ctrls" "A.ovca"  "B.ctrls" "C.ovca" 

The \\1 is a backreference: in the replacement string it stands for whatever the parenthesized group in the pattern matched. Since the parentheses contain a bracketed character class, the pattern matches any one of the letters listed. In this case, "A" becomes "A.ovca" and "X" becomes "X.ovca".

If your variable names are longer than one letter, this is easy to extend; just read up a bit on regular expressions.
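
As a rough sketch of that extension, the substitutions can be driven by a set of pattern/suffix pairs. The names `rules` and `"AB"` are made up for illustration; the `^` and `$` anchors are an addition so that a pattern only matches a whole column name and "A" does not also match inside "AB".

```r
# Hypothetical column names; the real file has about 400 of them.
cn <- c("X", "Y", "Z", "A", "B", "C", "AB")

# Hypothetical pattern/suffix pairs (name = regex, value = suffix to append).
# Anchoring with ^ and $ forces a whole-name match.
rules <- c("^(X|A|C)$"    = ".ovca",
           "^(Y|Z|B|AB)$" = ".ctrls")

# Apply each pair in turn; "\\1" re-inserts the matched name before the suffix.
for (pat in names(rules)) {
  cn <- sub(pat, paste0("\\1", rules[[pat]]), cn)
}
cn
```

Because every pattern is anchored, a name that has already received a suffix cannot be matched again by a later rule.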

Ari B. Friedman
  • Or, in one line: `colnames(dat) <- gsub("([ACX])","\\1.ovca",colnames(dat))`. – Joshua Ulrich Nov 06 '11 at 17:35
  • @JoshuaUlrich Agreed, but the question had multiple pattern/substitution pairs to operate over. – Ari B. Friedman Nov 06 '11 at 17:43
  • @JoshuaUlrich Edited to make clearer. Ideally this would be done with a function that applies the substitution based on a set of pattern/substitution pairs, but I suspect it's overkill for these purposes. – Ari B. Friedman Nov 06 '11 at 20:28
5

Here is a two-liner using the stringr package.

library(stringr)
nam <- names(mydf)
# X, A and C get ".ovca"; every other column gets ".ctrls"
names(mydf) <- ifelse(nam %in% c('X', 'A', 'C'), 
   str_c(nam, '.ovca'),  str_c(nam, '.ctrls'))
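
If stringr is not available, the same idea can probably be written in base R, since `paste0()` concatenates vectors without a separator in much the same way for this case. This is only a sketch: the small `mydf` and the `ovca_cols` vector are stand-ins invented for the example.

```r
# Small stand-in for the real data frame (made-up values).
mydf <- data.frame(X = 1:2, Y = 3:4, Z = 5:6, A = 7:8, B = 9:10, C = 11:12)

# Assumed list of columns that belong to the ovca group.
ovca_cols <- c("X", "A", "C")

nam <- names(mydf)
# Pick the suffix per column, then glue it onto the original name.
names(mydf) <- paste0(nam, ifelse(nam %in% ovca_cols, ".ovca", ".ctrls"))
names(mydf)
```

Choosing only the suffix inside `ifelse()` avoids building both candidate names for every column.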
Ramnath
3

How about this? You find the columns to which you want to append "ovca" or "ctrls" using %in%, and then append the appropriate tag with paste().

> (mydf <- data.frame(X = runif(10), Y = runif(10), Z = runif(10), A = runif(10), B = runif(10), C = runif(10)))
            X         Y         Z         A         B         C
1  0.81030594 0.1624974 0.3977381 0.9619541 0.9866498 0.4424760
2  0.92498687 0.2069429 0.6065115 0.9969835 0.2407364 0.2455184
3  0.11033869 0.2878640 0.5662793 0.7936232 0.6066735 0.8210634

> names(mydf)[names(mydf) %in% c("X", "A", "C")] <- paste(names(mydf)[names(mydf) %in% c("X", "A", "C")], "ovca", sep = ".")
> names(mydf)[names(mydf) %in% c("Y", "Z", "B")] <- paste(names(mydf)[names(mydf) %in% c("Y", "Z", "B")], "ctrls", sep = ".")
> mydf
       X.ovca   Y.ctrls   Z.ctrls    A.ovca   B.ctrls    C.ovca
1  0.81030594 0.1624974 0.3977381 0.9619541 0.9866498 0.4424760
2  0.92498687 0.2069429 0.6065115 0.9969835 0.2407364 0.2455184
3  0.11033869 0.2878640 0.5662793 0.7936232 0.6066735 0.8210634
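
With 400 columns, listing the names inside `%in%` may get tedious, so one possible variation is a named lookup vector that maps each column to its group. The sketch below also covers reading and writing the tab-delimited file; the temporary files and the `group` vector are assumptions made up for the example, not part of the original question.

```r
# Write a tiny tab-delimited file to stand in for the real one.
infile <- tempfile(fileext = ".txt")
writeLines(c("X\tY\tZ\tA\tB\tC",
             "2.34\t.89\t1.4\t.92\t9.40\t.82",
             "6.45\t.04\t2.55\t.14\t1.55\t.04"), infile)

# check.names = FALSE keeps the header names exactly as they are in the file.
dat <- read.delim(infile, check.names = FALSE)

# Hypothetical lookup: name = column, value = group to append.
group <- c(X = "ovca", Y = "ctrls", Z = "ctrls",
           A = "ovca", B = "ctrls", C = "ovca")

# Fail early if any column has no group assigned.
stopifnot(all(names(dat) %in% names(group)))

# Look up each column's group and append it after a dot.
names(dat) <- paste(names(dat), group[names(dat)], sep = ".")

# Write the renamed table back out as tab-delimited text.
outfile <- tempfile(fileext = ".txt")
write.table(dat, outfile, sep = "\t", quote = FALSE, row.names = FALSE)
header <- readLines(outfile, n = 1)
header
```

In practice the lookup could itself be read from a two-column file, which keeps the column-to-group assignment out of the code.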
Roman Luštrik