
I have a rolling window of data. Think of the column names in a matrix as follows:

Jan.94, Feb.94, Mar.94, Apr.94, Feb.94.x, Mar.94.x, Apr.94.x, May.94.x, Mar.94.x.x, Apr.94.x.x, May.94.x.x and so on.

Essentially, I want to remove all the x's from the column names so that only the date is kept. The matrix is extremely large, so I need a function that keeps only the first 6 characters of each name and thereby removes all the x's.

JC3019
  • How do you expect to resolve the situation where removal of the x's causes two columns to have the same name? – Len Greski Aug 03 '19 at 12:34
  • @LenGreski unless R disallows two (or more) vectors to have the same name, it won't be a problem in my application. – JC3019 Aug 03 '19 at 12:38
  • Please read [How to Create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) and update your question. – Len Greski Aug 03 '19 at 13:32

2 Answers


This is easy to do. If your matrix is called x, you just need:

colnames(x) <- gsub(".x","",colnames(x),fixed = TRUE)
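If the goal is literally to keep the first six characters, `substr()` may be a more direct fit than `gsub()`. The sketch below is only an illustration: `nms` is a hypothetical vector standing in for `colnames(x)`, and it assumes every name starts with a six-character `mmm.yy` date. It also shows an anchored pattern that strips only trailing `.x` suffixes, which is a slightly stricter variant and not what the answer above uses.

```r
# Hypothetical column names following the pattern described in the question.
nms <- c("Jan.94", "Feb.94", "Mar.94", "Apr.94",
         "Feb.94.x", "Mar.94.x", "Apr.94.x", "May.94.x",
         "Mar.94.x.x", "Apr.94.x.x", "May.94.x.x")

# Option 1: keep only the first six characters of each name.
# Assumes every name begins with a six-character mmm.yy date.
first_six <- substr(nms, 1, 6)

# Option 2: remove one or more ".x" suffixes, but only at the end of the name.
anchored <- sub("(\\.x)+$", "", nms)

# Both approaches agree on names of this shape.
identical(first_six, anchored)
```

To apply either one to the matrix, assign the result back, for example `colnames(x) <- substr(colnames(x), 1, 6)`.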
RaphaelS
  • This is really helpful! Is there any way to only keep the first 6 characters of the name? – JC3019 Aug 03 '19 at 12:53
  • @JC3019 - what do you mean by "only keep the first 6 characters..."? Using `""` as the replacement in `gsub()` deletes every instance of `.x` found in a string. If each column name consists only of a 6-character date designator of the form `mmm.yy` followed by `.x` suffixes, then `gsub()` will result in 6-character column names. – Len Greski Aug 03 '19 at 13:09

If the data is stored in an object of type matrix(), and subsequent operations refer to its rows and columns by position rather than by column name, the original answer works fine.

We'll generate a matrix of data, rename the columns, and display the matrix. set.seed() is used to ensure reproducibility of the runif() function.

set.seed(3104)
nameList <- c('Jan.94','Feb.94','Mar.94',
              'Jan.94.x','Feb.94.x','Mar.94.x',
              'Jan.94.x.x','Feb.94.x.x','Mar.94.x.x')
x <- matrix(runif(90),nrow=10,ncol=9)
colnames(x) <- gsub(".x","",nameList,fixed=TRUE)
head(x)

...and the output:

> head(x)
         Jan.94    Feb.94    Mar.94    Jan.94    Feb.94     Mar.94     Jan.94
[1,] 0.73967666 0.3950552 0.4593954 0.5246329 0.9318526 0.97022213 0.51974938
[2,] 0.78333764 0.8019435 0.3277070 0.8342044 0.9564895 0.31632572 0.02162478
[3,] 0.07161414 0.3681912 0.5151378 0.8647585 0.9841725 0.69784065 0.05600622
[4,] 0.92636930 0.6643402 0.2357173 0.6178838 0.5324841 0.42694750 0.13356315
[5,] 0.26566868 0.7210794 0.6275253 0.9630575 0.5757118 0.63363792 0.30718159
[6,] 0.57439103 0.1076186 0.8501558 0.0615584 0.3375161 0.06738025 0.25910038
         Feb.94     Mar.94
[1,] 0.82225954 0.94697173
[2,] 0.03341796 0.08548795
[3,] 0.99208753 0.37739177
[4,] 0.85306984 0.00283353
[5,] 0.61724901 0.16111121
[6,] 0.21789765 0.07376294
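As a rough illustration of why positional references are safe here, the sketch below rebuilds the same matrix and compares positional indexing with name-based indexing. The variable names `first_jan`, `jan_cols`, and `all_jan` are introduced only for this example.

```r
set.seed(3104)
nameList <- c('Jan.94','Feb.94','Mar.94',
              'Jan.94.x','Feb.94.x','Mar.94.x',
              'Jan.94.x.x','Feb.94.x.x','Mar.94.x.x')
x <- matrix(runif(90), nrow = 10, ncol = 9)
colnames(x) <- gsub(".x", "", nameList, fixed = TRUE)

# Positional indexing is unaffected by the duplicated names.
fourth <- x[, 4]

# Character indexing on a matrix returns only the first matching column.
first_jan <- x[, "Jan.94"]
identical(first_jan, x[, 1])

# To get every column named Jan.94, compare the names explicitly
# and index by the resulting positions.
jan_cols <- which(colnames(x) == "Jan.94")
all_jan <- x[, jan_cols]
dim(all_jan)
```

Selecting by `which(colnames(x) == ...)` keeps the duplicated names usable as labels while the actual extraction is still done by position.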

However, if one needs to access the columns in an object of type data.frame() with the $ form of the extract operator, one gets unexpected results when multiple columns have the same column name.

# use with data.frame() introduces subtle defect 
# when using the $ form of the extract operator
set.seed(3104)
x <- data.frame(matrix(runif(90),nrow=10,ncol=9))
colnames(x) <- gsub(".x","",nameList,fixed=TRUE)
# extract only retrieves the first column named Jan.94
x$Jan.94

...and the output:

> x$Jan.94
 [1] 0.73967666 0.78333764 0.07161414 0.92636930 0.26566868 0.57439103
 [7] 0.60409610 0.10018717 0.67436946 0.90823532
> 

When a data.frame() has multiple columns with the same name, the $ form of the extract operator returns only the first match, so the remaining columns with that name cannot be reached through $.

That said, it is possible to extract multiple columns with the same name from a data frame, but it takes a bit more effort.

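# fixed=TRUE matches the literal string, so "." is not treated as a regex wildcard
head(x[,grepl("Jan.94",colnames(x),fixed=TRUE)])

...and the result:

> head(x[,grepl("Jan.94",colnames(x),fixed=TRUE)])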
      Jan.94  Jan.94.1   Jan.94.2
1 0.73967666 0.5246329 0.51974938
2 0.78333764 0.8342044 0.02162478
3 0.07161414 0.8647585 0.05600622
4 0.92636930 0.6178838 0.13356315
5 0.26566868 0.9630575 0.30718159
6 0.57439103 0.0615584 0.25910038
> 
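If `$` access to every column matters more than having identical names, one possible workaround is to de-duplicate the cleaned names with base R's `make.unique()`. This is only a sketch and goes beyond what the question asked for: it reintroduces numeric suffixes (`.1`, `.2`), and the names `df` and `clean` are hypothetical.

```r
set.seed(3104)
nameList <- c('Jan.94','Feb.94','Mar.94',
              'Jan.94.x','Feb.94.x','Mar.94.x',
              'Jan.94.x.x','Feb.94.x.x','Mar.94.x.x')
df <- data.frame(matrix(runif(90), nrow = 10, ncol = 9))

# Strip the .x suffixes, then make the resulting names unique again.
clean <- gsub(".x", "", nameList, fixed = TRUE)
colnames(df) <- make.unique(clean)
colnames(df)

# Each January column is now reachable with $.
identical(df$Jan.94, df[[1]])
identical(df$Jan.94.1, df[[4]])
identical(df$Jan.94.2, df[[7]])
```

The numeric suffixes match the names that the subsetting example above produces, so the two approaches label the columns the same way.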
Len Greski