
There are similar questions, such as "Changing column names of a data frame in R", but none of them addresses this particular case.

I have a data frame like the one below:

M <- data.frame(matrix(rnorm(5),100,50))

I was trying to give each column a name as follows:

colnames(M) <- paste( LETTERS, "col", sep ="")

This would work if the number of columns were less than or equal to the number of letters. What if I want to:

1- repeat the letters after reaching the end

2- randomly generate a unique name for each column, made of a fixed word plus random letters, like Ccol, GFcol, Mercol, for as many columns (or rows) as there are?

Community
  • You could use `rep`, i.e. `rep(paste( LETTERS, "col", sep =""), length.out=ncol(M))` for the first question. I didn't quite get the second part, though you may check `?sample` – akrun Mar 10 '15 at 10:31
  • @akrun thanks akrun, that is a great help already. I'll try to figure out the second one based on `sample`; if I can't, I will post a better question –  Mar 10 '15 at 11:00
  • The conditions in the second question are not clear. You have one with `Ccol`, next `GFcol`, etc. So, are there any limitations on the number of characters in the prefix before `col`? – akrun Mar 10 '15 at 11:02
  • @akrun no, there is no limitation! Just unique column names –  Mar 10 '15 at 12:30

2 Answers

For the second part of the question (as the first one seems to be solved by akrun) you could try:

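# Draw random letter triples, then keep a random 1 to 3 letters of each.
# Note: the resulting names are random but not guaranteed to be unique.
grid <- as.matrix(expand.grid(LETTERS, LETTERS, LETTERS))  # 26^3 = 17576 rows
rows <- sample(nrow(grid), ncol(M))
LET <- lapply(rows, function(i) grid[i, sample(1:3, sample(1:3, 1))])
colnames(M) <- paste0(sapply(LET, paste0, collapse = ""), "col")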

Which gives, for example (the names are random, so they differ between runs):

 head(M, 2)
     AZFcol     OJcol      Gcol    ALPcol     NAcol     VAcol     KEcol      Acol     VBcol     HAcol
1 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018
2  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753
      KYcol    AARcol      Wcol     EAcol    OTAcol     AMcol     AAcol     QAcol      Acol     AMcol
1 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018
2  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753
      AScol     DQcol      Bcol      Jcol     BAcol     AIcol     WEcol    SAUcol      Acol      Acol
1 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018
2  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753
     RAOcol     JAcol    GAEcol    ABQcol     BAcol     TAcol    AAMcol    ACEcol      Kcol     NAcol
1 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018
2  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753
       Bcol    HAEcol     ABcol    AVDcol      Hcol     AQcol     WHcol    KIAcol     QLcol     FRcol
1 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018 -1.842018
2  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753  1.069753
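
The names in the output above are not all distinct (`Acol` shows up more than once), while the question asks for unique names. A minimal sketch of one way to enforce that, assuming prefixes of one to three capital letters are acceptable: build the full pool of such prefixes and sample from it without replacement. The names `one`, `two`, `three` and `pool` are illustrative only.

```r
set.seed(1)  # only to make the illustration reproducible
M <- data.frame(matrix(rnorm(5), 100, 50))

# Pool of every 1-, 2- and 3-letter prefix: 26 + 26^2 + 26^3 = 18278 strings
one   <- LETTERS
two   <- as.vector(outer(LETTERS, LETTERS, paste0))
three <- as.vector(outer(two, LETTERS, paste0))
pool  <- c(one, two, three)

# sample() draws without replacement by default, so prefixes cannot repeat
stopifnot(ncol(M) <= length(pool))
colnames(M) <- paste0(sample(pool, ncol(M)), "col")
```

Because the pool is enumerated up front, this only suits short prefixes; for longer ones, generating candidates and de-duplicating them would likely be cheaper.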
DatamineR

akrun gave you the answer for the first: `rep(paste( LETTERS, "col", sep =""), length.out=ncol(M))`

For the second, the only difficulty I see is avoiding re-sampling the same letters, so that the column names are unique. This is like counting in base 26, so you can first count in that base up to your number of columns:
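
As a quick illustration of what that recycling does, here is a small sketch assuming the 100 x 50 data frame `M` from the question. Recycled names repeat after the 26th column, so if distinct names are needed, base R's `make.unique` can append a numeric suffix to the repeats. The variable name `recycled` is illustrative only.

```r
# Same shape as the data frame in the question
M <- data.frame(matrix(rnorm(5), 100, 50))

# Recycle the 26 letter-based names until there is one per column
recycled <- rep(paste0(LETTERS, "col"), length.out = ncol(M))
anyDuplicated(recycled) > 0   # names repeat from column 27 onwards

# Optional: make the repeats distinct ("Acol", "Acol.1", ...)
colnames(M) <- make.unique(recycled)
```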

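    GetNumberSuiteAnyBase <- function(lengthSuite, base){
        nB <- length(base) # radix of your base
        # number of digits needed for lengthSuite distinct values; an integer
        # loop avoids log() rounding and also handles lengthSuite = 1
        nDigits <- 1
        while(nB^nDigits < lengthSuite) nDigits <- nDigits + 1
        numberSuite <- ""
        for(iDigit in seq_len(nDigits)){
            # digit iDigit (from the right) changes every nB^(iDigit-1) values
            newDigit <- rep(base, each=nB^(iDigit-1), length.out=lengthSuite)
            numberSuite <- paste0(newDigit, numberSuite)
        }
        return(numberSuite)
    }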
    library("testthat")
    # as an example:
    expect_equal(as.numeric(GetNumberSuiteAnyBase(5,c(0,1))),c(0,1,10,11,100))
    # with your requirements
    colNames <- GetNumberSuiteAnyBase(ncol(M),LETTERS)

Then, if you want these column names in random order, you can just use:

    colNames <- paste0(sample(colNames),"col")
cmbarbu