
I have a data frame and want the number variable to be four digits long, which means adding one to three leading zeroes. I chose the sprintf function for this, since it does not matter that the number is converted to character class. Unfortunately, the results do not come out in the order I want.

The test data frame is built as follows, with the zero-padded values added as a third column for easy comparison. Running the code shows that the order of the zero-padded numbers does not match the original number order.

test <- as.data.frame(cbind(letters,seq(from=1, to=26)))
test[,3]<-sprintf("%04d", test[,2])

If I reorder the data frame alphabetically, by treating the original number column as characters, the sprintf numbers come out in ascending order, although the original number series does not.

test.two <- as.data.frame(cbind(letters,seq(from=1, to=26)))
test.two <- test.two[i <-order(as.character(test.two[,2])),]
test.two[,3]<-sprintf("%04d", test.two[,2])

I can create the desired data set by Frankensteining it together.

test.three <- as.data.frame(cbind(letters,seq(from=1, to=26)))
test.three[,3]<-test.two[,3]

However, I would like to know what I am doing wrong, and what method would give the result I expected from what I thought was a simple operation!

smci
Jonno Bourne
  • Don't use `as.data.frame(cbind(letters,seq(from=1, to=26)))`. `cbind` will create a matrix. A matrix can only have one type of data, and the integers created by `seq` will be coerced to characters. Depending on your `stringsAsFactors` `options`, the characters might then be converted to factors when applying `as.data.frame`. If you are not familiar with the peculiarities of factors, you may be surprised when applying various functions on them. Just use `data.frame(x = letters, y = seq(from=1, to=26))`. – Henrik Feb 27 '14 at 11:49
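As a rough sketch of the suggestion in the comment above, the data frame can be built with `data.frame()` directly, so the number column never passes through a character matrix. The column names `x`, `y`, and `padded` are only illustrative choices, not names from the original data.

```r
# Build the columns directly; there is no matrix step, so each column keeps its own type.
test <- data.frame(x = letters, y = seq(from = 1, to = 26))

# The number column is still numeric (integer here), not character or factor.
class(test$y)

# Zero-pad to four digits; the result is a character column in the original row order.
test$padded <- sprintf("%04d", test$y)
head(test$padded, 3)
## [1] "0001" "0002" "0003"
```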

1 Answer


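This is due to the second column being a factor. A factor stores integer codes that index its levels, and the levels are sorted as text ("1", "10", "11", ..., "19", "2", "20", ...), so the value 2 has code 12 and the value 3 has code 20. sprintf with %d formats those codes rather than the labels.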

test <- as.data.frame(cbind(letters,seq(from=1, to=26)))
sapply(test, class)
##  letters       V2 
## "factor" "factor" 
test[,3]<-sprintf("%04d", test[,2])

as.numeric(test$V2)
##  [1]  1 12 20 21 22 23 24 25 26  2  3  4  5  6  7  8  9 10 11 13 14 15 16 17 18
## [26] 19

test$V2 <- as.integer(as.character(test$V2))
test[,4]<-sprintf("%04d", test[,2])

head(test)
##   letters V2   V3   V4
## 1       a  1 0001 0001
## 2       b  2 0012 0002
## 3       c  3 0020 0003
## 4       d  4 0021 0004
## 5       e  5 0022 0005
## 6       f  6 0023 0006
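The difference between a factor's codes and its labels can be seen on a smaller example. This is only a minimal sketch; the vector `f` is a made-up illustration, not part of the original data.

```r
# A factor built from number-like strings; its levels are sorted as text, not as numbers.
f <- factor(c("1", "2", "10", "11"))
levels(f)
## [1] "1"  "10" "11" "2"

# Converting the factor directly returns the internal level codes.
as.integer(f)
## [1] 1 4 2 3

# Going through character first recovers the values the labels spell out.
as.integer(as.character(f))
## [1]  1  2 10 11

# Padding the recovered values gives the expected strings.
sprintf("%04d", as.integer(as.character(f)))
## [1] "0001" "0002" "0010" "0011"
```

The same `as.integer(as.character(...))` step is what repairs the `V2` column in the answer's code.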
Jake Burkhead
  • Thanks, that fixed the problem perfectly. Although the real data set was not made using cbind, the ID number variable had been automatically set as a factor; re-classing it on the original data set fixed the problem. – Jonno Bourne Feb 27 '14 at 13:06