2

I am trying to read a worksheet in an Excel 2010 workbook file into R using both the xlsx and XLConnect packages. Both are dropping leading zeroes on zip code columns despite formatting the cells in the worksheet as 'Text'.

wb <- loadWorkbook('c:/~/file1.xlsx')
sheetNames <- getSheets(wb)

for(i in 1:length(sheetNames)){  # i = 2
    #dat1 <- read.xlsx('c:/~/file1.xlsx', sheetNames[i], as.data.frame = T)
    dat1 <-  readWorksheetFromFile('c:/~/file1.xlsx', sheetNames[i])
}

Does anyone have suggestions for how to deal with this?

screechOwl
  • 27,310
  • 61
  • 158
  • 267

2 Answers2

2

You can also fix this afterwards with sprintf:

# some example data
x <- c(225,4867,52,15732,9514,78142)
# getting the leading zeros back
x <- sprintf("%05s", x)

> x
[1] "00225" "04867" "00052" "15732" "09514" "78142"
ulfelder
  • 5,305
  • 1
  • 22
  • 40
Jaap
  • 81,064
  • 34
  • 182
  • 193
1

You can specify the column types as an argument to readWorksheet() or readWorksheetFromFile(), by using the argument colType = ....

For example, to read all columns as character, use:

readWorkSheet(..., colType="character")
Andrie
  • 176,377
  • 47
  • 447
  • 496