When I write a .csv file from R in which my group names start with a leading zero, the leading zeros are preserved. However, when I import the .csv, the leading zeros are dropped and the group names are converted to integers. How can I keep the leading zeros in my group names when I import a .csv file in R?

Example

df <- data.frame(matrix(ncol = 1, nrow = 3))
colnames(df)[1] <- 'site'
df$site <- c('01','02','03')
str(df) # the site name has a character class and the leading zeros are maintained
write.csv(df,'test.csv', row.names = FALSE) # I opened in notepad to verify that the leading zeros are maintained

df2 <- read.csv('test.csv')
str(df2) # the site name is integer class and leading zeros have been dropped
tassones
    Try `identical(0023, 23)`. There is no such thing as a leading zero in a _numeric_ variable. While you can _format it to text with leading zeros_, those drop when re-reading, unless you force a read as `character`. – Dirk Eddelbuettel Aug 09 '22 at 14:03

2 Answers

Specify the column as "character".

read.csv("test.csv", colClasses = c(site = "character"))
##   site
## 1   01
## 2   02
## 3   03

If you don't have any other columns or if the other columns are also character this could be shortened to:

read.csv("test.csv", colClasses = "character")
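
As a quick check of this approach, here is a minimal sketch that writes the question's data to a temporary file and compares the default import with the `colClasses` import. The `tempfile()` path and the names `tmp`, `as_int`, `as_chr` and `repadded` are illustrative and not from the original post. The last line shows a fallback for data that has already been read as integer; it assumes every code is known to be exactly two digits wide.

```r
# Illustrative round trip; tmp, as_int, as_chr and repadded are made-up names.
df <- data.frame(site = c("01", "02", "03"))
tmp <- tempfile(fileext = ".csv")
write.csv(df, tmp, row.names = FALSE)

as_int <- read.csv(tmp)                                       # column type is guessed
as_chr <- read.csv(tmp, colClasses = c(site = "character"))  # column type is forced

class(as_int$site)  # "integer"
class(as_chr$site)  # "character"
as_chr$site         # "01" "02" "03"

# Fallback, assuming a fixed width of two digits: pad the integers back out.
repadded <- sprintf("%02d", as_int$site)
```

The padding fallback only works when the original width is known, so forcing the type at read time is usually the safer choice.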
G. Grothendieck
  • Isn't that essentially the same as what I've written? OK, you are using `read.csv`, but in general all the information is already included in my answer. – hannes101 Aug 09 '22 at 14:38
  • Both answer the question but this answer has more concise code, does it without introducing extra packages and uses precisely the setup in the question. – G. Grothendieck Aug 09 '22 at 14:42
Define the column type explicitly, for example with vroom; other packages provide the same functionality. You can inspect the column specification with spec() first.

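library(vroom)

# Inspect the column specification that vroom guesses
spec(vroom('test.csv', delim = ","))

# Read the file, forcing the site column to character
df2 <- vroom(
  'test.csv',
  delim = ",",
  col_types = cols(site = col_character())
)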
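
The same idea carries over to other readers. As one sketch, assuming the readr package is installed (it is not used in the original answer), `read_csv()` takes a `col_types` specification built with `cols()` and `col_character()`. The temporary file and the name `tmp` are illustrative.

```r
library(readr)

# Illustrative temporary file holding the question's data; tmp is a made-up name.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(site = c("01", "02", "03")), tmp, row.names = FALSE)

# Declare site as character so it is never parsed as a number.
df2 <- read_csv(tmp, col_types = cols(site = col_character()))
df2$site  # "01" "02" "03"
```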
hannes101