
I have a data frame A with a numeric column like this:

zip code
00601
00602
00607

and so on.

If I read this into R using read.csv, the values are read as numbers. I want them as factors.

I tried converting them to a factor using

A <- as.factor(A)

But this removes the leading zeros and makes A look like

zip code
601
602
607

I do not want this; I want to keep the leading zeros.
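To see where the zeros go, here is a minimal sketch that reproduces the problem, assuming a CSV with a single `zip code` column (simulated here with read.csv's `text` argument rather than a real file). The zeros are already gone once read.csv returns; as.factor, applied to the column, only turns the number 601 into the level "601":

```r
# Simulated file contents (assumption: a single column headed "zip code")
csv <- "zip code\n00601\n00602\n00607"

A <- read.csv(text = csv)            # the header "zip code" becomes zip.code
is.numeric(A$zip.code)               # TRUE: the leading zeros are already lost
A$zip.code <- as.factor(A$zip.code)  # converts the column, not the whole data frame
levels(A$zip.code)                   # "601" "602" "607"
```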

tonytonov
Ayush Raj Singh
  • Are you sure they're numeric? – Thomas Jun 28 '13 at 08:56
  • @Thomas if stored in `R` as `00607` they surely are not. It's strange because the OP says both `they are read as numeric` and `I have a data frame A which has numeric column like: zip code 00601 ...` – Michele Jun 28 '13 at 09:36

3 Answers


Use colClasses in your read.csv call to read them in as character or factor: read.csv(*, colClasses="factor").
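A minimal sketch of this, simulating the file with read.csv's `text` argument (an assumption for the sake of a self-contained example; with a real file you would pass its path instead). The header `zip code` becomes the column name `zip.code`:

```r
# Simulated file contents (assumption: a single column headed "zip code")
csv <- "zip code\n00601\n00602\n00607"

# colClasses = "factor" skips numeric conversion, so the text is kept as-is
A <- read.csv(text = csv, colClasses = "factor")
levels(A$zip.code)   # "00601" "00602" "00607"
```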

Hong Ooi

You may need to add the leading zeros back, as in this post. sprintf() first converts the values to character; you can then change that to a factor, which keeps the leading zeros.

Example

A <- data.frame("zip code"=c(00601,00602,00607))
class(A$zip.code) #numeric
A$zip.code <- sprintf("%05d", A$zip.code)
class(A$zip.code) #character
A$zip.code <- as.factor(A$zip.code)
class(A$zip.code) #factor

Resulting in:

> A$zip.code
[1] 00601 00602 00607
Levels: 00601 00602 00607

Writing A as a .csv file

write.csv(A, "tmp.csv")

results in

"","zip.code"
"1","00601"
"2","00602"
"3","00607"
Community
Marc in the box
  • Hi Marc, thank you for an alternative solution; I am getting to know a lot of functions. One doubt: if I write this data frame using write.csv(), zip.code is treated as a numeric vector (the leading zeros are removed, whether zip is a factor or a character in the R console). How do I write it as is, with the leading zeros? – Ayush Raj Singh Jun 28 '13 at 09:21
  • @AyushRajSingh - In my case, when I write a .csv file, `zip.code` is written as text. I have added what my output looks like to the answer. – Marc in the box Jun 28 '13 at 09:26
  • I tried your exact example, but when I write the file, the zeros disappear. What might be the problem? By "write it" I mean opening the file in Excel after using write.csv(). – Ayush Raj Singh Jun 28 '13 at 09:41
  • OK, that's an Excel problem. In Excel, there is actually a format for zip codes: Highlight column > Format Cells > Special > Zip Code – Marc in the box Jun 28 '13 at 09:53
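Regarding the write.csv question in these comments, here is a sketch of the round trip, assuming a temporary file from tempfile() instead of tmp.csv and row.names = FALSE for brevity. It shows that the zeros are present in the file itself (so a spreadsheet that hides them is reformatting the column), and that reading the file back with a plain read.csv drops them again unless colClasses is given:

```r
# Same data as in the answer, built directly as a factor
A <- data.frame(zip.code = factor(c("00601", "00602", "00607")))

path <- tempfile(fileext = ".csv")     # temporary file instead of "tmp.csv"
write.csv(A, path, row.names = FALSE)  # omit the row-name column for brevity

raw_lines <- readLines(path)           # the file's text, line by line
print(raw_lines)                       # the leading zeros are in the file

plain <- read.csv(path)                # default type conversion
print(plain$zip.code)                  # the quoted "00601" is read back as 601

kept <- read.csv(path, colClasses = "character")
print(kept$zip.code)                   # "00601" "00602" "00607"
```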

Everything that looks like a number is (attempted to be) read as numeric, and according to the ?read.table documentation, quoting the field in the file does not prevent this: a quoted "00607" is still converted to 607. So you can either follow the suggestion of @Hong Ooi or use

read.csv(*, colClasses="character")

and then convert each column as needed (in case you don't want or need all of them as factors). Once you have a character vector (a data.frame column), converting it to a factor is straightforward:

> zipCode <- c("00601", "00602", "00607")
> factor(zipCode)
[1] 00601 00602 00607
Levels: 00601 00602 00607
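A minimal sketch of reading everything as character and then converting column by column, assuming a hypothetical second column `population` that should end up numeric (the file and its values are made up and simulated with read.csv's `text` argument):

```r
# Hypothetical file: a zip code column plus a column that should be numeric
csv <- "zip code,population\n00601,100\n00602,250\n00607,75"

A <- read.csv(text = csv, colClasses = "character")  # no column is converted yet

A$zip.code   <- factor(A$zip.code)        # levels keep the leading zeros
A$population <- as.numeric(A$population)  # this column is meant to be a number

str(A)
```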
Michele