The files are downloaded correctly, but you are right: if you use read.csv to load the data into R, some of the columns are interpreted as numeric and therefore lose their leading zeros.
To take your code that downloads the files:
stateFIPScodes <- seq(10, 13, 1)
for (i in seq_along(stateFIPScodes)) {
  code <- stateFIPScodes[[i]]
  # Build the source URL and the local file name for this state code
  URL <- paste0("https://www.census.gov/popest/data/intercensal/county/files/CO-EST00INT-ALLDATA-", code, ".csv")
  destfile <- paste0("state2000_2010_", code, ".csv")
  download.file(URL, destfile)
}
If we use base read.csv, the leading zeros are dropped:
library(dplyr)
read.csv("state2000_2010_10.csv") %>%
select(1:5) %>%
head
#> SUMLEV STATE COUNTY STNAME CTYNAME
#> 1 50 10 1 Delaware Kent County
#> 2 50 10 1 Delaware Kent County
#> 3 50 10 1 Delaware Kent County
#> 4 50 10 1 Delaware Kent County
#> 5 50 10 1 Delaware Kent County
#> 6 50 10 1 Delaware Kent County
That's because the first few columns are being read in as numeric.
read.csv("state2000_2010_10.csv") %>% str()
#> 'data.frame': 780 obs. of 50 variables:
#> $ SUMLEV : int 50 50 50 50 50 50 50 50 50 50 ...
#> $ STATE : int 10 10 10 10 10 10 10 10 10 10 ...
#> $ COUNTY : int 1 1 1 1 1 1 1 1 1 1 ...
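The same effect can be reproduced without the downloaded file. The following is a minimal sketch on a made-up two-row sample (`csv_text` is a hypothetical stand-in, not real Census rows):

```r
# Hypothetical inline sample; not real Census rows
csv_text <- "SUMLEV,STATE,COUNTY\n050,10,001\n050,10,003"

# Default type inference: every column looks like a whole number,
# so read.csv stores it as integer and the zero padding is gone
df <- read.csv(text = csv_text)

class(df$COUNTY)
#> [1] "integer"
df$COUNTY
#> [1] 1 3
```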
There are two ways to resolve this:
- Manually pass the data types into read.csv, or just prevent all conversions by adding colClasses = "character".
- Use readr::read_csv, which handles it correctly.
We could just prevent automatic coercion:
read.csv("state2000_2010_10.csv", colClasses = "character") %>% str()
#> 'data.frame': 780 obs. of 50 variables:
#> $ SUMLEV : chr "050" "050" "050" "050" ...
#> $ STATE : chr "10" "10" "10" "10" ...
#> $ COUNTY : chr "001" "001" "001" "001" ...
#> $ STNAME : chr "Delaware" "Delaware" "Delaware" "Delaware" ...
You would then need to choose which columns to convert back with as.numeric.
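Converting the chosen columns back could look something like this. It is only a sketch on a made-up inline sample: `csv_text`, the `TOT_POP` values and the `num_cols` vector are assumptions for illustration, so substitute the columns you actually want as numbers.

```r
# Hypothetical inline sample; the TOT_POP values are invented
csv_text <- "SUMLEV,STATE,COUNTY,STNAME,TOT_POP\n050,10,001,Delaware,100\n050,10,003,Delaware,200"

# Read everything as character so the codes keep their leading zeros
df <- read.csv(text = csv_text, colClasses = "character")

# Columns to turn back into numbers (assumed; adjust to your data)
num_cols <- c("TOT_POP")
df[num_cols] <- lapply(df[num_cols], as.numeric)

str(df)
```

The code columns (SUMLEV, STATE, COUNTY) stay as character, and only the columns named in `num_cols` become numeric.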
Or you could set the type of specific columns by name and leave the rest to be detected automatically, e.g.
read.csv("state2000_2010_10.csv",
         colClasses = c(
           SUMLEV = "character",
           STATE = "numeric",
           COUNTY = "character"
         )) %>%
  select(1:5) %>%
  head
#> SUMLEV STATE COUNTY STNAME CTYNAME
#> 1 050 10 001 Delaware Kent County
#> 2 050 10 001 Delaware Kent County
#> 3 050 10 001 Delaware Kent County
#> 4 050 10 001 Delaware Kent County
#> 5 050 10 001 Delaware Kent County
#> 6 050 10 001 Delaware Kent County
Second, you could use readr, which has more intelligent column type inference:
library(readr)
read_csv("state2000_2010_10.csv") %>%
  select(1:5) %>%
  head
#> # A tibble: 6 × 5
#> SUMLEV STATE COUNTY STNAME CTYNAME
#> <chr> <int> <chr> <chr> <chr>
#> 1 050 10 001 Delaware Kent County
#> 2 050 10 001 Delaware Kent County
#> 3 050 10 001 Delaware Kent County
#> 4 050 10 001 Delaware Kent County
#> 5 050 10 001 Delaware Kent County
#> 6 050 10 001 Delaware Kent County
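If you would rather not depend on type guessing at all, readr also accepts explicit column types through `col_types`. A minimal sketch, again on a made-up inline sample (`csv_text` is hypothetical; readr treats a string containing newlines as literal data rather than a file path):

```r
library(readr)

# Hypothetical inline sample; not real Census rows
csv_text <- "SUMLEV,STATE,COUNTY,STNAME\n050,10,001,Delaware\n050,10,003,Delaware"

# Keep every column as character by default, and parse only STATE as integer
df <- read_csv(
  csv_text,
  col_types = cols(
    .default = col_character(),
    STATE = col_integer()
  )
)

str(df$COUNTY)
```

Spelling the types out this way keeps the result stable even if the guessing rules differ between readr versions.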