Adding leading zero once imported into R

Question

I have a data frame which includes a Reference column. Each value is a 10-digit number, which could start with zeros. When importing into R, the leading zeros disappear, which I would like to add back in.

I have tried using sprintf and formatC, but I have different problems with each.

DF=data.frame(Reference=c(102030405,2567894562,235648759), Data=c(10,20,30))

The outputs I get are the following:

> sprintf('%010d', DF$Reference)
[1] "0102030405" "        NA" "0235648759"
Warning message:
In sprintf("%010d", DF$Reference) : NAs introduced by coercion
> formatC(DF$Reference, width=10, flag="0")
[1] "001.02e+08" "02.568e+09" "02.356e+08"

The first output gives NA when the number already has 10 digits, and the second returns the result in scientific notation (standard form).

What I need is:

[1]  0102030405 2567894562  0235648759

I think your expected output is not reflecting with the leading zeros.. — akrun, Mar 07 '16 at 12:54
working through the examples in http://stackoverflow.com/questions/5812493/adding-leading-zeros-using-r, leads to `library(stringr); str_pad(DF$Reference, 10, pad = "0")` — user20650, Mar 07 '16 at 12:55
I just spotted that, and have edited the post. I haven't come across `str_pad` before, but it seems to be doing the trick. Thank you. — sym246, Mar 07 '16 at 12:57
http://stackoverflow.com/questions/14589354/struggling-with-integers-maximum-integer-size might explain results — user20650, Mar 07 '16 at 13:06

score 6 · Accepted Answer · edited May 23 '17 at 11:45

library(stringi)
DF = data.frame(Reference = c(102030405,2567894562,235648759), Data = c(10,20,30))
DF$Reference = stri_pad_left(DF$Reference, 10, "0")
DF
#    Reference Data
# 1 0102030405   10
# 2 2567894562   20
# 3 0235648759   30

Alternative solutions: see "Adding leading zeros using R" (http://stackoverflow.com/questions/5812493/adding-leading-zeros-using-r).

"When importing into R, the leading zeros disappear, which I would like to add back in."

Reading the column(s) in as characters would avoid this problem outright. You could use readr::read_csv() with the col_types argument.
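A minimal sketch of that idea, using base R's read.csv() with colClasses rather than readr. The file, its path, and its layout are assumptions made up for illustration:

```r
# Hypothetical file: a small CSV written to a temporary path for illustration.
path <- tempfile(fileext = ".csv")
writeLines(c("Reference,Data",
             "0102030405,10",
             "2567894562,20",
             "0235648759,30"), path)

# Default import: Reference is parsed as a number, so leading zeros are lost.
as_number <- read.csv(path)

# Declaring the column as character keeps the text exactly as stored in the file.
as_text <- read.csv(path, colClasses = c(Reference = "character"))

str(as_number$Reference)
str(as_text$Reference)

unlink(path)  # remove the temporary file
```

Only Reference is named in colClasses, so the remaining columns are still type-converted as usual.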

answered Mar 07 '16 at 13:00

effel

Props for the real solution: read the file correctly in the first place. – Hong Ooi Mar 07 '16 at 13:51
Although `read.csv` with the `colClasses` argument works just as well as `read_csv` with `col_types`. – Hong Ooi Mar 07 '16 at 13:52
That's right, thanks for pointing to colClasses. (http://stackoverflow.com/questions/2805357/specifying-colclasses-in-the-read-csv) – effel Mar 07 '16 at 13:56

Paul Rougieux · Answer 2 · 2016-03-07T14:08:30.623

formatC

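You can use format = "f" with digits = 0, so that the doubles are printed in fixed notation with no decimals instead of the default "g" format: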

formatC(DF$Reference, digits = 0,  width = 10, format ="f", flag="0")
# [1] "0102030405" "2567894562" "0235648759"

sprintf

The d conversion in sprintf expects integer values, so doubles have to be convertible with as.integer(). help(integer) explains that:

"the range of representable integers is restricted to about +/-2*10^9: doubles can hold much larger integers exactly."

That is why as.integer(2567894562) returns NA.
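The limit can be checked directly. A short sketch; the variable name ref is only for illustration:

```r
# Largest value R's 32-bit integer type can hold.
.Machine$integer.max
# [1] 2147483647

ref <- 2567894562            # stored as a double, which holds it exactly
ref > .Machine$integer.max   # TRUE: too large for the integer type

# The conversion overflows, so R returns NA (with a warning, suppressed here).
suppressWarnings(as.integer(ref))
# [1] NA
```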

Another workaround would be to use the string conversion s in sprintf:

sprintf('%010s',DF$Reference)
# [1] " 102030405" "2567894562" " 235648759"

But here this gives spaces instead of leading zeros (the effect of the 0 flag on s is not guaranteed and may differ between platforms). gsub() can add the zeros back by replacing spaces with zeros:

gsub(" ","0",sprintf('%010s',DF$Reference))
# [1] "0102030405" "2567894562" "0235648759"
