3

I'm trying to split a numeric string of 40 digits into its individual digits (i.e. splitting 123456789123456789123456789 into 1 2 3 4, etc.).

Unfortunately strsplit doesn't work because it requires a character argument, and converting the number with as.character doesn't work either: the number is very long and R cuts off the digits of long numbers (the maximum is 22 digits). I thus end up with "1.2345e+35" as a character string instead of the full number.

Is there a numeric variant of strsplit, or a workaround for the digit-cutting-off issue? I can't seem to find the answer on Stack Overflow, but apologies if this has already been answered before. Thanks in advance!

rvrvrv
  • what format are you starting with? character, or numeric? – Ben Bolker Jun 03 '12 at 15:29
  • it's numeric, that's why `strsplit` gives an error – rvrvrv Jun 03 '12 at 15:33
  • but if you're dealing with a very large numeric value, R has probably already lost precision. The max value of `options("digits")` is 22; I'm not sure of the maximum precision that R can hold in a `numeric` variable, but I think your value is larger than that. You may want to look at some of the SO answers on your options for arbitrary precision arithmetic (mostly involving interfaces to non-R tools such as `bc`): for example http://stackoverflow.com/questions/8175965/multiplication-of-large-integers-in-r – Ben Bolker Jun 03 '12 at 15:40
  • For example: `xc <- "123456789123456789123456789"; x <- as.numeric(xc); dump("x","")` – Ben Bolker Jun 03 '12 at 15:43
  • @BenBolker: s/probably/definitely. R's numeric class is double precision, which only gets you ~16 digits. Anything after that is rounding error. A reproducible example would really help in this case... OP: how is this number being created in R? – Joshua Ulrich Jun 03 '12 at 15:44
  • ... I just checked the `int64` package, and even unsigned 64-bit integers only get you 20 digits ... `library(int64); as.character(numeric_limits("uint64"))` – Ben Bolker Jun 03 '12 at 15:46
  • See also: http://rwiki.sciviews.org/doku.php?id=misc%3ar_accuracy%3ahigh_precision_arithmetic – Ben Bolker Jun 03 '12 at 15:49
  • If you can convert to character by manually placing the number in quotes will this work? a1 <- '1234567891234567891234567891234567891234' ; a2 <- strsplit(a1, "") ; a3 <- unlist(a2) ; a4 <- as.vector(as.numeric(a3)) ; – Mark Miller Jun 03 '12 at 22:04
  • I thought that `options("digits")` only specified the number of decimals that R shows, not the number that it maintains and uses for calculations? It is impossible to get from numeric to character as R apparently uses the `1.2345e+35` instead of the whole number, and thus that is what returns as a character. The data is retrieved from a MySQL database so I returned there and changed the category from `VARCHAR` to `CHAR` (but unfortunately had to redo all the data managing). My problem is thus solved, but the numeric-to-character-conversion using large numbers still remains an issue.. – rvrvrv Jun 04 '12 at 13:18
  • You're correct that `options('digits')` only controls printing. R does all calculations in double precision, which (as I said in an earlier comment) is limited to about 16 digits of precision. The `VARCHAR` column isn't the problem, since your DB is storing the value as a string. The problem is whatever method you used to pull the data into R was converting that field into a numeric. – Joshua Ulrich Jun 04 '12 at 17:32
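
A minimal sketch of the precision point made in the comments above, assuming the value arrives as a string (`xc` is borrowed from Ben Bolker's comment; `x_back` and `digits` are names made up for illustration). It only shows that a round trip through `numeric` does not give back the original digits, whereas splitting the string directly is unaffected.

```r
# A 27-digit value held as text (taken from the comment above)
xc <- "123456789123456789123456789"

# Converting to numeric stores the nearest double precision value
x <- as.numeric(xc)
.Machine$double.digits   # number of significand bits in a double

# Write the stored double back out in full; only the leading ~16 digits are reliable
x_back <- sprintf("%.0f", x)

# The round trip does not reproduce the input string
identical(x_back, xc)

# Splitting the original string never touches the double representation
digits <- as.integer(strsplit(xc, "")[[1]])
```

This is why the fix has to happen at import time: once the value has been stored as a double, the trailing digits cannot be recovered.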

3 Answers

6

If R is calculating the number I do not know the solution. If the number is in a data file I think the code below might work. Although, if the number is in a data file there are probably much easier solutions.

a1 <- read.table("c:/users/Mark W Miller/simple R programs/long_number.txt", colClasses = 'character')

# a1 <- c('1234567891234567891234567891234567891234') ;

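# read.table returns a data frame; reduce it to a plain character string
a1 <- as.character(a1)
# split into single characters, flatten the list, convert each digit to numeric
a2 <- strsplit(a1, "")
a3 <- unlist(a2)
a4 <- as.numeric(a3)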
a4
# [1] 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4

EDIT

I realize I might not have understood the question, and my answer is probably pretty silly. Nevertheless, if you have an entire data set of really long numbers you could split all of them with the code below. Note that the numbers in the file 'three_long_numbers.txt' are not quoted; it is colClasses = 'character' that keeps them from being read as numeric:

a1 <- read.table("c:/users/Mark W Miller/simple R programs/three_long_numbers.txt", colClasses = 'character')
a1

#      V1                                        
# [1,] "1234567891234567891234567891234567891234"
# [2,] "1888678912345678912345678912345678912388"
# [3,] "1234999891234567891234567891234567891239"

# a1 <- matrix(c(
# "1234567891234567891234567891234567891234",
# "1888678912345678912345678912345678912388",
# "1234999891234567891234567891234567891239"), nrow=3, byrow=T)

a1 <- as.matrix(a1)
a2 <- strsplit(a1, "")
a3 <- unlist(a2)
a3 <- as.numeric(a3)
# one row per number, one column per digit (assumes all numbers have the same length)
a4 <- matrix(a3, nrow = nrow(a1), byrow = TRUE)
a4
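
A reproducible variant of the same idea, as a sketch rather than a replacement for the code above: the `text` argument of `read.table` stands in for the file (the string `txt` is a hypothetical copy of its contents), and `do.call(rbind, ...)` builds the digit matrix without the `unlist`/`matrix` step.

```r
# Hypothetical stand-in for the contents of 'three_long_numbers.txt'
txt <- "1234567891234567891234567891234567891234
1888678912345678912345678912345678912388
1234999891234567891234567891234567891239"

# colClasses = "character" stops read.table from converting the column to double
a1 <- read.table(text = txt, colClasses = "character", header = FALSE)

# Split each string into single characters, then convert each piece to integer
parts <- strsplit(a1$V1, "")

# Bind the per-number digit vectors into rows (assumes equal lengths)
a4 <- do.call(rbind, lapply(parts, as.integer))

dim(a4)   # one row per number, one column per digit
```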
Mark Miller
4

To split the string into a numeric vector, you could simply do this:

s <- "123456789123456789123456789"
as.numeric(strsplit(s,"")[[1]])

# [1] 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9

or, if you want them split into a character vector:

strsplit(s,"")[[1]]

# [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "1" "2" "3" "4" "5" "6" "7" "8" "9" "1" "2" "3" "4" "5" "6"
# [25] "7" "8" "9"
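
As a hedged aside, here is a sketch of an alternative to `strsplit` that uses base R's `utf8ToInt`. It assumes the string contains nothing but the digits 0-9, which the guard line checks; `d` is just an illustrative name.

```r
s <- "123456789123456789123456789"

# Guard: only proceed if the string consists solely of the digits 0-9
stopifnot(grepl("^[0-9]+$", s))

# utf8ToInt() returns the code point of each character;
# subtracting the code point of "0" maps "0".."9" to 0..9
d <- utf8ToInt(s) - utf8ToInt("0")

# Same digits as the strsplit() route
identical(d, as.integer(strsplit(s, "")[[1]]))
```

Note that `utf8ToInt` takes a single string, so a vector of strings would need `lapply`.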
989
1

Here is another approach that seems more straightforward than my answer from a year ago:

Split a single number held as a character string:

a1 <- c('1234567891234567891234567891234567891234')
a2 <- read.fwf(textConnection(a1), widths=rep(1, nchar(a1)), colClasses = 'numeric', header=FALSE)
a2
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 V35 V36 V37 V38 V39 V40
1  1  2  3  4  5  6  7  8  9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4

Read a file containing the following three long numbers of equal length:

# 1234567891234567891234567891234567891234
# 1888678912345678912345678912345678912388
# 1234999891234567891234567891234567891239

a1 <- read.table("c:/users/mmiller21/simple R programs/three_long_numbers.txt", colClasses = 'character', header = FALSE)
a2 <- read.fwf("c:/users/mmiller21/simple R programs/three_long_numbers.txt", widths=rep(1, max(nchar(a1$V1))), colClasses = 'numeric', header=FALSE)
a2

  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 V35 V36 V37 V38 V39 V40
1  1  2  3  4  5  6  7  8  9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4
2  1  8  8  8  6  7  8  9  1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   8   8
3  1  2  3  4  9  9  9  8  9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   9

Read a file containing the following three long numbers of unequal length:

# 1234567891234567891234567891234567891234
# 188867891234567891234567891234567891238
# 12349998912345678912345678912345678912

a1 <- read.table("c:/users/mmiller21/simple R programs/three_long_numbersb.txt", colClasses = 'character', header = FALSE)
a2 <- read.fwf("c:/users/mmiller21/simple R programs/three_long_numbersb.txt", widths=rep(1, max(nchar(a1$V1))), colClasses = 'numeric', header=FALSE)
a2

  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 V35 V36 V37 V38 V39 V40
1  1  2  3  4  5  6  7  8  9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4
2  1  8  8  8  6  7  8  9  1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   8  NA
3  1  2  3  4  9  9  9  8  9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2  NA  NA
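
The same NA-padded result can be sketched without a file and without `read.fwf`, assuming the three numbers are already in R as character strings (the vector `x` below is typed in by hand for illustration):

```r
# The three unequal-length numbers from the example above, held as strings
x <- c("1234567891234567891234567891234567891234",
       "188867891234567891234567891234567891238",
       "12349998912345678912345678912345678912")

# One integer vector of digits per number
parts <- lapply(strsplit(x, ""), as.integer)

# Extending a vector's length pads it with NA on the right
width <- max(lengths(parts))
padded <- lapply(parts, function(p) { length(p) <- width; p })

# One row per number, NA where a shorter number has no digit
m <- do.call(rbind, padded)
```

Padding on the right matches the `read.fwf` output above; if the numbers should be aligned on their last digit instead, the NAs would have to be prepended.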

Here is code to split one column of long numbers in a data file that contains multiple columns. In this example each number in column 2 has the same length:

# -10 1234567891234567891234567891234567891234 -100
# -20 1888678912345678912345678912345678912388 -200
# -30 1234999891234567891234567891234567891239 -300

a1 <- read.table("c:/users/mark w miller/simple R programs/three_long_numbers_Oct25_2013.txt", colClasses = c('numeric', 'character', 'numeric'), header = FALSE)
a2 <- read.fwf(textConnection(a1$V2), widths=rep(1, nchar(a1$V2)[1]), colClasses = 'numeric', header=FALSE)
a2
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 V35 V36 V37 V38 V39 V40
1  1  2  3  4  5  6  7  8  9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4
2  1  8  8  8  6  7  8  9  1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   8   8
3  1  2  3  4  9  9  9  8  9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   4   5   6   7   8   9   1   2   3   9
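
If the split digits need to stay alongside the other columns, one possible sketch is below. It assumes the same three-column layout, uses the `text` argument of `read.table` in place of the file, and the column names `d1`, `d2`, ... are made up for illustration.

```r
# Hypothetical stand-in for the contents of the three-column data file
txt <- "-10 1234567891234567891234567891234567891234 -100
-20 1888678912345678912345678912345678912388 -200
-30 1234999891234567891234567891234567891239 -300"

a1 <- read.table(text = txt,
                 colClasses = c("numeric", "character", "numeric"),
                 header = FALSE)

# Split column 2 into a matrix of digits (assumes equal lengths)
digits <- do.call(rbind, lapply(strsplit(a1$V2, ""), as.integer))
colnames(digits) <- paste0("d", seq_len(ncol(digits)))

# Keep the other two columns next to the digit columns
out <- cbind(a1[c("V1", "V3")], digits)
```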
Mark Miller