I have a data frame df_before in which one of the columns contains values such as:

id
123456789
1.11E+2
3.52E+4
5.60E+5
0001112345857RAE

and would like to convert them in df_after to:

id
123456789
111
35200
560000
0001112345857RAE

Basically, I want to strip the period (.) and replace any E+XX with zeros according to the power of the exponent. This is what I have tried:

df_after$id <- ifelse(str_detect(df_before$id, "E\\+\\d+$"),
                      gsub("E\\+\\d+",
                           strrep("0", as.numeric(gsub(".*E\\+(\\d+)$", "\\1", df_before$id)) - 2),
                           gsub("\\.", "", df_before$id)),
                      df_before$id)

Each smaller piece of the code above works on a single input. For example, this:

strrep("0", as.numeric(gsub(".*E\\+(\\d+)$", "\\1", "6.32E+3")))

results in:

"000" # which is as expected

also:

gsub("E\\+\\d+",
    strrep("0", as.numeric(gsub(".*E\\+(\\d+)$", "\\1", "6.32E+3")) - 2), 
    gsub("\\.", "", "6.32E+3"))

gives:

"6320" # as expected and desired

But when I applied it to the whole column using ifelse and str_detect (which also works as expected for the entries containing E+XX), it ran very slowly and returned NAs and some values like 6320NA000NA000NA000NA000....<truncated>

Could someone please assist me in fixing this block of code so it will work with the dataframe column?

Thank you so much!

billydh

1 Answer

We can use as.numeric to convert the numeric values, while the non-numeric ones become NA (with a coercion warning). Then, using is.na, we index the values that are numeric and assign only those back to the 'id' column.

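df_after <- df_before
# parse every id as a number; non-numeric ids become NA (with a warning)
v1 <- as.numeric(df_before$id)
# TRUE where the id was parsed successfully
i1 <- !is.na(v1)
# overwrite only the numeric ids; the others keep their original text
df_after$id[i1] <- v1[i1]
df_after
#                id
#1        123456789
#2              111
#3            35200
#4           560000
#5 0001112345857RAE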
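
If the ids should stay as text throughout (for example, to avoid going through floating point), the string-based idea from the question can also be applied one element at a time. Note that the `replacement` argument of `gsub` is not vectorised (only its first element is used), which is one reason the column-wise attempt misbehaves. The following is only a sketch: `expand_sci` is a hypothetical helper, not part of any package, and it assumes values of the form `d.ddE+n` where the exponent is at least the number of decimals.

```r
ids <- c("123456789", "1.11E+2", "3.52E+4", "5.60E+5", "0001112345857RAE")

# Hypothetical helper: expand one "d.ddE+n" string into plain digits.
expand_sci <- function(x) {
  # leave anything that is not of the form d.ddE+n untouched
  if (!grepl("^[1-9]\\.[0-9]+E\\+[0-9]+$", x)) return(x)
  parts  <- strsplit(x, "E+", fixed = TRUE)[[1]]        # c("1.11", "2")
  digits <- sub(".", "", parts[1], fixed = TRUE)         # "1.11" -> "111"
  # zeros to append = exponent minus the number of decimals
  n_zeros <- as.integer(parts[2]) - (nchar(digits) - 1)
  if (n_zeros < 0) return(x)  # would need a decimal point; left as-is here
  paste0(digits, strrep("0", n_zeros))
}

# apply the helper to each element and keep a plain character vector
result <- vapply(ids, expand_sci, character(1), USE.NAMES = FALSE)
result
```

Because every element is handled separately, the number of zeros is computed per value rather than once for the whole column.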
akrun
  • hi @akrun i tried this and it did not work for larger number, e.g. `6.30E+11`, it returned `6.3e+11` instead, however for smaller number, e.g. `6.30E+4` it worked as you displayed above. is there a way to make it work for larger number as well? – billydh Nov 14 '17 at 00:27
  • i found an answer [here](https://stackoverflow.com/questions/5352099/how-to-disable-scientific-notation-in-r) - i just need to increase the digits options using `options(scipen = 999)`. thanks for your help! – billydh Nov 14 '17 at 00:36
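
As an alternative to changing the global `scipen` option, the numbers can be formatted explicitly when they are written back. This is a minimal sketch, assuming the ids are stored as character strings (not factors) and that the converted values are whole numbers; the sample data is made up to include the larger value mentioned in the comment above.

```r
# made-up sample data, including the large value from the comment
df_before <- data.frame(
  id = c("123456789", "1.11E+2", "3.52E+4", "6.30E+11", "0001112345857RAE"),
  stringsAsFactors = FALSE
)

df_after <- df_before
# non-numeric ids become NA; the coercion warning is expected, so silence it
v1 <- suppressWarnings(as.numeric(df_before$id))
i1 <- !is.na(v1)
# format() with scientific = FALSE writes the digits out in full;
# trim = TRUE drops the padding added to align the values
df_after$id[i1] <- format(v1[i1], scientific = FALSE, trim = TRUE)
df_after$id
```

Two caveats apply to any approach that goes through `as.numeric`: a purely numeric id with leading zeros would lose them, and ids with more than about 15 significant digits may not survive the round trip through a double exactly.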