> str(db$Installs)

 Factor w/ 21 levels "","0+","1+","1,000+",..: 8 20 15 18 11 17 17 5 5 8 ...

 db$Installs = as.character(gsub("\\+", "", db$Installs))

 str(db$Installs)
  chr [1:10841] "10,000" "500,000" "5,000,000" "50,000,000" "100,000" "50,000" "50,000" "1,000,000" "1,000,000" "10,000" ...

 db$Installs = as.double(gsub(",","",db$Installs))

 str(db$Installs)
  num [1:10841] 1e+04 5e+05 5e+06 5e+07 1e+05 5e+04 5e+04 1e+06 1e+06 1e+04 ...

I want the values to look like this:

"10000" "500000" "5000000" "50000000" "100000" "50000" "50000" "1000000" "1000000" "10000" ...

I tried this code:


db$Installs.factor <- factor(db$Installs) 
db$Installs = as.character(gsub("\\+", "", db$Installs))
db$Installs = as.double(gsub(",","",db$Installs))

Rui Barradas
Adarsh Pawar
  • Try `as.numeric(gsub(",", "",db$Installs,fixed=TRUE))` rather than `double` – Rushabh Patel Apr 10 '19 at 19:40
  • It's still showing the same output: `> str(db$Installs)` `chr [1:10841] "10,000" "500,000" "5,000,000" "50,000,000" "100,000" "50,000" "50,000" "1,000,000" "1,000,000" "10,000" ...` `> db$Installs = as.numeric(gsub(",", "",db$Installs,fixed=TRUE))` `> str(db$Installs)` `num [1:10841] 1e+04 5e+05 5e+06 5e+07 1e+05 5e+04 5e+04 1e+06 1e+06 1e+04 ...` I want the values to look like this: `"10000" "500000" "5000000" "50000000" "100000" "50000" "50000" "1000000" "1000000" "10000" ...` – Adarsh Pawar Apr 10 '19 at 19:49
  • provide some sample data – Rushabh Patel Apr 10 '19 at 19:50
  • For this `c <- c("10,000", "500,000" ,"5,000,000", "50,000,000" ,"100,000" ,"50,000" ,"50,000", "1,000,000" ,"1,000,000", "10,000")`, above solution works. – Rushabh Patel Apr 10 '19 at 19:52
  • And you are getting the correct output: per your `str` result, `1e+04` is `10000`. – Rushabh Patel Apr 10 '19 at 19:53
  • No, your solution `as.numeric(gsub(",", "",db$Installs,fixed=TRUE))` gives me the same output: `1e+04 5e+05 5e+06 5e+07 1e+05 5e+04 5e+04 1e+06 1e+06 1e+04...` – Adarsh Pawar Apr 10 '19 at 19:57
  • I want variables like this: `"10000" "500000" "5000000" "50000000" "100000" "50000" "50000" "1000000" "1000000" "10000" ...` – Adarsh Pawar Apr 10 '19 at 19:57
  • Try this - `as.numeric(gsub("\\D", "", db$Installs))` – Rushabh Patel Apr 10 '19 at 19:59
  • Check the example below. – Rushabh Patel Apr 10 '19 at 20:01

1 Answer


Try this

Input-

sample <- c("10,000+" ,"500,000+", "5,000,000+", "50,000,000+" ,"100,000+", "50,000+" ,"50,000+" ,"1,000,000+" )

Solution-

sample <- as.numeric(gsub("\\D", "", sample))

Output-

[1]    10000   500000  5000000 50000000   100000    50000    50000  1000000
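The same call can be applied to a factor column like the one in the question. The sketch below is only an illustration: the small `db` data frame is a hypothetical stand-in for the real CSV, and the empty string in it mirrors the `""` level shown in the question's `str` output. An empty string has no digits, so it should come out as `NA`.

```r
# Hypothetical stand-in for the Installs column; stringsAsFactors = TRUE
# mimics the factor that read.csv() produced in the question.
db <- data.frame(
  Installs = c("10,000+", "500,000+", "", "1,000,000+"),
  stringsAsFactors = TRUE
)

# gsub() coerces the factor to character before matching,
# and "\\D" removes every non-digit character ("," and "+").
db$Installs <- as.numeric(gsub("\\D", "", db$Installs))

str(db$Installs)
```

If the `NA` values matter for later analysis, check `sum(is.na(db$Installs))` after the conversion.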

Note- If you want to force R not to use exponential notation, then you can use -

options("scipen"=100, "digits"=4)

‘scipen’: integer. A penalty to be applied when deciding to print numeric values in fixed or exponential notation. Positive values bias towards fixed and negative towards scientific notation: fixed notation will be preferred unless it is more than ‘scipen’ digits wider.
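If changing a global option is not desirable, one possible per-call alternative (not part of the original answer) is `format()` with `scientific = FALSE`. This minimal sketch assumes a small example vector `x`. The result is a character vector, which matches the quoted form asked for in the question but is no longer numeric.

```r
# Example values; 1e4 and 10000 are the same number, only the display differs.
x <- c(1e4, 5e5, 5e6, 5e7)
x[1] == 10000

# Format as fixed-notation strings; trim = TRUE drops the padding
# that format() would otherwise add to give a common width.
fixed <- format(x, scientific = FALSE, trim = TRUE)
fixed
```

Keeping the column numeric and formatting only for display lets arithmetic and sorting continue to work on the underlying values.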

Rushabh Patel
  • The OP may still find things being printed in scientific notation, which is a separate issue, for which they might want to look [here](https://stackoverflow.com/q/9397664/324364). – joran Apr 10 '19 at 20:03
  • `> db <- read.csv("googleplaystore.csv")` `> str(db$Installs)` Factor w/ 21 levels "","0+","1+","1,000+",..: 8 20 15 18 11 17 17 5 5 8 ... `> db$Installs = as.numeric(gsub("\\D", "", db$Installs))` `> str(db$Installs)` num [1:10841] 1e+04 5e+05 5e+06 5e+07 1e+05 5e+04 5e+04 1e+06 1e+06 1e+04 ... – Adarsh Pawar Apr 10 '19 at 20:19
  • The above solution is converting the column to numeric. Now you need to force R to avoid exponential notation, either via the link provided by @joran or with `options("scipen"=100, "digits"=4)` – Rushabh Patel Apr 10 '19 at 20:22
  • yes! Done Thanks.....`options("scipen"=100, "digits"=4)` It worked. – Adarsh Pawar Apr 10 '19 at 20:25