
I'm having trouble with a data conversion. I read this data from a .csv file; for instance:

comisiones[2850,28:30]

     Periodo.Pago Monto.Pago.Credito Disposicion.En.Efectivo
2850      Mensual          11,503.68              102,713.20

The column Monto.Pago.Credito has class factor, and I need it to be numeric, specifically double precision, because I need to keep the decimals.

str(comisiones$Monto.Pago.Credito)

Factor w/ 3205 levels "1,000.00","1,000.01",..: 2476 2197 1373 1905 1348 3002 1252 95 2648 667 ...

So I use the generic data conversion function as.numeric():

comisiones$Monto.Pago.Credito <- as.numeric(comisiones$Monto.Pago.Credito)

But then the observation changes to this:

comisiones[2850,28:30]

     Periodo.Pago Monto.Pago.Credito Disposicion.En.Efectivo
2850      Mensual                796              102,713.20


str(comisiones$Monto.Pago.Credito)
num [1:5021] 2476 2197 1373 1905 1348 ...

The max of comisiones$Monto.Pago.Credito should be 11,504.68 but now it is 3205.

I don't know whether R has a specific data class or type for decimals. I've looked for one, but nothing I tried worked.

Sam Firke

2 Answers


You need to clean up the column first: remove the commas (gsub() coerces the factor to character as it does so), then convert the result to numeric:

comisiones$Monto.Pago.Credito <- as.numeric(gsub(",", "", comisiones$Monto.Pago.Credito))

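The problem shows up when you convert a factor directly to numeric: as.numeric() then returns the factor's internal integer level codes rather than the numbers shown in its labels. That is why the maximum became 3205, the number of levels.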
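To make the difference concrete, here is a minimal sketch using a small made-up factor (the three values are hypothetical stand-ins for the real column, not data from the question). It compares converting the factor directly with converting its labels after stripping the commas.

```r
# Hypothetical sample standing in for comisiones$Monto.Pago.Credito
amounts <- factor(c("11,503.68", "1,000.00", "102,713.20"))

# Direct conversion returns the internal level codes, not the amounts
codes <- as.numeric(amounts)

# Going through the labels and stripping the commas recovers the amounts
values <- as.numeric(gsub(",", "", as.character(amounts)))

print(codes)
print(values)
is.double(values)  # the result is stored in double precision
```

The explicit as.character() is not strictly required, since gsub() coerces its input, but it makes the intent clearer.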

Psidom
    StackOverflow deprecates [using comments to say "thank you"](http://meta.stackoverflow.com/questions/258004/should-thank-you-comments-be-flagged?lq=1); if this answer was useful you can upvote it (if you have sufficient reputation), and in any case if it answers your question satisfactorily you are encouraged to click the check-mark to accept it. – Ben Bolker Jun 14 '16 at 16:37

You can use extract_numeric from the tidyr package. It handles factor inputs and removes commas, dollar signs, etc. (Later tidyr releases deprecate extract_numeric in favor of readr::parse_number.)

library(tidyr)
comisiones$Monto.Pago.Credito <- extract_numeric(comisiones$Monto.Pago.Credito)
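If tidyr is not available, a similar cleanup can be sketched in base R. The helper name to_number and the sample values below are hypothetical, and the regular expression is an assumption about what counts as noise: it keeps only digits, the decimal point, and the minus sign.

```r
# Hypothetical helper: drop everything except digits, '.', and '-',
# then convert what is left to a double
to_number <- function(x) {
  cleaned <- gsub("[^0-9.-]", "", as.character(x))
  as.numeric(cleaned)
}

# Made-up inputs, including a factor with a currency symbol and commas
raw <- factor(c("$1,200,000.3444", "11,503.68", "-45.5"))
nums <- to_number(raw)
print(nums, digits = 12)
```

This sketch would mangle inputs that use a comma as the decimal separator, so it only suits data formatted like the column in the question.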

If the resulting numbers are large, they may not print with decimal places when you view them, whether you used as.numeric or extract_numeric (which itself calls as.numeric). But the precision is still being stored. For instance:

> x <- extract_numeric("1,200,000.3444")
> x
[1] 1200000

Verify that precision is still stored:

> format(x, nsmall = 4)
[1] "1200000.3444"
> x > 1200000.3
[1] TRUE
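The rounding seen when printing comes from R's default of 7 significant digits for display, not from the stored value. As a small sketch, the digits can be requested explicitly when printing or formatting. The value is typed in directly here rather than parsed, purely for illustration.

```r
x <- 1200000.3444

print(x)               # uses the default of 7 significant digits
print(x, digits = 12)  # ask for more significant digits for this call only

# Fixed number of decimal places, returned as a character string
shown <- sprintf("%.4f", x)
shown
```

Setting options(digits = 12) would change the default for the whole session instead of a single call.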
Sam Firke