Good day all,

So I have been struggling with this for a couple of days.

I have a data.frame of about 20,000 rows and 18 columns (20000 X 18) of integer values, about half of which are negative values and some zeros.

i.e. if I do

sapply(fj, typeof)

I get -

      volc.X.x       volc.y.x       volc.X.y       volc.y.y     volc.X.x.x     volc.y.x.x     volc.X.y.y     volc.y.y.y   volc.X.x.x.x 
     "integer"      "integer"      "integer"      "integer"      "integer"      "integer"      "integer"      "integer"      "integer" 
  volc.y.x.x.x   volc.X.y.y.y   volc.y.y.y.y volc.X.x.x.x.x volc.y.x.x.x.x volc.X.y.y.y.y volc.y.y.y.y.y         volc.X         volc.y 
     "integer"      "integer"      "integer"      "integer"      "integer"      "integer"      "integer"      "integer"      "integer"

I know I can do

fj_2<-as.data.frame(sapply(fj, as.double))

This, however, causes issues with how factors are dealt with, so taking clues from these posts -

"How to convert a factor to integer\numeric without loss of information?" and https://stackoverflow.com/a/2288510/2141709

I would have to do something like

as.double(as.character(fj$volc.X.x))

or in terms of sapply

fj_2<-as.data.frame(sapply(fj, as.character))
and then
fj_2<-as.data.frame(sapply(fj_2, as.double))

This way, however, the first sapply with as.character does not change my columns to character type. As a result, when I then run sapply with as.double, all columns are converted to double type, but the numbers are rounded off (all decimals are removed) and negative values are converted to positive values.

I tried to run a "nested" sapply like this

fj_2<-as.data.frame(sapply((sapply(fj, as.character)), as.double))

but this just ends up returning a single column of 360,000 values (20,000 x 18), though the numbers are in their original format and not rounded off.
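
A small sketch of why the nested call flattens everything, using a hypothetical 3 x 2 data frame in place of the real 20,000 x 18 one (the column names `a` and `b` are made up for illustration). The inner `sapply()` simplifies its result to a character matrix, and the outer `sapply()` then visits every cell of that matrix rather than every column:

```r
# Hypothetical 3 x 2 stand-in for the real 20,000 x 18 data frame.
fj <- data.frame(a = c(-1L, 0L, 2L), b = c(3L, -4L, 5L))

# sapply() simplifies the per-column results into a character matrix.
chr <- sapply(fj, as.character)
dim(chr)

# A matrix is an atomic vector with a dim attribute, so the outer
# sapply() is applied to each cell and returns one flat vector.
flat <- sapply(chr, as.double)
length(flat)

# as.data.frame() on that flat vector yields a single column.
dim(as.data.frame(flat))
```

With 20,000 rows and 18 columns, the same mechanism would give one column of 360,000 values, which matches what is described above.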

What changes do I need to make? I hope I was clear about my issue.

Thanks

Sid5427
  • For the most part, you should forget about `sapply()`. (Use `vapply()` if you really need the simplification part; it's safer to program with.) Instead, use `lapply()`, whose results will keep types and so can easily be coerced to a data frame provided all elements are of the same length. – alistaire Apr 13 '20 at 03:32
  • `factors` are stored as integers internally. You should use `class`, i.e. `sapply(fj, class)`, to get the class of each column. Try `fj[] <- lapply(fj, function(x) as.numeric(as.character(x)))` if you need to convert factors to numeric. – Ronak Shah Apr 13 '20 at 03:32
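
The two comments above can be sketched roughly as follows. This is only an illustration on a hypothetical two-column data frame, assuming the columns are actually factors (which would be consistent with `typeof` reporting "integer"); the values are made up. It shows `typeof()` versus `class()`, what `as.double()` does to a factor directly, and the `lapply()` form that keeps the data frame shape:

```r
# Hypothetical example: numbers that ended up stored as factors.
fj <- data.frame(
  volc.X = factor(c("-1.5", "0", "2.25")),
  volc.y = factor(c("10", "-3", "7"))
)

# typeof() reports the internal storage, which is "integer" for a factor;
# class() reports "factor".
sapply(fj, typeof)
sapply(fj, class)

# as.double() applied directly to a factor returns the internal level
# codes (positive whole numbers), not the values the levels represent.
codes <- as.double(fj$volc.X)

# Converting through as.character() first recovers the printed values.
# Assigning into fj[] keeps the data frame structure and column names.
fj[] <- lapply(fj, function(x) as.numeric(as.character(x)))
sapply(fj, class)
```

If the columns are factors, the level codes would explain values that look rounded and positive after a direct `as.double()` conversion.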
