
I have to use SparkR for part of a project (I typically use Scala). I'm writing out a file using the following code:

# Raise scipen so R itself prints full numbers rather than scientific notation
options(scipen = 999)
# Create a Spark DataFrame from the local data frame and write it out
sdf <- SparkR::as.DataFrame(df)
SparkR::head(sdf)  # all looks good
SparkR::write.json(sdf, path = somePath, mode = "append")  # does not look good

However, when I view the written output, one of my variables (timestamp in this case) is written in scientific notation, e.g. 1.4262E12, when I would rather have it as a long, e.g. 1426256000000. I can't figure out why write.json is writing the file out this way. Before writing the file, I view my Spark DataFrame and the timestamp is printed in full. Can anyone help/advise on a workaround for this problem?

Here is the schema, which must be kept this way:

root
 |-- price: integer (nullable = true)
 |-- timestamp: double (nullable = true)
  • This may help https://stackoverflow.com/questions/40206592/how-to-turn-off-scientific-notation-in-pyspark – David May 17 '18 at 17:55
  • You could also try casting the timestamp to a `long` before writing it out. – nate May 17 '18 at 18:01
  • Okay, long worked! sdf$timestamp <- SparkR::cast(sdf$timestamp, "long") Thank you @nate – fletchr May 17 '18 at 18:07

1 Answer


Thank you to @nate; this solved my problem, and it works with the schema I have to use anyway:

sdf$timestamp <- SparkR::cast(sdf$timestamp, "long")
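For context, here is a minimal end-to-end sketch of the fix. The small local data frame is made up to stand in for the question's df, and somePath is assumed to already be defined:

library(SparkR)

# Hypothetical local data frame standing in for the question's df
df <- data.frame(price = c(100L, 250L),
                 timestamp = c(1426256000000, 1426256100000))

sdf <- as.DataFrame(df)

# Cast the double timestamp to long so write.json emits 1426256000000
# instead of 1.426256E12
sdf$timestamp <- cast(sdf$timestamp, "long")

printSchema(sdf)  # timestamp is now long
write.json(sdf, path = somePath, mode = "append")

The cast changes the column's type in the schema from double to long, so the JSON output contains plain integer values rather than scientific notation.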