I know that sparklyr has the following read file methods:

  • spark_read_csv
  • spark_read_parquet
  • spark_read_json

What about reading ORC files? Does sparklyr support them yet?

I know I can use read.orc in SparkR or this solution, but I'd like to keep my code in sparklyr.

MultiplyByZer0
michalrudko

1 Answer

You can use the low-level Spark API, in the same way I described in my answer to "Transfer data from database to Spark using sparklyr":

library(dplyr)
library(sparklyr)

sc <- spark_connect(...)

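# Read the ORC data at `path` and register it as a temporary view called `name`
spark_session(sc) %>%
  invoke("read") %>%
  invoke("format", "orc") %>%
  invoke("load", path) %>%
  invoke("createOrReplaceTempView", name)

# Reference the registered view as a dplyr table
df <- tbl(sc, name)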

where name is an arbitrary name used to identify the table and path is the location of the ORC data.

In the current sparklyr version you should be able to replace the above with spark_read_source:

spark_read_source(sc, name, source = "orc", options = list(path = path))
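
If you want a single entry point for both approaches, they can be wrapped in a small helper. This is only a sketch: read_orc_tbl is a hypothetical name, not part of sparklyr, and the check for spark_read_source assumes that older releases simply do not have that function. Newer sparklyr releases also appear to provide a dedicated spark_read_orc reader, so check your installed version before relying on a wrapper like this.

```r
# Hypothetical helper (not part of sparklyr): reads the ORC data at `path`,
# registers it under `name`, and returns a dplyr table reference.
read_orc_tbl <- function(sc, name, path) {
  # Fail early on obviously invalid arguments, before touching Spark.
  stopifnot(is.character(name), length(name) == 1L, nzchar(name))
  stopifnot(is.character(path), length(path) == 1L, nzchar(path))

  # Prefer the high-level reader when the installed sparklyr has it.
  if (exists("spark_read_source", envir = asNamespace("sparklyr"))) {
    return(sparklyr::spark_read_source(
      sc, name, source = "orc", options = list(path = path)
    ))
  }

  # Otherwise fall back to the low-level invoke chain.
  reader <- sparklyr::invoke(sparklyr::spark_session(sc), "read")
  reader <- sparklyr::invoke(reader, "format", "orc")
  sdf <- sparklyr::invoke(reader, "load", path)
  sparklyr::invoke(sdf, "createOrReplaceTempView", name)
  dplyr::tbl(sc, name)
}
```

With an existing connection sc, the call would look like df <- read_orc_tbl(sc, name, path). Functions are namespace-qualified so the helper does not depend on library(dplyr) or library(sparklyr) having been attached.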
zero323