Questions tagged [sparkr]

SparkR is an R package that provides a light-weight frontend to use Apache Spark from R.

SparkR exposes the Spark API through the RDD class and allows users to interactively run jobs from the R shell on a cluster.

SparkR exposes the RDD API of Spark as distributed lists in R.

796 questions

10 answers

How do I read a Parquet in R and convert it to an R DataFrame?

I'd like to process Apache Parquet files (in my case, generated in Spark) in the R programming language. Is an R reader available? Or is work being done on one? If not, what would be the most expedient way to get there? Note: There are Java and C++…

r apache-spark parquet sparkr

asked May 22 '15 at 17:05

metasim

7 answers

SparkR vs sparklyr

Does someone have an overview with respect to advantages/disadvantages of SparkR vs sparklyr? Google does not yield any satisfactory results and both seem fairly similar. Trying both out, SparkR appears a lot more cumbersome, whereas sparklyr is…

r apache-spark sparkr sparklyr

asked Sep 14 '16 at 15:35

koVex

4 answers

Installing of SparkR

I have the latest version of R, 3.2.1. Now I want to install SparkR. After I execute: > install.packages("SparkR") I get back: Installing package into ‘/home/user/R/x86_64-pc-linux-gnu-library/3.2’ (as ‘lib’ is unspecified) Warning in…

r apache-spark sparkr

asked Jul 02 '15 at 12:38

Guforu

3 answers

Difference between createOrReplaceTempView and registerTempTable

I am new to Spark and was trying out a few commands in Spark SQL using Python when I came across these two commands: createOrReplaceTempView() and registerTempTable(). What is the difference between the two commands? They seem to have the same set of…

apache-spark pyspark apache-spark-sql sparkr

asked Jul 17 '17 at 13:41

Amogh Huilgol

6 answers

Summing multiple columns in Spark

How can I sum multiple columns in Spark? For example, in SparkR the following code works to get the sum of one column, but if I try to get the sum of both columns in df, I get an error. # Create SparkDataFrame df <- createDataFrame(faithful) # Use…

apache-spark pyspark sparkr

asked Jun 12 '17 at 14:35

Gaurav Bansal

7 answers

Unable to launch SparkR in RStudio

After a long and difficult installation process for SparkR, I am running into new problems launching it. My settings: R 3.2.0, RStudio 0.98.1103, Rtools 3.3, Spark 1.4.0, Java version 8, SparkR 1.4.0, Windows 7 SP1 64-bit. Now I try to use…

r windows apache-spark rstudio sparkr

asked Jun 29 '15 at 15:05

Patrick C.

2 answers

Add column to DataFrame in sparkR

I would like to add a column filled with the character N to a DataFrame in SparkR. With non-SparkR code I would do it like this: df$new_column <- "N" But with SparkR, I get the following error: Error: class(value) == "Column" || is.null(value) is…

r sparkr

asked May 19 '16 at 15:22

François M.

1 answer

Using SparkR JVM to call methods from a Scala jar file

I wanted to be able to package DataFrames in a Scala jar file and access them in R. The end goal is to create a way to access specific and often-used database tables in Python, R, and Scala without writing a different library for each. To do this,…

r scala apache-spark apache-spark-sql sparkr

asked Oct 23 '15 at 20:55

mfliu

1 answer

Using SparkR and Sparklyr simultaneously

As far as I understood, those two packages provide similar but mostly different wrapper functions for Apache Spark. Sparklyr is newer and still needs to grow in the scope of functionality. I therefore think that one currently needs to use both…

r apache-spark sparkr sparklyr

asked Nov 13 '16 at 19:02

CodingButStillAlive

4 answers

Duplicate columns in Spark Dataframe

I have a 10GB CSV file in a Hadoop cluster with duplicate columns. I am trying to analyse it in SparkR, so I use the spark-csv package to parse it as a DataFrame: df <- read.df( sqlContext, FILE_PATH, source = "com.databricks.spark.csv", header =…

r csv hadoop apache-spark sparkr

asked Nov 19 '15 at 23:45

Bamqf

2 answers

How to handle null entries in SparkR

I have a SparkSQL DataFrame. Some entries in this data are empty but they don't behave like NULL or NA. How could I remove them? Any ideas? In R I can easily remove them, but in SparkR it says that there is a problem with the S4 system/methods.…

r apache-spark sparkr apache-spark-1.4

asked Jul 23 '15 at 21:46

Ole Petersen

2 answers

How to call Sagemaker training model endpoint API in C#

I have implemented machine learning algorithms through sagemaker. I have installed SDK for .net, and tried by executing below code. Uri sagemakerEndPointURI = new…

c# amazon-web-services amazon-s3 sparkr amazon-sagemaker

asked Jan 21 '18 at 10:37

Diboliya

2 answers

Why is collect in SparkR so slow?

I have a 500K-row Spark DataFrame that lives in a Parquet file. I'm using Spark 2.0.0 and the SparkR package inside Spark (RStudio and R 3.3.1), all running on a local machine with 4 cores and 8 GB of RAM. To facilitate construction of a dataset I…

r apache-spark sparkr

asked Sep 19 '16 at 15:23

Wil Van Cleve

1 answer

zeppelin with sparkr is not displaying dataframe as table

The Zeppelin R interpreter documentation states: If you return a data.frame, Zeppelin will attempt to display it using Zeppelin's built-in visualizations. This can be seen in the documentation example: However, when I attempt to run the same R…

sparkr apache-zeppelin

asked Aug 05 '16 at 00:14

Chris Snow

3 answers

Convert date to end of month in Spark

I have a Spark DataFrame as shown below: #Create DataFrame df <- data.frame(name = c("Thomas", "William", "Bill", "John"), dates = c('2017-01-05', '2017-02-23', '2017-03-16', '2017-04-08')) df <- createDataFrame(df) #Make sure df$dates…

pyspark apache-spark-sql sparkr

asked Jun 21 '17 at 21:38

Gaurav Bansal
