Questions tagged [gapply]
6 questions
3
votes
0 answers
Databricks - Dplyr on SparkDataframe
I am looking to run dplyr functions on a Spark dataframe.
How do I run dplyr functions on a Spark dataframe through Databricks? No matter how I modify my code, it always fails with the same error, just from a different dplyr function.
HDEF_df_test is a Spark…
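A likely cause (a sketch, not a confirmed diagnosis of this question): dplyr verbs are not defined for SparkR's SparkDataFrame class, so they fail one by one. One option is to access the table through sparklyr instead, whose `tbl_spark` objects do support dplyr. The connection method, table name, and column names below are assumptions for illustration:

```r
library(sparklyr)
library(dplyr)

# Databricks-managed connection (assumes this runs inside a Databricks notebook)
sc <- spark_connect(method = "databricks")

# Hypothetical table standing in for HDEF_df_test
HDEF_df_test <- tbl(sc, "hdef_table")

HDEF_df_test %>%
  filter(value > 0) %>%                                   # dplyr verbs are translated to Spark SQL
  group_by(group_col) %>%
  summarise(mean_value = mean(value, na.rm = TRUE)) %>%
  collect()                                               # pull the result back as an R data.frame
```

The alternative is to stay in SparkR and use its own verbs (`select`, `filter`, `groupBy`, `agg`), which shadow the dplyr names but operate on SparkDataFrames.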

nak5120
- 4,089
- 4
- 35
- 94
2
votes
1 answer
SparkR gapply - function returns a multi-row R dataframe
Let's say I want to execute something as follows:
library(SparkR)
...
df = spark.read.parquet()
gapply(
  df,
  df$column1,
  function(key, x) {
    return(data.frame(x, newcol1 = f1(x), newcol2 = f2(x)))
  }
)
where the…
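Returning several rows per group from `gapply` is supported, as long as the declared schema describes every column of the output data.frame (here, all of `x`'s columns plus the new ones). A minimal sketch, assuming `f1`/`f2` are the asker's row-wise functions and the input has a single string column `column1`:

```r
library(SparkR)

df <- read.parquet("/path/to/data")   # hypothetical path

# Schema must match the OUTPUT rows; multi-row results per group are fine.
schema <- structType(
  structField("column1", "string"),
  structField("newcol1", "double"),
  structField("newcol2", "double")
)

result <- gapply(
  df,
  "column1",
  function(key, x) {
    # f1/f2 are hypothetical stand-ins for the asker's functions
    data.frame(x, newcol1 = f1(x), newcol2 = f2(x))
  },
  schema
)
```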

Matt Anthony
- 121
- 8
1
vote
1 answer
Bizdays doesn't exclude weekends
I am trying to calculate utilization rates by relative employee lifespans. I need to assign a total number of hours available to each employee between the earliest and latest dates on which time was recorded. From there I will use this as the…
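A common pitfall with the bizdays package (a guess at this question's cause, not a confirmed one): the default calendar has no non-working days, so weekends are only excluded if the calendar you pass declares them. A minimal sketch:

```r
library(bizdays)

# Weekends are excluded only when the calendar names them as non-working days;
# without this, every day counts as a business day.
cal <- create.calendar("weekdays_only", weekdays = c("saturday", "sunday"))

bizdays("2024-01-01", "2024-01-31", cal)
```

Passing `cal` (or its name, `"weekdays_only"`) explicitly to every `bizdays()` call avoids silently falling back to a calendar that counts weekends.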

Rory
- 95
- 1
- 5
1
vote
0 answers
gapply sometimes returning duplicated groups?
I'm running some code, the relevant essence of which is:
library(SparkR)
library(magrittr)
sqlContext %>% sql("select * from tmp") %>%
  gapply("id", function(key, x) {
    data.frame(
      id = key,
      n = nrow(x)
    )
  }, schema =…
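One way to confirm whether groups really are duplicated (a diagnostic sketch, assuming an `id` column of type string; it does not explain the cause) is to compare the gapply output against Spark's own distinct count:

```r
library(SparkR)
library(magrittr)

sdf <- sql("select * from tmp")

res <- sdf %>%
  gapply("id", function(key, x) {
    data.frame(id = key[[1]], n = nrow(x))
  }, schema = structType(
    structField("id", "string"),
    structField("n", "integer")
  )) %>%
  collect()

# If gapply emitted each group exactly once, these two numbers agree.
nrow(res)
count(distinct(select(sdf, "id")))

anyDuplicated(res$id)   # non-zero would confirm duplicated groups in the output
```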

MichaelChirico
- 33,841
- 14
- 113
- 198
0
votes
1 answer
Number of weekdays between two dates applied to groups in a grouped dataframe
I am trying to use gapply on a grouped df to get a timeline for time entry on projects.
Below I want to get a column that will have available working time for a person based on working hours between the earliest date they booked time and the latest…
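The per-group computation described above can be sketched with `gapply` plus a plain-R weekday counter. The column names (`person`, `date`), the 8-hour workday, and the helper function are all assumptions for illustration:

```r
library(SparkR)

# Hypothetical helper: number of Mon-Fri days between two dates, inclusive.
count_weekdays <- function(from, to) {
  days <- seq(as.Date(from), as.Date(to), by = "day")
  sum(!format(days, "%u") %in% c("6", "7"))   # %u: Monday = 1 ... Sunday = 7
}

schema <- structType(
  structField("person", "string"),
  structField("available_hours", "double")
)

result <- gapply(
  df,          # assumed SparkDataFrame with person/date columns
  "person",
  function(key, x) {
    wd <- count_weekdays(min(x$date), max(x$date))
    data.frame(person = key[[1]], available_hours = wd * 8)   # 8 working hours/day assumed
  },
  schema
)
```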

Rory
- 95
- 1
- 5
0
votes
1 answer
declaring output schema when using gapply in Sparkr
I would like to use gapply according to https://spark.apache.org/docs/latest/sparkr.html#gapply
The problem is that I am returning a list of two dataframes.
return(list(df1, df2))
How do I declare the output schema in this case?
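A `gapply` UDF must return a single data.frame whose columns match the declared schema, so a list of two dataframes cannot be described by one schema. One workaround (an assumption, not the only option) is to stack the two results with a tag column:

```r
library(SparkR)

# Stack df1 and df2 into one data.frame distinguished by a "source" column,
# so a single schema can describe the combined output.
schema <- structType(
  structField("key", "string"),
  structField("source", "string"),   # "df1" or "df2"
  structField("value", "double")
)

result <- gapply(
  df, "key",
  function(key, x) {
    # f1/f2 are hypothetical per-group computations
    df1 <- data.frame(key = key[[1]], source = "df1", value = f1(x))
    df2 <- data.frame(key = key[[1]], source = "df2", value = f2(x))
    rbind(df1, df2)                  # one data.frame matching the schema
  },
  schema
)
```

The two halves can be separated again downstream with `filter(result, result$source == "df1")`.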

bhomass
- 3,414
- 8
- 45
- 75