Consider I have the below dataframe
AccountId,CloseDate
1,2015-05-07
2,2015-05-09
3,2015-05-01
4,2015-05-07
1,2015-05-09
1,2015-05-12
2,2015-05-12
3,2015-05-01
3,2015-05-01
3,2015-05-02
4,2015-05-17
1,2015-05-12
I want to group it based on AccountId and then I want to add another column naming date_diff which will contain the difference in CloseDate between the current row and previous row. Please note that I want this date_diff to be calculated only for rows having same AccountId. So I need to group the data before adding another column
Below is the R code that I am using
df <- read.df(sqlContext, "/home/ubuntu/work/csv/sample.csv", source = "com.databricks.spark.csv", inferSchema = "true", header="true")
df$CloseDate <- to_date(df$CloseDate)
groupedData <- SparkR::group_by(df, df$AccountId)
SparkR::mutate(groupedData, DiffCloseDt = as.numeric(SparkR::datediff((CloseDate),(SparkR::lag(CloseDate,1)))))
To add another column I am using mutate. But as the group_by returns groupedData I am not able to use mutate here. I am getting the below error
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘mutate’ for signature ‘"GroupedData"’
So how can I convert GroupedData into Dataframe so that I can add columns using mutate?