I have a very large data frame in R with 4.3 million rows and 7 columns, which I need to group and summarize in order to create some graphics (with ggplot) later on. But this doesn't seem to work: I waited for a couple of hours, but nothing happened.
I have an example with a small data frame that works:
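library(dplyr)  # provides %>%, group_by() and summarize()
SDO_ID <- c(44, 44, 44, 44, 45, 45, 45, 45)
Value <- c(93, 94, 88, 77, 68, 51, 55, 78)
# four timestamps, repeated once for each of the two SDO_IDs
zeitstempel <- rep(c("2018-01-01 01:00:00", "2018-01-01 02:00:00", "2018-01-02 01:00:00", "2018-01-02 02:00:00"), 2)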
time <- format(as.POSIXct(zeitstempel),format = "%H:%M:%S")
date <- as.Date(zeitstempel)
df <- data.frame(SDO_ID, Value, time, date)
rf_by_day <- df %>%
  group_by(date, SDO_ID) %>%
  summarize(value = mean(Value))
This example works as expected. But with my original data frame (over 4.3 million rows) I don't get any result. Do I just have to wait longer? Or is there anything that I could do to improve the code?
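One way to check whether the row count alone explains the wait is to time the same grouping step on synthetic data of a similar size. This is a minimal sketch, assuming the real data frame has the same column types as the small example; the name `big_df`, the 100 hypothetical station IDs, and the one-year date range are illustrative assumptions, not properties of the real data.

```r
library(dplyr)

# Hypothetical stand-in for the real data: 4.3 million rows with the same
# column types as the small example. The 100 IDs and the one-year date
# range are assumptions chosen only for illustration.
set.seed(1)
n <- 4.3e6
big_df <- data.frame(
  SDO_ID = sample(1:100, n, replace = TRUE),
  Value  = runif(n, 0, 100),
  date   = as.Date("2018-01-01") + sample(0:364, n, replace = TRUE)
)

# Time only the group/summarize step (same pipeline as the small example).
timing <- system.time(
  rf_by_day <- big_df %>%
    group_by(date, SDO_ID) %>%
    summarize(value = mean(Value), .groups = "drop")
)
print(timing)
print(nrow(rf_by_day))
```

If this finishes quickly on the same machine, the number of rows is probably not the problem by itself, and it may be worth comparing `str()` of the real data frame with `str(df)` from the small example to spot a column type that differs.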
Would it help to first extract the months and group by month instead of by the exact date (since the final data frame would then have fewer rows)? Or does that make no difference, since the original data frame still has 4.3 million rows anyway?
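The month-based grouping described above could look roughly like this on the small example. This is a sketch, assuming a year-month string built with `format(date, "%Y-%m")` is an acceptable month key; the column name `month` and the result name `rf_by_month` are hypothetical.

```r
library(dplyr)

# Rebuild the small example (timestamps repeated for both SDO_IDs).
SDO_ID <- c(44, 44, 44, 44, 45, 45, 45, 45)
Value <- c(93, 94, 88, 77, 68, 51, 55, 78)
zeitstempel <- rep(c("2018-01-01 01:00:00", "2018-01-01 02:00:00",
                     "2018-01-02 01:00:00", "2018-01-02 02:00:00"), 2)
df <- data.frame(SDO_ID, Value, date = as.Date(zeitstempel))

# Derive a year-month key such as "2018-01" and group by it instead of the date.
rf_by_month <- df %>%
  mutate(month = format(date, "%Y-%m")) %>%
  group_by(month, SDO_ID) %>%
  summarize(value = mean(Value), .groups = "drop")

print(rf_by_month)
```

Every input row still has to be read and assigned to a group, so this mainly shrinks the result rather than the input; whether it is noticeably faster on the full data is something to measure rather than assume.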
Thank you!