So, I have a dataset df that is about 1.4 GB, and I am trying to reshape it with the following function:
library(dplyr)
library(tidyr)
library(readr)

reshaped <- function(df) {
  # Count occurrences of each concept_code per subject_num,
  # then spread to wide format (one column per concept_code)
  df <- df %>%
    select(subject_num, concept_code) %>%
    group_by(subject_num, concept_code) %>%
    count() %>%
    spread(concept_code, n, fill = 0)
  return(df)
}
df <- read_rds('df.RDs') %>%
  mutate(a = paste(a, b, sep = "|"))

df <- reshaped(df)

write_rds(df, 'df_reshaped.RDs')
I get: Error: cannot allocate vector of size 1205.6 GB. While debugging I found that the code fails at the spread() call inside the reshaped function. I don't see how a 1.4 GB dataset could demand 1205.6 GB of memory in the dplyr code I wrote, and nothing in the code above appears to duplicate the dataset roughly 900 times, so I am a bit stuck here. Could anyone suggest why this is happening?
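For reference, here is a rough sanity check I could run before calling spread(); it is just a sketch that estimates the size of the dense wide table spread() would have to allocate, assuming one row per subject_num, one column per distinct concept_code, and about 8 bytes per numeric cell:

library(dplyr)

# Estimate the dense wide table spread() would build:
# rows = distinct subjects, columns = distinct concept codes,
# assuming ~8 bytes per numeric cell.
n_subjects <- n_distinct(df$subject_num)
n_concepts <- n_distinct(df$concept_code)
estimated_gb <- (n_subjects * n_concepts * 8) / 1024^3
estimated_gb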