How to save a large dataframe and quickly load it in R?

Question

I'm currently working on a project to extract qualitative and quantitative (statistics) data about the Acadie portal in Wikipedia FR. There are 1905 entries to work with and 16 variables.

Loading all of the statistical data with the following code takes a while every time. Is there a way to save this data.frame on my computer and quickly load it again for future use while keeping it organised?

# Basic information ----

library("WikipediR")

# Function
# How to make function outside of apply: https://ademos.people.uic.edu/Chapter4.html#:~:targetText=vapply%20is%20similar%20to%20sapply,VALUE).&targetText=VALUE%20is%20where%20you%20specify,single%20numeric%20value%2C%20so%20FUN.
pageInfo_fun <- function(portalAcadie_titles){
  # Brief pause before each request to avoid exceeding the API quota.
  Sys.sleep(0.0001)
  page_info(language = "fr", 
            project = "wikipedia", 
            page = portalAcadie_titles,
            properties = c("url"),
            clean_response = TRUE)}

pageInfo_data <- apply(portalAcadie_titles,1, pageInfo_fun)

# Transform into dataframe

library("tidyverse")
pageInfo_df <- data.frame(map_dfr(pageInfo_data, ~flatten(.)))

It gives me a workable dataframe that looks like this:

When I tried saving it to a CSV file and then reading it back with read.csv.ffdf() from the ff package, I didn't get a workable dataframe. Everything was consolidated into a single observation with roughly 20,000 variables.

It seems like something you're using here requires a package that wasn't loaded in your syntax. — costebk08, Dec 09 '19 at 21:06

score 3 · Accepted Answer · answered Dec 09 '19 at 21:07

You can serialize it easily with:

readr::write_rds(pageInfo_df, "pageInfo_df.Rds")

and then deserialize it like so:

pageInfo_df <- readr::read_rds("pageInfo_df.Rds")

This should handle any valid R object, however complex.
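To avoid re-downloading on every run, the save and load steps can be combined into a small "load if cached, otherwise build and save" helper. The following is a minimal sketch rather than a definitive implementation: `load_or_build()`, `build_page_info()` and `cache_file` are hypothetical names, the toy data frame stands in for the slow WikipediR download from the question, and it uses base R's `saveRDS()`/`readRDS()` so that it runs without extra packages.

```r
# Hypothetical cache location; tempdir() keeps this sketch self-contained.
cache_file <- file.path(tempdir(), "pageInfo_df.Rds")

# Hypothetical stand-in for the slow download-and-flatten step in the question.
build_page_info <- function() {
  data.frame(title = c("Acadie", "Moncton"),
             pageid = c(1L, 2L),
             stringsAsFactors = FALSE)
}

# Return the cached object if the file exists; otherwise build it and cache it.
load_or_build <- function(path, build) {
  if (file.exists(path)) {
    return(readRDS(path))  # fast path: reuse the saved object
  }
  result <- build()        # slow path: compute once
  saveRDS(result, path)    # save for the next session
  result
}

pageInfo_df <- load_or_build(cache_file, build_page_info)
```

Deleting the cached file forces a fresh build on the next call, which is useful when the underlying pages have changed.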

Wojciech Kulma

FWIW, these are merely wrappers around the base R functions `saveRDS` and `readRDS`, which work just fine; there's really no need for a package dependency if all you're doing is saving/reading rds files. – joran Dec 09 '19 at 21:13
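The base R functions named in this comment can be used in the same way. As a small sketch with made-up data (the column names and the temporary file path are assumptions for illustration), the round trip below checks that column types, including a Date column and a list column, come back unchanged:

```r
# Toy data frame with mixed column types (illustrative values only).
df <- data.frame(id = 1:3,
                 label = c("a", "b", "c"),
                 stringsAsFactors = FALSE)
df$when <- as.Date(c("2019-12-07", "2019-12-08", "2019-12-09"))
df$tags <- list("x", c("y", "z"), character(0))  # list column

# Write to a temporary file and read it back with base R only.
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, rds_path)
restored <- readRDS(rds_path)

# Compare the restored object with the original.
same <- identical(df, restored)
print(same)
```

Unlike a CSV round trip, nothing has to be re-parsed from text here, so classes and nested structures are kept as they were.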

That worked wonderfully! Thank you. – Judith Dec 09 '19 at 21:22
