
Can anyone point me to how I can load large .ndjson files into R?

My aim is to explore Parler social media data, which can be found here: https://zenodo.org/record/4442460#.YCOB32j7TFj

Since the full dataset is large, I initially downloaded the parler_users.zip file, as it is significantly smaller than the parler_data.zip file. My aim was to quickly explore how I can load this data into R, but so far I have not been successful. Please note that I am a beginner to R, so I do not have much experience.

If I succeed in loading the 1 GB dataset, I will then attempt to load the 32 GB one.

It would be greatly appreciated if someone could help me with this process.

Sean
    How much RAM do you have? The whole point of ndjson (vs JSON) is that you *don’t* load the entire data at once, because it’s usually too large for that. Instead, you process data one record (or several records) at a time. – Konrad Rudolph Feb 11 '21 at 11:23
  • I only have 8gb RAM – Sean Feb 12 '21 at 11:28
  • See my answer below. BTW have you discerned any structure/pattern to the ordering of the 167 data files? They don't appear to be chronological. – jacanterbury Nov 04 '21 at 16:43

1 Answer


I've had some joy using

library(jsonlite)

and calling

jsonlite::stream_in(file('filename_here'), verbose = FALSE)

You'll likely want to use the

handler=

parameter too, and create a callback function to process each chunk of records, as in the sketch below.
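
For example, here is a minimal sketch of that approach. The filename ("parler_users.ndjson") and the column names ("username" and "bio") are assumptions for illustration; the actual fields in the Parler dump may differ:

library(jsonlite)

# Collect the reduced chunks here; '<<-' in the callback writes to this list.
chunks <- list()

jsonlite::stream_in(
  file("parler_users.ndjson"),   # assumed filename after unzipping
  handler = function(df) {
    # 'df' is a data frame holding one page of records.
    # Keep only the columns you need so that memory use stays small
    # ("username" and "bio" are hypothetical names).
    keep <- df[, intersect(c("username", "bio"), names(df)), drop = FALSE]
    chunks[[length(chunks) + 1]] <<- keep
  },
  pagesize = 5000,               # records per chunk
  verbose = FALSE
)

users <- do.call(rbind, chunks)

Because each page is reduced inside the callback before it is stored, only the columns you keep ever accumulate in memory, which is the usual way to get a file larger than your RAM through R.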

You might also like to look at

library(tidytext)
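
For instance, a hedged sketch of counting words in a free-text field, assuming the 'users' data frame from above has a text column called "bio" (the real column name may differ):

library(tidytext)
library(dplyr)

users %>%
  unnest_tokens(word, bio) %>%            # one row per word
  anti_join(stop_words, by = "word") %>%  # drop common English stop words
  count(word, sort = TRUE)                # most frequent words first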
jacanterbury