I am trying to run the Python twarc2 hydrate command on a very large file of 2,339,076 records, but it keeps freezing. My question is: does twarc have a maximum number of rows it can process? If so, what is it? Do I need to separate my data into smaller subsections?
I have tried the terminal command:
twarc2 hydrate 2020-03-22_clean-dataset_csv.csv > hydrated.jsonl
I have tried the same command on a smaller file and it works fine.
I have tried searching to find out whether there is a limit to the number of rows twarc can process, but I can't find an answer.
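In case splitting the file turns out to be the answer, this is roughly how I was planning to break it into chunks. It is only a minimal sketch, assuming the CSV contains one tweet ID per line with no header row; the chunk size and output file names are placeholders I made up, not anything twarc requires:

    # Split the large ID file into smaller chunks before hydrating.
    # Assumes one tweet ID per line and no header row; CHUNK_SIZE and
    # the output file names are placeholders.
    CHUNK_SIZE = 100_000

    with open("2020-03-22_clean-dataset_csv.csv") as infile:
        ids = [line.strip() for line in infile if line.strip()]

    for i in range(0, len(ids), CHUNK_SIZE):
        with open(f"ids_chunk_{i // CHUNK_SIZE:03d}.txt", "w") as outfile:
            outfile.write("\n".join(ids[i:i + CHUNK_SIZE]) + "\n")

I could then hydrate each chunk separately, for example:

    twarc2 hydrate ids_chunk_000.txt > hydrated_000.jsonl

Is splitting like this necessary, or should twarc be able to handle the full file in one go?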