In Python I'm trying to download every URL contained in a 180 MB JSON file. Even though it is only 180 MB, opening it in a text editor uses 5.9 GB of memory.
So Jupyter crashes when I try to read the JSON and extract the URLs inside.
Here is a sample from the JSON file:
{"company name": "ZERO CORP", "cik_number": "109284", "form_id": "10-K", "date": "19940629", "file_url": "https://www.sec.gov/Archives/data/109284/0000898430-94-000468.txt"}
{"company name": "FOREST LABORATORIES INC", "cik_number": "109563", "form_id": "10-K", "date": "19940628", "file_url": "https://www.sec.gov/Archives/data/38074/0000038074-94-000021.txt"}
{"company name": "GOULDS PUMPS INC", "cik_number": "14637", "form_id": "10-K", "date": "19940331", "file_url": "https://www.sec.gov/Archives/data/42791/0000042791-94-000002.txt"}
{"company name": "GENERAL HOST CORP", "cik_number": "275605", "form_id": "10-Q", "date": "19940701", "file_url": "https://www.sec.gov/Archives/data/40638/0000950124-94-001209.txt"}
Solutions that I think might work:
1) I think I'm going to need some kind of memory management so I can iterate over all the file_url values and download them in Python (a rough sketch of what I mean is below this list).
2) I could switch to JavaScript and use Node.js to do this iteration asynchronously, but I have never used JavaScript or Node.js before.
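For option 1, this is roughly what I'm picturing, assuming the requests library and the iter_file_urls generator from above; the output directory and the User-Agent string are placeholders (I believe the SEC wants a descriptive User-Agent with contact details).

import os
import requests

def download_all(urls, out_dir="filings"):
    os.makedirs(out_dir, exist_ok=True)
    headers = {"User-Agent": "my-name my-email@example.com"}  # placeholder contact info
    with requests.Session() as session:
        for url in urls:
            filename = os.path.join(out_dir, url.rsplit("/", 1)[-1])
            # Stream each response to disk so a large filing is not held in memory either.
            with session.get(url, headers=headers, stream=True, timeout=60) as resp:
                resp.raise_for_status()
                with open(filename, "wb") as f:
                    for chunk in resp.iter_content(chunk_size=8192):
                        f.write(chunk)

# download_all(iter_file_urls("metadata.json"))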