
Scenario: I just got my hands on a huge N-Triples file (6.5 GB uncompressed). I am trying to open it and perform some operations on it (such as cleaning some of the data it contains).

Issue: I haven't been able to check the contents of this file. Notepad++ cannot handle it, and with RDFLib the farthest I got was loading the file; I cannot find a way to edit it without parsing the entire thing. I also tried the RDF package (suggested in "How to parse big datasets using RDFLib?"), but I cannot find a way to install it in Python 3.

Question: What is the best option for this kind of operation? Is there a command in RDFLib that allows this kind of editing?

DGMS89

1 Answer


If it's N-Triples, then the file is basically line-by-line triples: each line holds exactly one triple. Therefore, you can read the file in small chunks (N lines at a time), parse each chunk with rdflib, and then apply whatever cleaning operations you need to the resulting graph.

Peb