
I want to annotate a couple of XML files with the German STW Thesaurus for Economics. You can get the thesaurus here as ZIP archives in RDF/XML, N3 and Turtle (~14 MB each).

So I wrote a Python script that removes stopwords, lemmatizes, and does part-of-speech tagging. Now I want to check whether a noun in one of the XML files is in the STW ontology. If it is, I'd like to try several options to prepare for an automated classification step later on:

  • If it is a skos:altLabel word, replace it with the corresponding skos:prefLabel word.
  • Leave the text unchanged, but append the skos:prefLabels to the end of the file, together with a count of how often each skos:prefLabel and its associated skos:altLabels appear.
  • Use e.g. skos:broader to find related concepts, such as the economic sectors or commodities associated with the skos:prefLabel.

I know GATE and Apolda, which can do this, but they're Java-based, and in the end I'd like to do everything from a single Python script.

Are there any suggestions?

Niklas

1 Answer


I don't know if it's exactly what you're looking for, but for working with RDF in Python there is RDFLib.

You can find more guidance in the tools and libraries pointed to in this answer or here.

Hope this can help! :)

Community
jlnabais