-1

I have a text file and I would like to get the string between two markers.

*rdfs:label "Henry Dunant"@de , "Henry Dunant"@en , "Henri Dunant"@fr ;*

This is the piece of text. I would like to get only the string "Henry Dunant"@en, so that the result is Henry Dunant.

In other words, everything between " and "@en.

iknow
pcrace97
  • Try this one: "([\w\s]+)"@en – marianc Aug 01 '20 at 12:27
  • You're probably looking for Python's re module [here](https://docs.python.org/3/library/re.html#re.findall) – ChewySalmon Aug 01 '20 at 12:28
  • The string being parsed here is RDF in the Turtle format, so you should use a Turtle parser. Python's RDFLib (https://pypi.org/project/rdflib/) will parse this string, assuming it's part of a larger, valid Turtle string, into an in-memory graph which can then be queried via its API: g.parse(data=turtle_string, format="turtle") – Nicholas Car Jul 04 '22 at 08:24

4 Answers

1

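You can extract the required text with a regular expression, as below. The capture group keeps only the text between the quotes, without the quotes or the @en tag.

import re

source = '*rdfs:label         "Henry Dunant"@de , "Henry Dunant"@en , "Henri Dunant"@fr ;*'
# Capture only the text between the opening quote and the closing "@en
match = re.search(r'"([\w ]+)"@en', source)
if match:
    print(match.group(1))  # Henry Dunant

For more information on regular expressions in Python, refer to the re documentation.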
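
If a label may contain characters outside `[\w ]` (hyphens or apostrophes, for example), or if a different language tag is needed, the pattern can be loosened. The following is a minimal sketch rather than a real Turtle parser: `label_for_lang` is a hypothetical helper name, and it assumes that labels never contain escaped double quotes.

```python
import re


def label_for_lang(text, lang):
    """Return the first quoted label tagged with @lang, or None if absent."""
    # [^"]* matches any run of characters that are not a double quote;
    # (?![\w-]) stops "@en" from also matching a longer tag such as "@en-GB".
    pattern = r'"([^"]*)"@' + re.escape(lang) + r'(?![\w-])'
    match = re.search(pattern, text)
    return match.group(1) if match else None


source = 'rdfs:label "Henry Dunant"@de , "Henry Dunant"@en , "Henri Dunant"@fr ;'
print(label_for_lang(source, "en"))  # Henry Dunant
print(label_for_lang(source, "fr"))  # Henri Dunant
```

Returning None instead of raising keeps the caller in control when a language is missing from the line.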

karsas
0

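If you only want a single value, you can try the following code:

str_text = "rdfs:label         \"Henry Dunant\"@de , \"Henry Dunant\"@en , \"Henri Dunant\"@fr ;"
# Splitting on the double quote alternates between quoted values and the text that follows them
fragments = str_text.split("\"")
word = ""
for ind, fragment in enumerate(fragments):
    # The fragment right after a quoted value starts with its language tag
    if fragment[:3] == "@en":
        word = fragments[ind - 1]
print(word)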

Result:

Henry Dunant

David Duran
0

I would suggest reading the text file and splitting its contents on ',' into a list. You can then loop over the elements and collect the extracted strings in a second list.

first_marker = '"'
second_marker = '"@en'
# In practice, read this text from your file
text = 'rdfs:label "Henry Dunant"@de , "Henry Dunant"@en , "Henri Dunant"@fr ;'
raw_strings = text.split(',')

extracted = []
for raw_string in raw_strings:
    # Locate the closing marker first, then the opening marker before it
    index2 = raw_string.find(second_marker)
    if index2 == -1:
        continue
    index1 = raw_string.rfind(first_marker, 0, index2)
    if index1 != -1:
        extracted.append(raw_string[index1 + len(first_marker):index2])

The extracted list now holds every match. Set first_marker and second_marker to your own two markers.
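
To apply the same idea directly to a file, the search can be wrapped in a function that accepts any iterable of lines, such as an open file object. This is only a sketch: `extract_between` is a hypothetical name, and the file name `labels.ttl` in the commented-out usage is an assumption.

```python
def extract_between(lines, first_marker, second_marker):
    """Collect every substring enclosed by the two markers, line by line."""
    extracted = []
    for line in lines:
        start = 0
        while True:
            # Find the next closing marker, then the nearest opening marker before it
            end = line.find(second_marker, start)
            if end == -1:
                break
            begin = line.rfind(first_marker, start, end)
            if begin != -1:
                extracted.append(line[begin + len(first_marker):end])
            start = end + len(second_marker)
    return extracted


sample = ['rdfs:label "Henry Dunant"@de , "Henry Dunant"@en , "Henri Dunant"@fr ;']
print(extract_between(sample, '"', '"@en'))  # ['Henry Dunant']

# With a real file (the file name is an assumption):
# with open("labels.ttl", encoding="utf-8") as handle:
#     print(extract_between(handle, '"', '"@en'))
```

Searching for the closing marker first avoids matching the quote of a neighbouring label when both markers begin with the same character.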

0

I implemented it in the simplest way:

string = '*rdfs:label         "Henry Dunant"@de , "Henry Dunant"@en , "Henri Dunant"@fr ;*'
res = string.split('"')
for i in range(1, len(res)):
    # A fragment that starts with @en directly follows the English label
    if res[i].startswith('@en'):
        print(res[i - 1])

mo1ein