Getting a string between two markers in Python

Question

I have a text file and I would like to get the string between two markers.

*rdfs:label "Henry Dunant"@de , "Henry Dunant"@en , "Henri Dunant"@fr ;*

This is the piece of text. I would like to get only the string from "Henry Dunant"@en, so that the result is Henry Dunant.

In other words, everything between " and "@en.

You're probably looking for python's re module [here](https://docs.python.org/3/library/re.html#re.findall) — ChewySalmon, Aug 01 '20 at 12:28
The string being parsed here is RDF in the Turtle format, so you should use a Turtle parser. Python's RDFLib (https://pypi.org/project/rdflib/) will parse this string, assuming it's part of a larger, valid Turtle string, into an in-memory graph which can then be queried via its API. g.parse(data=turtle_string, format="turtle") — Nicholas Car, Jul 04 '22 at 08:24

score 1 · Answer 1 · answered Aug 01 '20 at 13:05

You can get the required data with a regular expression, as below:

import re

source = '*rdfs:label         "Henry Dunant"@de , "Henry Dunant"@en , "Henri Dunant"@fr ;*'
# The parentheses capture only the text between the opening quote and "@en
match = re.search(r'"([\w ]+)"@en', source)
print(match.group(1))

For more information on regular expressions in Python, refer to the re documentation.
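If the labels for the other languages are needed as well, the same idea can be extended with re.findall. This is only a sketch: the helper name labels_by_language is made up for illustration, and it assumes that labels never contain escaped quotes and that language tags look like en or pt-BR.

```python
import re

# Assumption: labels contain no escaped quotes; tags look like "en" or "pt-BR".
LABEL_PATTERN = re.compile(r'"([^"]*)"@([A-Za-z]+(?:-[A-Za-z0-9]+)*)')

def labels_by_language(text):
    """Return a dict mapping each language tag to the label quoted before it."""
    # findall yields one (label, lang) tuple per match because of the two groups
    return {lang: label for label, lang in LABEL_PATTERN.findall(text)}

source = '*rdfs:label         "Henry Dunant"@de , "Henry Dunant"@en , "Henri Dunant"@fr ;*'
labels = labels_by_language(source)
print(labels["en"])
```

Looking up labels["en"] then gives the English label, and a missing language raises a KeyError instead of silently returning the wrong value.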

score 0 · Accepted Answer · answered Aug 01 '20 at 12:29

If you only want to get one word, you can try the following code:

str_text = "rdfs:label         \"Henry Dunant\"@de , \"Henry Dunant\"@en , \"Henri Dunant\"@fr ;"
splitted_text = str_text.split("\"")
word = ""
for ind, fragment in enumerate(splitted_text):
    if fragment[:3]=="@en":
        word=splitted_text[ind-1]
print(word)

Result:

Henry Dunant

score 0 · Answer 3 · answered Aug 01 '20 at 12:43

I would suggest reading the text file and then splitting it on ',' into a list. You can then loop over the elements.

Make another list to hold the extracted elements.

first, second = 'firstmarker', 'secondmarker'
extracted = []
for rawstring in parts:  # parts is the list produced by split(',')
    start = rawstring.find(first)
    if start == -1:
        continue  # first marker not present in this element
    start += len(first)  # skip past the marker itself
    end = rawstring.find(second, start)
    if end != -1:
        extracted.append(rawstring[start:end])

You will now have everything in the extracted list. Put your two markers in place of 'firstmarker' and 'secondmarker'.
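For the line in the question, the opening marker " occurs many times, so searching forward from the first " would pick up too much. One possible variation, sketched here with a hypothetical helper named between, is to locate the end marker first and then search backwards for the nearest start marker:

```python
def between(text, start_marker, end_marker):
    """Return the text between end_marker and the nearest start_marker before it.

    Returns None when either marker is missing.
    """
    end = text.find(end_marker)
    if end == -1:
        return None
    # rfind only looks at text[0:end], so it finds the closest start marker
    start = text.rfind(start_marker, 0, end)
    if start == -1:
        return None
    return text[start + len(start_marker):end]

line = 'rdfs:label "Henry Dunant"@de , "Henry Dunant"@en , "Henri Dunant"@fr ;'
print(between(line, '"', '"@en'))
```

This only returns the first occurrence of the end marker; a line with several "@en labels would need a loop or a regular expression.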

score 0 · Answer 4 · answered Aug 01 '20 at 13:19

I implemented it in the simplest way:

string = '*rdfs:label         "Henry Dunant"@de , "Henry Dunant"@en , "Henri Dunant"@fr ;*'
res = string.split('"')
for i in range(1, len(res)):
    # A fragment that starts with @en directly follows the label it tags
    if res[i].startswith('@en'):
        print(res[i-1])

Answered by mo1ein
