Split string to desired form using Python

Question

I have data in the following form:

<a> <b> _:h1 <c>.
_:h1 <e> "200"^^<http://www.w3.org/2001/XMLSchema#integer> <f> .
_:h1 <date> "Mon, 30 Apr 2012 07:01:51 GMT" <p> .
_:h1 <server> "Apache/2" <df> .
_:h1 <last-modified> "Sun, 25 Mar 2012 14:15:37 GMT" <hf> .

I need to convert it into the following form using Python:

<a> <b> _:h1.
<1> <c>.
_:h1 <e> "200"^^<http://www.w3.org/2001/XMLSchema#integer> .
<1> <f>.
_:h1 <date> "Mon, 30 Apr 2012 07:01:51 GMT".
<1> <p>.
_:h1 <server> "Apache/2" .
<1> <df>.
_:h1 <last-modified> "Sun, 25 Mar 2012 14:15:37 GMT" .
<1> <hf>.

I wrote Python code that uses the str.split() method, which splits the string on spaces. However, this does not work for me, because it also splits quoted strings such as "Sun, 25 Mar 2012 14:15:37 GMT". Is there another way to achieve this in Python?

score 2 · Accepted Answer · answered Jul 22 '13 at 12:45

You can use the rfind or rindex methods to find the last occurrence of < in your lines.

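data = """[your data]"""
data_new = ""
for line in data.splitlines():
    # Index of the last "<", i.e. the start of the final <...> term (-1 if absent)
    i = line.rfind("<")
    # The parentheses make sure every line, split or not, ends with a newline
    data_new += (line if i == -1 else line[:i] + ".\n<1> " + line[i:]) + "\n"
data_new = data_new.strip()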
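
As a rough sketch of the same rfind idea, the splitting step can be wrapped in a function that also normalizes the trailing dot, since the input sometimes has a space before it and sometimes does not. The name move_last_term and the label parameter are made up for illustration, and the sketch assumes the last "<" on a line always starts the term to move. A line that ends in a typed literal with no extra term after it would be split at the datatype IRI instead.

```python
def move_last_term(line, label="<1>"):
    """Split one statement into two lines at its last '<...>' term."""
    i = line.rfind("<")
    if i == -1:
        # No '<' at all: leave the line untouched
        return [line]
    head = line[:i].rstrip()
    # Drop the trailing dot, whether or not a space precedes it
    tail = line[i:].rstrip().rstrip(".").rstrip()
    return [head + " .", label + " " + tail + " ."]


sample = '''<a> <b> _:h1 <c>.
_:h1 <date> "Mon, 30 Apr 2012 07:01:51 GMT" <p> .'''

result = []
for line in sample.splitlines():
    result.extend(move_last_term(line))
print("\n".join(result))
```

For the two sample lines this yields four output lines, each ending in " ." regardless of how the dot was spaced in the input.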

score 0 · Answer 2 · edited May 23 '17 at 11:57

Is that N3/Turtle? If so, I think you want RDFlib.

Also see: Reading a Turtle/N3 RDF File with Python

answered Jul 22 '13 at 09:38

Fredrik

shodanex · Answer 3 · 2013-07-22T12:52:50.557

0

What is the problem with spaces inside strings? It seems you are only interested in the last two fields, which will be there no matter how many chunks your line is split into.

line = '_:h1 <server> "Apache/2" <df> .'  # sample line; assumes a space before the final dot
fields = line.split()
tag = fields[-2]  # the last <...> term
dot = fields[-1]  # the trailing "."
# Rebuild the line without its last two fields
l1 = " ".join(fields[:-2])
l2 = '<1> ' + tag + dot

I don't know exactly what is supposed to happen to the trailing dot, but unless you have to keep the exact same amount of whitespace inside your strings, this should be OK.
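
If the whitespace inside strings does matter, or the final dot is not always a separate field, one possible alternative is to tokenize with a regular expression instead of str.split(). The sketch below is only an illustration: the names TOKEN, tokenize and rewrite are hypothetical, and it assumes that string literals contain no escaped double quotes and that every line ends with the dot.

```python
import re

# A token is either a quoted string (plus any suffix glued to it, such as
# ^^<datatype>) or a run of non-space characters.
TOKEN = re.compile(r'"[^"]*"\S*|\S+')


def tokenize(line):
    """Split a statement on spaces while keeping quoted strings whole."""
    text = line.rstrip()
    if text.endswith("."):
        # Remove the final dot, with or without a space in front of it
        text = text[:-1]
    return TOKEN.findall(text)


def rewrite(line, label="<1>"):
    """Move the last term of a statement onto its own '<1>' line."""
    *head, tag = tokenize(line)
    return [" ".join(head) + " .", label + " " + tag + " ."]


for out in rewrite('_:h1 <last-modified> "Sun, 25 Mar 2012 14:15:37 GMT" <hf> .'):
    print(out)
```

Because the quoted string is matched as a single token, the date stays in one piece and the spaces inside it are preserved exactly.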

edited Jul 22 '13 at 12:52

answered Jul 22 '13 at 12:41

shodanex

Not sure... note that sometimes there is a space before the final dot and sometimes there isn't. Also, I guess `[len - 2]` should be `[count - 2]`? – tobias_k Jul 22 '13 at 12:46
Well, you can remove the last field if it is a single dot but your solution seems better – shodanex Jul 22 '13 at 12:56
