0

I have data in the following form:

<a> <b> _:h1 <c>.
_:h1 <e> "200"^^<http://www.w3.org/2001/XMLSchema#integer> <f> .
_:h1 <date> "Mon, 30 Apr 2012 07:01:51 GMT" <p> .
_:h1 <server> "Apache/2" <df> .
_:h1 <last-modified> "Sun, 25 Mar 2012 14:15:37 GMT" <hf> .

I need to convert it into the following form using Python:

<a> <b> _:h1.
<1> <c>.
_:h1 <e> "200"^^<http://www.w3.org/2001/XMLSchema#integer> .
<1> <f>.
_:h1 <date> "Mon, 30 Apr 2012 07:01:51 GMT".
<1> <p>.
_:h1 <server> "Apache/2" .
<1> <df>.
_:h1 <last-modified> "Sun, 25 Mar 2012 14:15:37 GMT" .
<1> <hf>.

I wrote Python code that uses the str.split() method, which splits the string on whitespace. However, it does not serve my purpose, because it also splits quoted values such as "Sun, 25 Mar 2012 14:15:37 GMT". Is there some other way to achieve this using Python?
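To illustrate the problem, here is a minimal sketch of what whitespace splitting does to one of the sample lines above (the variable names are only for illustration):

```python
line = '_:h1 <date> "Mon, 30 Apr 2012 07:01:51 GMT" <p> .'

# Splitting on whitespace breaks the quoted literal into several pieces
fields = line.split()
print(fields)
print(len(fields))
```

The date literal ends up spread over several list items instead of staying together as one field.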

dda
Jannat Arora

3 Answers

2

You can use the str.rfind or str.rindex methods to find the last occurrence of < in each line. rfind returns -1 when there is no match, while rindex raises a ValueError.

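data = """[your data]"""
data_new = ""
for line in data.splitlines():
    # Index of the last "<", i.e. the start of the trailing <tag>
    i = line.rfind("<")
    if i == -1:
        # No tag on this line: keep it unchanged
        data_new += line + "\n"
    else:
        # End the statement before the tag, then put the tag on its own line
        data_new += line[:i] + ". \n<1> " + line[i:] + "\n"
data_new = data_new.strip()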
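As a rough sketch, the same rfind idea can be wrapped in a function that also normalizes the final dot, since the sample lines end in both "<c>." and "<f> .". The name move_last_tag and the label parameter are made up for this example, and it assumes every statement line ends with a <tag> followed by a dot.

```python
def move_last_tag(line, label="<1>"):
    """Split the trailing <tag> of a line onto its own labelled line.

    `move_last_tag` and `label` are hypothetical names for this sketch.
    """
    # Drop trailing whitespace and the final dot so that both input
    # styles ("<c>." and "<f> .") are handled the same way
    body = line.rstrip().rstrip(".").rstrip()
    # Start of the last <tag>; -1 if the line contains no "<" at all
    i = body.rfind("<")
    if i == -1:
        # No tag found: return the line unchanged
        return [line]
    return [body[:i].rstrip() + " .", label + " " + body[i:] + " ."]


sample = '_:h1 <server> "Apache/2" <df> .'
for out_line in move_last_tag(sample):
    print(out_line)
```

Unlike the loop above, this writes every dot with a single space in front of it, so the output spacing is uniform rather than copied from the input.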
tobias_k
0

Is that N3/Turtle? If so, I think you want RDFLib.

Also see: Reading a Turtle/N3 RDF File with Python

Community
Fredrik
0

What is the problem with spaces inside strings? It seems you are only interested in the last two fields, which will be there however many chunks your line is split into.

# Assumes the final dot is separated from the tag by a space
fields = line.split()
count = len(fields)
tag = fields[count - 2]
dot = fields[count - 1]
# Now build your line without the last two fields
l1 = " ".join(fields[0:count - 2])
l2 = '<1> ' + tag + dot
I don't know exactly what is supposed to happen with the trailing dot, but unless you need to preserve the exact amount of whitespace inside your strings, this should be OK.
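If the whitespace inside the strings does matter, one possible variation is to split only once from the right with str.rsplit, after stripping the final dot. This is only a sketch: split_last_tag is a made-up helper name, and it assumes each line has at least two fields and ends with a <tag> and a dot.

```python
def split_last_tag(line):
    """Return (statement, tag) for a line that ends with a <tag> and a dot.

    `split_last_tag` is a hypothetical helper name for this sketch.
    """
    # Remove the final dot whether or not a space precedes it
    body = line.rstrip().rstrip(".")
    # Split once from the right, so whitespace inside quoted literals is kept
    statement, tag = body.rsplit(None, 1)
    return statement, tag


line = '_:h1 <date> "Mon, 30 Apr 2012 07:01:51 GMT" <p> .'
statement, tag = split_last_tag(line)
l1 = statement + " ."
l2 = "<1> " + tag + " ."
print(l1)
print(l2)
```

Because only the last field is split off, the rest of the line is never re-joined and its spacing stays as it was in the input.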

shodanex
  • Not sure... note that sometimes there is a space before the final dot and sometimes there isn't. Also, I guess `[len - 2]` should be `[count - 2]`? – tobias_k Jul 22 '13 at 12:46
  • Well, you can remove the last field if it is a single dot but your solution seems better – shodanex Jul 22 '13 at 12:56