I am trying to convert my text documents into an XML structure in which each sentence gets an id. The documents contain unstructured sentences, and I would like to split them using '.' as the delimiter and write them to XML. Here is my code:
import re
# Read the file
with open('C:\\Users\\ngwak\\Documents\\test.txt') as f:
    content = [f]
    split_content = []
    for element in content:
        split_content += re.split(r"(.)\s+", element)
    print(split_content, sep='\n\n')
But I already get this error, and I can't interpret it:
TypeError: expected string or buffer
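A likely cause of this error is `content = [f]`: it builds a list whose only element is the file object itself, so the loop passes a file to `re.split()`, which only accepts a string (or bytes). Reading the file into a string with `f.read()` avoids this. The minimal sketch below reproduces both cases, using `io.StringIO` as an assumed stand-in for the real test.txt so it runs anywhere. Note also that `print(split_content, sep='\n\n')` prints the whole list on one line, because `sep` only separates multiple arguments; unpacking with `print(*split_content, sep='\n\n')` puts each piece on its own line.

```python
import io
import re

# io.StringIO stands in for the opened test.txt (an assumption for illustration).
f = io.StringIO("First sentence. Second sentence.")

# Wrapping the file object in a list means re.split() receives the file itself.
content = [f]
try:
    re.split(r"\s+", content[0])
except TypeError as exc:
    print("TypeError:", exc)

# Reading the file yields a string, which re.split() accepts.
text = f.read()
# Split on whitespace that directly follows a period.
pieces = re.split(r"(?<=\.)\s+", text)
print(*pieces, sep="\n\n")
```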
How can I split my sentences and write them to XML? Thanks a lot. This is what my txt file looks like:
In a formal sense, the germ of national consciousness can be traced back to the Peace Treaty of Hoachanas signed in 13–June-1858 between soldiers, all the chiefs except those of the Bondelswarts (who had not been involved in the previous fighting), as well as by Muewuta, two sons of amuaha, formerly a Commandant of Chief Onag of the Triku people. There is ample epistolary as well as oral evidence for this view. The most poignant statement is to be found in the now famous and oft-quoted letter of Onag to Bonagha written on May 13, 1890 in which, amongst other things, he says that on June 13 there are people coming. Again on the 01.02.2015 till the 01.05 there are some coming.
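Splitting on every '.' would cut this sample in the wrong places, because dates such as 01.02.2015 and 01.05 contain periods too. One possible heuristic, assumed here rather than taken from the question, is to split only where a period is followed by whitespace and then an uppercase letter. The sketch below applies it to the last three sentences of the sample; the names `SENTENCE_BOUNDARY` and `split_sentences` are hypothetical. It would still mis-split abbreviations such as "Dr. Smith", so a dedicated sentence tokenizer (for example NLTK's `sent_tokenize`) may be more robust on real documents.

```python
import re

# Last three sentences of the sample text, copied verbatim.
text = (
    "There is ample epistolary as well as oral evidence for this view. "
    "The most poignant statement is to be found in the now famous and oft-quoted "
    "letter of Onag to Bonagha written on May 13, 1890 in which, amongst other "
    "things, he says that on June 13 there are people coming. "
    "Again on the 01.02.2015 till the 01.05 there are some coming."
)

# Split on whitespace that follows a '.' and precedes an uppercase letter,
# so periods inside dates like 01.02.2015 are left alone.
SENTENCE_BOUNDARY = re.compile(r"(?<=\.)\s+(?=[A-Z])")

def split_sentences(text):
    """Return the sentences in text, dropping empty pieces."""
    return [s.strip() for s in SENTENCE_BOUNDARY.split(text) if s.strip()]

sentences = split_sentences(text)
for sentence in sentences:
    print(sentence)
```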
And I would like the sentences to look like this in XML:
<sentence id="01">In a formal sense, the germ of national consciousness can be traced back to the Peace Treaty of Hoachanas signed in 13–June-1858 between soldiers, all the chiefs except those of the Bondelswarts (who had not been involved in the previous fighting), as well as by Muewuta, two sons of amuaha, formerly a Commandant of Chief Onag of the Triku people.</sentence>
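Once the sentences are in a list, the standard library's `xml.etree.ElementTree` module can produce `<sentence>` elements like the one above. The sketch below assumes a root element named `<document>` and two-digit, zero-padded ids (01, 02 and so on) to mirror the example; the root name and the helper `sentences_to_xml` are hypothetical choices for illustration. ElementTree quotes attribute values and escapes characters such as `&` and `<`, so the output stays well-formed.

```python
import xml.etree.ElementTree as ET

def sentences_to_xml(sentences):
    """Build a <document> element with one numbered <sentence> child per sentence."""
    root = ET.Element("document")  # hypothetical root element name
    for i, sentence in enumerate(sentences, start=1):
        # Zero-pad ids to two digits to mirror id="01" in the example.
        elem = ET.SubElement(root, "sentence", id=f"{i:02d}")
        elem.text = sentence
    return root

sentences = [
    "There is ample epistolary as well as oral evidence for this view.",
    "Again on the 01.02.2015 till the 01.05 there are some coming.",
]
root = sentences_to_xml(sentences)
xml_string = ET.tostring(root, encoding="unicode")
print(xml_string)

# To save to disk instead (the output path is hypothetical):
# ET.ElementTree(root).write("sentences.xml", encoding="utf-8", xml_declaration=True)
```

Combined with a splitting step, the whole pipeline would read the file with `f.read()`, split it into a list of sentences, and pass that list to `sentences_to_xml`.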