I am currently trying to split a string containing an entire text document by sentences so that I can convert it to a csv. Naturally, I would use periods as the delimiter and perform str.split('.')
, however, the document contains abbreviations 'i.e.' and 'e.g.' which I would want to ignore the periods in this case.
For example,
Original Sentence: During this time, it became apparent that vanilla shortest-path routing would be insufficient to handle the myriad operational, economic, and political factors involved in routing. ISPs began to modify routing configurations to support routing policies, i.e. goals held by the router’s owner that controlled which routes were chosen and which routes were propagated to neighbors.
Resulting List: ["During this time, it became apparent that vanilla shortest-path routing would be insufficient to handle the myriad operational, economic, and political factors involved in routing", "ISPs began to modify routing configurations to support routing policies, i.e. goals held by the router’s owner that controlled which routes were chosen and which routes were propagated to neighbors."]
My only workaround so far is replacing all 'i.e' and 'e.g.' with 'ie' and 'eg' which is both inefficient and grammatically undesirable. I am fiddling with Python's regex library which I suspect holds the answer I desire but my knowledge with it is novice at best.
It is my first time posting a question on here so I apologize if I am using incorrect format or wording.