
I am trying to split a text into separate sentences in Python and save them, splitting on punctuation marks like , . ? !. But some of the text contains decimal points, and re.split treats those as periods. How can I get around that? Any help would be appreciated!

Example text:

A 0.75-in-diameter steel tension rod is 4.8 ft long and carries a load of 13.5 kip. Find the tensile stress, the total deformation, the unit strains, and the change in the rod diameter.

Abhi_J
Asit Singh
  • Check out [How can I split a text into sentences?](https://stackoverflow.com/questions/4576077/how-can-i-split-a-text-into-sentences) and [using regular expression as a tokenizer](https://stackoverflow.com/questions/63870746/using-regular-expression-as-a-tokenizer/63871635#63871635) – DarrylG Jun 13 '21 at 19:32
  • @DarrylG it doesn't split on `,` though. – Abhi_J Jun 13 '21 at 19:37
  • @Abhi_J--correct--but confused why the poster includes commas when trying to split text into sentences. – DarrylG Jun 13 '21 at 19:54

1 Answer


This will depend on your input, but if you can assume that every period you want to split at is followed by a space, then you can simply do:

>>> s = 'A 0.75-in-diameter steel tension rod is 4.8 ft long and carries a load of 13.5 kip. Find the tensile stress, the total deformation, the unit strains, and the change in the rod diameter.'
>>> s.split('. ')
['A 0.75-in-diameter steel tension rod is 4.8 ft long and carries a load of 13.5 kip', 'Find the tensile stress, the total deformation, the unit strains, and the change in the rod diameter.']

For anything more complicated than that, you'll probably want to use a regex like so:

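import re
# Split on ., ! or ? only when followed by whitespace, so decimals like 0.75 stay intact.
# Note that the matched punctuation and whitespace are dropped from the results.
re.split(r'[.!?]\s', s)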
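
If you also want each sentence to keep its closing punctuation, one possible variation is to split on the whitespace itself and use a lookbehind to require a sentence-ending character just before it. This is only a sketch: the helper name `split_sentences` and its `enders` parameter are made up for illustration, and it still assumes that every sentence end is followed by whitespace.

```python
import re

def split_sentences(text, enders=".!?"):
    """Split text at whitespace that follows any character in `enders`.

    Hypothetical helper for illustration; `enders` is an assumed parameter.
    """
    # The lookbehind keeps the punctuation attached to its sentence.
    # A decimal point (as in 0.75) is followed by a digit, not whitespace,
    # so it is never treated as a split point.
    pattern = r"(?<=[" + re.escape(enders) + r"])\s+"
    return [part for part in re.split(pattern, text.strip()) if part]

s = ('A 0.75-in-diameter steel tension rod is 4.8 ft long and carries '
     'a load of 13.5 kip. Find the tensile stress! Is it safe? Yes.')
for sentence in split_sentences(s):
    print(sentence)
```

Passing `enders=",.!?"` would also split after commas, which may or may not be what you want for sentence splitting. Abbreviations such as "e.g. " will still be split, so a dedicated sentence tokenizer is probably a better fit for messy text.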
Will Da Silva
  • Oh cool! Thanks. Also so what if I wanna do that same thing for other end of sentences, like ? and !. I was wondering if there is a way to do so in one part or would I have to do the s.split for each one of them separately? – Asit Singh Jun 13 '21 at 19:39
  • You'll probably want to use a regex for anything more complicated than what I posted in my answer. I'll edit my answer to include the use of a regex for this. – Will Da Silva Jun 13 '21 at 19:41