I have a text document that has to be split whenever \n or . appears. split() handles a single delimiter, but how can I split on both \n and . at the same time?

Code

text = 'Christmas Perot 2021 TSO\nSkip to Main Content HOME CONCERTS EVENTS ABOUT STAFF EDUCATION SUPPORT US More Use tab to navigate through the menu items. BUY TICKETS SUNDAY, DECEMBER 12, 2021 I PEROT THEATRE I 4:00 PM\nPOPS I Christmas at The Perot\nCLICK HERE to purchase tickets, or contact the Texarkana Symphony Orchestra at 870.773.3401\nA Texarkana Tradition Join the TSO, the Texarkana Jazz Orchestra, and the TSO Chamber Singers, for this holiday concert for the whole family.\nDon’t miss seeing the winner of TSO’s 11th Annual Celebrity Conductor Competition\nBack to Events 2019 Texarkana Symphony Orchestra'
sentences = text.split('\n')
print(sentences)

Output

['Christmas Perot 2021 TSO',
 'Skip to Main Content HOME CONCERTS EVENTS ABOUT STAFF EDUCATION SUPPORT US More Use tab to navigate through the menu items. BUY TICKETS SUNDAY, DECEMBER 12, 2021 I PEROT THEATRE I 4:00 PM',
 'POPS I Christmas at The Perot',
 'CLICK HERE to purchase tickets, or contact the Texarkana Symphony Orchestra at 870.773.3401',
 'A Texarkana Tradition Join the TSO, the Texarkana Jazz Orchestra, and the TSO Chamber Singers, for this holiday concert for the whole family.',
 'Don’t miss seeing the winner of TSO’s 11th Annual Celebrity Conductor Competition',
 'Back to Events 2019 Texarkana Symphony Orchestra']

Desired Output

['Christmas Perot 2021 TSO',
 'Skip to Main Content HOME CONCERTS EVENTS ABOUT STAFF EDUCATION SUPPORT US More Use tab to navigate through the menu items.',
 'BUY TICKETS SUNDAY, DECEMBER 12, 2021 I PEROT THEATRE I 4:00 PM',
 'POPS I Christmas at The Perot',
 'CLICK HERE to purchase tickets, or contact the Texarkana Symphony Orchestra at 870.773.3401',
 'A Texarkana Tradition Join the TSO, the Texarkana Jazz Orchestra, and the TSO Chamber Singers, for this holiday concert for the whole family.',
 'Don’t miss seeing the winner of TSO’s 11th Annual Celebrity Conductor Competition',
 'Back to Events 2019 Texarkana Symphony Orchestra']
  • You can replace all dots with newlines and then split; alternatively, split by newlines, then split each element by dot and collect the results in a new list – sas1138 Oct 19 '21 at 10:26
  • 870.773.3401 also contains "." – Dani Mesejo Oct 19 '21 at 10:27
  • Does this answer your question? [Split Strings into words with multiple word boundary delimiters](https://stackoverflow.com/questions/1059559/split-strings-into-words-with-multiple-word-boundary-delimiters) – Phydeaux Oct 19 '21 at 10:29
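
For illustration, the second suggestion in the first comment (split by newlines, then split each element by dot) might look roughly like the sketch below. The helper name split_naive and the short sample string are made up for this example. As the second comment points out, this naive version also breaks up the phone number.

```python
def split_naive(text):
    """Split on newlines, then split each line on '.' and collect the pieces."""
    pieces = []
    for line in text.split('\n'):
        for part in line.split('.'):
            part = part.strip()
            if part:  # skip empty fragments left by a trailing '.'
                pieces.append(part)
    return pieces


# Hypothetical shortened sample, not the full text from the question.
print(split_naive('menu items. BUY TICKETS\ncontact 870.773.3401'))
# ['menu items', 'BUY TICKETS', 'contact 870', '773', '3401']
```

The periods themselves are discarded, and '870.773.3401' ends up in three pieces, so this only works when dots never occur inside a token.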

1 Answer

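One way is to replace every '.' with a newline, split on newlines, and then glue any purely alphanumeric fragment (such as the pieces of the phone number) back onto the previous element: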

text = 'Christmas Perot 2021 TSO\nSkip to Main Content HOME CONCERTS EVENTS ABOUT STAFF EDUCATION SUPPORT US More Use tab to navigate through the menu items. BUY TICKETS SUNDAY, DECEMBER 12, 2021 I PEROT THEATRE I 4:00 PM\nPOPS I Christmas at The Perot\nCLICK HERE to purchase tickets, or contact the Texarkana Symphony Orchestra at 870.773.3401\nA Texarkana Tradition Join the TSO, the Texarkana Jazz Orchestra, and the TSO Chamber Singers, for this holiday concert for the whole family.\nDon’t miss seeing the winner of TSO’s 11th Annual Celebrity Conductor Competition\nBack to Events 2019 Texarkana Symphony Orchestra'
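# Turn every '.' into a newline so a single split covers both delimiters.
semiSentences = text.replace('.', '\n').split('\n')
sentences = []
for s in semiSentences:
    # A purely alphanumeric fragment (e.g. '773' from '870.773.3401') came
    # from a '.' inside a token, so re-attach it to the previous piece.
    # The check on `sentences` avoids an IndexError on the first fragment.
    if s.isalnum() and sentences:
        sentences[-1] = sentences[-1] + '.' + s
    else:
        sentences.append(s)

print(sentences)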

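That outputs the list below. Note that the sentence-ending periods are dropped, ' BUY TICKETS ...' keeps its leading space, and an empty string appears where a '.' was directly followed by a newline: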

['Christmas Perot 2021 TSO', 'Skip to Main Content HOME CONCERTS EVENTS ABOUT STAFF EDUCATION SUPPORT US More Use tab to navigate through the menu items', ' BUY TICKETS SUNDAY, DECEMBER 12, 2021 I PEROT THEATRE I 4:00 PM', 'POPS I Christmas at The Perot', 'CLICK HERE to purchase tickets, or contact the Texarkana Symphony Orchestra at 870.773.3401', 'A Texarkana Tradition Join the TSO, the Texarkana Jazz Orchestra, and the TSO Chamber Singers, for this holiday concert for the whole family', '', 'Don’t miss seeing the winner of TSO’s 11th Annual Celebrity Conductor Competition', 'Back to Events 2019 Texarkana Symphony Orchestra']
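
A regular expression is another option that may get closer to the desired output. The sketch below is not part of the answer above; it assumes that a sentence-ending period is always followed by whitespace, while a period inside a token (like the phone number) is not. The helper name split_sentences and the shortened sample are made up for illustration.

```python
import re


def split_sentences(text):
    """Split on newlines, or on whitespace that directly follows a period."""
    # (?<=\.)\s+ consumes only the whitespace, so the period stays attached
    # to its sentence. '870.773.3401' has no whitespace after its dots and
    # is therefore left in one piece.
    parts = re.split(r'\n+|(?<=\.)\s+', text)
    # Drop empty pieces and any stray surrounding whitespace.
    return [p.strip() for p in parts if p.strip()]


# Hypothetical shortened sample, not the full text from the question.
sample = ('Use tab to navigate. BUY TICKETS\n'
          'Contact us at 870.773.3401\n'
          'For the whole family.\n'
          'Back to Events')
print(split_sentences(sample))
# ['Use tab to navigate.', 'BUY TICKETS', 'Contact us at 870.773.3401',
#  'For the whole family.', 'Back to Events']
```

The trade-off is that any period followed by a space counts as a boundary, so abbreviations such as 'Dr. Smith' would also be split.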