prevent word split based on - in sentences

Question

This:

import re

title = 'Decreased glucose-6-phosphate dehydrogenase activity along with oxidative stress affects visual contrast sensitivity in alcoholics.'

words = list(filter(None, re.split(r'\W+', title)))
for word in words:
    print(word)

results in:

Decreased
glucose
6
phosphate
dehydrogenase
activity
along
with
oxidative
stress
affects
visual
contrast
sensitivity
in
alcoholics

Ideally, I would like to prevent the splitting of words like:

glucose-6-phosphate

Is there a better way to split a sentence like this into separate words in Python? Should I adapt the regular expression, or is there something that works out of the box? Thanks.

Possible duplicate of [Split string on whitespace in Python](https://stackoverflow.com/questions/8113782/split-string-on-whitespace-in-python) — Pedro Lobito, Sep 04 '18 at 13:44
I do not think that this is answered in the link provided as str.split() does not work see comment to answer below! — cs0815, Sep 04 '18 at 13:55

score 1 · Answer 1 · answered Sep 04 '18 at 13:38

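The \W+ matches a run of one or more non-word characters, i.e. anything other than letters, digits, and the underscore. Since - is a non-word character, the sentence is split there. If you only want to split at whitespace, you don't need a regular expression; you can just use title.split(). Note that this leaves punctuation attached to the words, so the last word comes out as alcoholics. with its trailing period.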
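
As a minimal sketch of the whitespace approach: splitting on whitespace keeps hyphenated words intact, and trimming punctuation from the ends of each token removes the trailing period. The stripping step with string.punctuation is an assumption added here for illustration, not part of the suggestion above.

```python
import string

title = ('Decreased glucose-6-phosphate dehydrogenase activity along with '
         'oxidative stress affects visual contrast sensitivity in alcoholics.')

# Split on any run of whitespace, then trim punctuation from both ends
# of each token. Punctuation inside a token is left untouched.
words = [token.strip(string.punctuation) for token in title.split()]

# Drop tokens that consisted only of punctuation.
words = [word for word in words if word]

for word in words:
    print(word)
```

The hyphens in glucose-6-phosphate survive because str.strip only removes characters from the ends of a string.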

blue_note

score 1 · Accepted Answer · answered Sep 04 '18 at 13:45

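For ASCII text, the pattern \W is equivalent to the character class [^a-zA-Z0-9_] (in Python 3, \W is Unicode-aware for str patterns unless re.ASCII is set). To stop it from splitting on hyphens, simply add - to that class and use it in your regex: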

words = list(filter(None, re.split('[^a-zA-Z0-9_-]+', title)))
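
An alternative sketch, assuming you would rather match the words than split on the separators: re.findall with a pattern that allows a hyphen only between word characters. The pattern and the name WORD are illustrative and not taken from the answer above. Unlike the character-class split, this does not keep a dangling hyphen attached to a token.

```python
import re

title = ('Decreased glucose-6-phosphate dehydrogenase activity along with '
         'oxidative stress affects visual contrast sensitivity in alcoholics.')

# One or more word characters, optionally followed by further
# hyphen-joined groups of word characters.
WORD = re.compile(r'\w+(?:-\w+)*')

words = WORD.findall(title)
for word in words:
    print(word)

# A hyphen that is not between word characters is not part of a match.
dangling = WORD.findall('pre- and post-test')
```

With the split-based pattern, the same input 'pre- and post-test' would yield 'pre-' as its first token, so which version fits depends on how you want stray hyphens handled.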

T Burgis
