
This:

import re

title = 'Decreased glucose-6-phosphate dehydrogenase activity along with oxidative stress affects visual contrast sensitivity in alcoholics.'

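# Raw string avoids the invalid-escape warning for '\W'; filter(None, ...)
# drops the empty string left over after the trailing period.
words = list(filter(None, re.split(r'\W+', title)))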
for word in words:
    print(word)

results in:

Decreased
glucose
6
phosphate
dehydrogenase
activity
along
with
oxidative
stress
affects
visual
contrast
sensitivity
in
alcoholics

Ideally, I would like to avoid splitting hyphenated words such as:

glucose-6-phosphate

Is there a better way to split a sentence like this into words in Python? Should I adapt the regular expression, or is there something built in (out of the box)? Thanks.

cs0815
  • Possible duplicate of [Split string on whitespace in Python](https://stackoverflow.com/questions/8113782/split-string-on-whitespace-in-python) – Pedro Lobito Sep 04 '18 at 13:44
  • I do not think this is answered by the linked question, because str.split() does not work here; see my comment on the answer below! – cs0815 Sep 04 '18 at 13:55

2 Answers


The pattern \W+ matches a run of one or more non-word characters, i.e. anything other than letters, digits and the underscore. Since - is one of those non-word characters, the sentence is split there. Since you only seem to want to split at spaces, you don't need a regular expression; you can just use title.split().
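A quick sketch of the difference, reusing the title from the question: str.split() with no arguments splits only on runs of whitespace, so hyphenated terms survive, but punctuation such as the final period stays attached to the last word. Stripping string.punctuation from the ends of each token is one way to handle that; this extra step is an assumption on my part, not something the answer above proposes.

```python
import string

title = ('Decreased glucose-6-phosphate dehydrogenase activity along with oxidative '
         'stress affects visual contrast sensitivity in alcoholics.')

# str.split() with no arguments splits on whitespace only, so hyphens are
# kept, but punctuation stays attached to the neighbouring word.
words = title.split()
print(words[-1])    # alcoholics.

# str.strip() only removes characters from the ends of each token, so the
# internal hyphens in 'glucose-6-phosphate' are left untouched.
# Tokens made entirely of punctuation would become '' and are dropped.
cleaned = [w.strip(string.punctuation) for w in words]
cleaned = [w for w in cleaned if w]
print(cleaned[1])   # glucose-6-phosphate
print(cleaned[-1])  # alcoholics
```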

blue_note

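The pattern \W splits at the character class [^a-zA-Z0-9_] (exactly so for ASCII text; for str patterns in Python 3, \W is Unicode-aware by default). To stop it splitting on hyphens, write that negated class out explicitly, add a hyphen to it, and use it in your regex: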

words = list(filter(None, re.split('[^a-zA-Z0-9_-]+', title)))
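As an alternative sketch (a swapped-in technique, not the one in this answer), you can describe the words themselves with re.findall instead of describing the separators. The pattern below assumes a "word" is a run of word characters optionally joined by single hyphens, so stray leading or trailing hyphens are not kept.

```python
import re

title = ('Decreased glucose-6-phosphate dehydrogenase activity along with oxidative '
         'stress affects visual contrast sensitivity in alcoholics.')

# Approach from the answer: split on runs of characters that are NOT
# letters, digits, underscore or hyphen (a hyphen placed last in a
# character class is a literal '-'), then drop empty strings.
split_words = [w for w in re.split(r'[^a-zA-Z0-9_-]+', title) if w]

# Alternative: match words directly. \w+ is a run of word characters and
# (?:-\w+)* allows any number of hyphen-joined continuations.
found_words = re.findall(r'\w+(?:-\w+)*', title)

print(split_words[1])              # glucose-6-phosphate
print(found_words == split_words)  # True
```

For this ASCII title both give the same 14 words. They can differ on non-ASCII input: for 'β-oxidation', the explicit [a-zA-Z0-9_] class treats 'β' as a separator and yields '-oxidation', whereas the Unicode-aware \w keeps 'β-oxidation' whole.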
T Burgis