I searched Stack Overflow quite a bit for an answer and nothing stood out. Even after reading the link provided it is still not obvious, but I understand it now. Perhaps keeping this post will help future readers who think the way I do.
I have reduced my Python 3.7 vs 2.7 issue down to a very simple code snippet:
import re
myStr = "Mary  had a little lamb.\n"
reg_exp = re.compile('[ \\n\\r]*')
reg_exp.split(myStr)
['', 'M', 'a', 'r', 'y', '', 'h', 'a', 'd', '', 'a', '', 'l', 'i', 't', 't', 'l', 'e', '', 'l', 'a', 'm', 'b', '.', '', '']
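That is the Python 3.7 output: the string is split between every character. In Python 2.7 the same code gives me whole-word tokens. I would like to change the compile line so that it keeps the greedy * but no longer splits on individual characters.
If I leave out the greedy *, I get extra empty strings instead: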
reg_exp = re.compile('[ \\n\\r]')
reg_exp.split(myStr)
['Mary', '', 'had', 'a', 'little', 'lamb.', '']
I would like to have my cake and eat it too! This is what I want:
['Mary', 'had', 'a', 'little', 'lamb.']
I've tried all sorts of things, such as various flags, but I'm missing something very basic. Can you help? Thanks!
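For future readers, here is a minimal sketch of one way to get the list above. It is one possible approach, not necessarily the one from the linked answer. Since Python 3.7, re.split also splits on empty matches, and a pattern ending in * can match the empty string between any two characters. Switching to + requires at least one separator per match. The input below uses two spaces between the first two words, which is an assumption made so that repeated separators are exercised. The trailing newline still produces one empty string, so the sketch filters those out.

```python
import re

# Assumed input: two spaces after "Mary" to exercise repeated separators.
myStr = "Mary  had a little lamb.\n"

# '+' needs at least one separator character, so the pattern can no longer
# match the empty string between letters (which is what '*' allows).
reg_exp = re.compile(r'[ \n\r]+')

# A separator at the very start or end still yields an empty string,
# so drop empty tokens after splitting.
tokens = [tok for tok in reg_exp.split(myStr) if tok]
print(tokens)
```

If splitting on any whitespace is acceptable (not only space, \n and \r), calling myStr.split() with no arguments gives the same list for this input without a regular expression.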