import re

p1 = re.compile(r"https?:[^\s]+[a-zA-Z0-9]")
p2 = re.compile(r"([\u4E00-\u9FD5a-zA-Z0-9+#&\._%\-]+)", re.U)
I would like to consolidate these two patterns into one, so that I can use the 'split' function to split text based on the unified regular expression. How can I do that? Is there some kind of pattern union operation, such as:
p = p1 + p2
p1 is a pattern that matches a URL string, and p2 is a pattern that splits text into blocks based on certain characters. I want a new pattern that matches either p1 or p2. This is in Python.
To illustrate with an example:
text = "This is a https://www.stackoverflow.com/posts/32244/edits example."
If I just apply p2, the text will be split into:
['This', ' ', 'is', ' ', 'a', ' ','https', '://', 'www.stackoverflow.com', '/', 'posts', '/', '32244', '/', 'edits', 'example']
I don't want the URL to be split. Instead, I want to get these chunks:
['This',' ', 'is', ' ', 'a', ' ', 'https://www.stackoverflow.com/posts/32244/edits', ' ', 'example', '.']
That's why I want to add p1 as a pattern that keeps URLs intact. My description above with p = p1 + p2 may not be accurate.
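For reference, here is a minimal sketch of one way such a union could be expressed, assuming regex alternation (`|`) is an acceptable substitute for the `p1 + p2` idea. The names `URL`, `TOKEN`, `combined`, and `chunks` are hypothetical and only for illustration. The URL alternative is listed first so that it is tried before the shorter token match, and p2's inner capture group is dropped so that `split` returns only one captured item per match.

```python
import re

# Pattern sources taken from p1 and p2 above (p2 without its own capture group).
URL = r"https?:[^\s]+[a-zA-Z0-9]"
TOKEN = r"[\u4E00-\u9FD5a-zA-Z0-9+#&\._%\-]+"

# Hypothetical union: a single capture group with the URL alternative first.
combined = re.compile("(" + URL + "|" + TOKEN + ")")

text = "This is a https://www.stackoverflow.com/posts/32244/edits example."

# split() keeps the captured matches; drop the empty strings it emits at the edges.
chunks = [c for c in combined.split(text) if c]
print(chunks)
```

With this sketch the URL stays in one piece. Note that the final period remains attached to 'example', because '.' is part of the character class in p2; separating it would need a change to that class.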