I want to separate all the words from the non-word characters in Greek and Hebrew text.
I'm using this code:
import re
words = re.findall(r'\w+|\S+', text)
The result is not what I want. For example:
It splits ⸂ἡμῶν καὶ κυρίου⸃ into (⸂ἡμῶν) (καὶ) (κυρίου) (⸃), but I want the leading bracket split off too: (⸂) (ἡμῶν) (καὶ) (κυρίου) (⸃).
It doesn't split ⸂ὑπὲρ⸃ into (⸂) (ὑπὲρ) (⸃).
It also doesn't split [ὑμῖν] into ([) (ὑμῖν) (]).
For Hebrew, it splits things that are not supposed to be split.
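
To make the desired behaviour concrete, here is a minimal sketch of what I am after, not a definitive solution. My guess is that the `\S+` alternative swallows a bracket together with the letters that follow it, and that `\w` in `re` does not cover combining marks (such as Hebrew vowel points), which would explain the unwanted splits. The sketch assumes that a "word" is any run of characters whose Unicode general category is a letter, mark, or number, and that every other non-space character is its own token. The names `is_word_char` and `tokenize` are hypothetical, made up for this illustration.

```python
import unicodedata


def is_word_char(ch):
    # Assumption: letters (L*), combining marks (M*) and numbers (N*)
    # all belong to a word; everything else does not.
    return unicodedata.category(ch)[0] in "LMN"


def tokenize(text):
    tokens = []
    word = []  # characters of the word currently being collected
    for ch in text:
        if is_word_char(ch):
            word.append(ch)
            continue
        # A non-word character ends the current word, if there is one.
        if word:
            tokens.append("".join(word))
            word = []
        # Whitespace is dropped; any other character is its own token.
        if not ch.isspace():
            tokens.append(ch)
    if word:
        tokens.append("".join(word))
    return tokens


print(tokenize("[ὑμῖν]"))  # ['[', 'ὑμῖν', ']']
```

Because marks are counted as word characters, a pointed Hebrew word should stay in one piece. Punctuation such as the maqaf would come out as a separate token under this assumption, which may or may not be what is wanted.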