I have an input string, parts of which do not consist of actual words (for example, mathematical formulas such as x^2 = y_2 + 4). I would like to split the string into runs of English words and runs of everything else.
For example, if my string is:
"Taking the derivative of: f(x) = \int_{0}^{1} z^3, we can see that we always get x^2 = y_2 + 4 which is the same as taking the double integral of g(x)"
then I would like it split into a list like:
["Taking the derivative of: ", "f(x) = \int_{0}^{1} z^3, ", "we can see that we always get ", "x^2 = y_2 + 4 ", "which is the same as taking the double integral of ", "g(x)"]
How can I accomplish this? I don't think a regex will work, or at least I'm not aware of any regex technique that finds the longest substrings of English words (including punctuation such as commas, periods, and semicolons).
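A minimal sketch of one possible approach, assuming a token counts as an English word when it is purely alphabetic once trailing punctuation is removed. This is a rough heuristic, not a dictionary check. The function names `looks_like_word` and `split_words_and_math` and the punctuation set are illustrative assumptions, and single letters other than "a" and "I" are treated as math symbols.

```python
import re
from itertools import groupby


def looks_like_word(token):
    """Heuristic (an assumption): alphabetic tokens are English words."""
    # Drop surrounding whitespace, then trailing sentence punctuation.
    core = token.strip().rstrip(",.;:!?")
    if len(core) == 1:
        # Lone letters are usually variables; keep only "a" and "I" as words.
        return core in {"a", "A", "I"}
    # Empty strings and anything with digits or symbols are not words.
    return core.isalpha()


def split_words_and_math(text):
    """Group consecutive tokens that share the same word/non-word label."""
    # Each token is a run of non-whitespace plus the whitespace after it,
    # so joining tokens reproduces the original spacing.
    tokens = re.findall(r"\S+\s*", text)
    return ["".join(group) for _, group in groupby(tokens, key=looks_like_word)]


if __name__ == "__main__":
    sample = (
        r"Taking the derivative of: f(x) = \int_{0}^{1} z^3, we can see that "
        r"we always get x^2 = y_2 + 4 which is the same as taking the double "
        r"integral of g(x)"
    )
    for piece in split_words_and_math(sample):
        print(repr(piece))
```

This only labels whitespace-separated tokens, so a variable that happens to be spelled like a real word, or a formula glued to a word without a space, would be misclassified. Replacing the `isalpha` check with a dictionary lookup would be one way to tighten it.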