I have just started learning PySpark. Here is the task:
I have an input of:
I need to use a regex to remove punctuation and all leading or trailing spaces and underscores. The output should be all lowercase.
What I came up with is not complete:
sentence = regexp_replace(trim(lower(column)), '\\*\s\W\s*\\*_', '')
and the result is:
How do I fix the regex here? I need to use regexp_replace here.
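For reference, here is a minimal sketch of the behaviour I am after, not a confirmed solution. It assumes that underscores should be removed everywhere (not only at the ends) and that only letters, digits and whitespace should be kept. The names PATTERN, clean_text and clean_column are hypothetical. clean_text is a plain-Python stand-in for checking the regex without a Spark session, and clean_column applies the same pattern with regexp_replace.

```python
import re

# Assumed pattern (not from the original post): keep only lowercase letters,
# digits and whitespace; everything else, including '_', is removed.
PATTERN = r"[^a-z0-9\s]"


def clean_text(text: str) -> str:
    """Plain-Python stand-in for the Spark expression, for checking the regex."""
    # Lowercase first, strip unwanted characters, then drop outer spaces.
    return re.sub(PATTERN, "", text.lower()).strip(" ")


def clean_column(column):
    """The same idea with PySpark column functions (needs pyspark installed)."""
    from pyspark.sql.functions import lower, regexp_replace, trim

    # Trim last, so spaces exposed by removing punctuation are also dropped.
    return trim(regexp_replace(lower(column), PATTERN, ""))
```

The main difference from my attempt is that the character class lists what to keep instead of what to remove (\W does not match an underscore), and trim is applied after the replacement rather than before it.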
Thank you very much.