I'm using Python with Spark to clean some data that contains accented words in Portuguese.
Some examples of the incoming data look like this:
.. -- Água, 1234 ...
- -- https://www.example.com/page.html *****
I'm trying to remove anything that is not a letter or a number from the left and right ends of the string, to get clean results like this:
Água, 1234
https://www.example.com/page.html
The best I could come up with is this:
^[^\\p{N}\\p{L}]]|[^\\p{N}\\p{L}]$
But this didn't work. I saw a lot of solutions, but none that match the beginning and end of a string containing accented characters.
Thanks in advance.
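
For reference, here is a minimal sketch of the intended behaviour in plain Python, not a definitive solution. It differs from the pattern above in two ways: the stray `]` after the first character class is gone, and each class has a `+` so that a whole run of unwanted characters is matched instead of a single one. Python's built-in `re` module does not support `\p{L}` or `\p{N}`, so the sketch uses `[\W_]` instead, which is only approximately the same set. The names `EDGE_JUNK`, `strip_edges` and `SPARK_PATTERN` are made up for illustration, and the Spark pattern is an untested assumption based on Spark's `regexp_replace` using Java regex syntax.

```python
import re

# Runs of characters at the start or end that are neither letters nor digits.
# For str patterns in Python 3, \w is Unicode-aware, so "Á" counts as a word
# character. "_" is added explicitly because \w also matches it.
EDGE_JUNK = re.compile(r"^[\W_]+|[\W_]+$")


def strip_edges(text: str) -> str:
    """Remove leading and trailing non-alphanumeric characters."""
    return EDGE_JUNK.sub("", text)


# Assumed equivalent for Spark's regexp_replace (Java regex syntax).
# Not executed here; shown only for comparison with the pattern above.
SPARK_PATTERN = r"^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$"

if __name__ == "__main__":
    samples = [
        ".. -- Água, 1234 ...",
        "- -- https://www.example.com/page.html *****",
    ]
    for raw in samples:
        print(strip_edges(raw))
```

Characters inside the string, such as the comma in the first sample or the punctuation in the URL, are left alone because both alternatives are anchored to an end of the string.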