I have a dataframe that I need to write to disk, but PySpark doesn't allow any of the characters ,;{}()\n\t= to be present in the column headers when writing it out as a Parquet file.
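For context, this is roughly the kind of write that fails for me (a minimal sketch assuming a local SparkSession; the column names and output path are just for illustration):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A column header containing a space (or one of the other listed
# characters) makes the Parquet write fail.
df = spark.createDataFrame([(1, 2)], ["good_col", "bad col"])
df.write.parquet("/tmp/demo_output")  # illustrative path
```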
So I wrote a simple script to detect whether this is happening:
```python
import re

for each_header in all_headers:
    # Print the match result for each header (None means no match)
    print(re.match(",;{}()\n\t= ", each_header))
```
But None was printed for every header. That can't be right, because I know my file has spaces in its headers.
So I decided to check by running the following couple of lines:
```python
a = re.match(",;{}()\n\t= ", 'a s')
print(a)

a = re.search(",;{}()\n\t= ", 'a s')
print(a)
```
This, too, printed None.
I am not sure what I am doing wrong here.
PS: I am using Python 3.7.