TL;DR: How can I json.loads a string that uses a custom separator, without first replacing the separator with a comma?
I have a Spark DataFrame that I want to write to CSV, and for that I need to JSON-serialize every row in it.
So I have the following PySpark Row:
Row(type='le', v=Row(occ=False, oov=False, v=True), x=966, y=340)
I want to make the row ready for CSV. If I serialize it with plain json.dumps, the line ends up with many commas, and the CSV reader then fails to read the file correctly (it sees far more columns than it should).
So I perform json.dumps with separators=("| ", ": "), and I get the string s:
'["le"| [false| false| true]| 966| 340]'
Now I'm able to do:
json.loads(s.replace('|',','))
And I receive the desired output:
['le', [False, False, True], 966, 340]
Now comes the problematic part:
I write it to CSV. When I read it back, before trying json.loads, I receive:
'[\\le\\"| [false| false| true]| 966| 340]"'
The desired output is, as before:
['le', [False, False, True], 966, 340]
But I can't reach it.
When I try to json.loads it directly, I get:
json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)
(column 2 is the stray backslash right after the opening bracket)
When I try to change the '|' to ',':
s = s.replace('|',',')
s
Out: '[\\left_ear\\", [false, false, true], 966, 340]"'
json.loads(s)
json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)
This post is an attempt to overcome a previous problem I didn't find an answer to: Convert multiple array of structs columns in pyspark sql
If I find a solution to this problem, it will help me there as well.
Bottom line, this is the string I need to parse:
'[\\le\\"| [false| false| true]| 966| 340]"'
How can I do it?