I am very new to Spark and am experimenting with RDDs. Here is my basic RDD:
rdd=sc.parallelize(['"ab,cd",9', 'xyz,6'])
Now, if I want to split it on commas, I do
rdd.map(lambda x:x.split(",")).collect()
which gives me
[['"ab', 'cd"', '9'], ['xyz', '6']]
Since I want to ignore the commas inside text enclosed in double quotes, I write
rdd.map(lambda x:x.split(",(?=([^\\\"]*\\\"[^\\\"]*\\\")*[^\\\"]*$)")).collect()
which gives the output
[['"ab,cd",9'], ['xyz,6']]
(Thus this is not a duplicate question)
But I want output similar to what I get with .split(","), like so:
[['ab,cd', '9'], ['xyz', '6']]
I am not very good with regex, so I do not know how to manipulate it to get that output. Any help will be greatly appreciated.
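In case it helps, I can reproduce the behaviour locally outside Spark: str.split seems to treat its argument as a literal separator rather than a regex, so I suspect the map needs re.split instead. A minimal sketch of what I have tried (plain Python, no Spark; the pattern is my best guess at the quote-aware split):

```python
import re

data = ['"ab,cd",9', 'xyz,6']

# Split on commas only when an even number of quotes follows,
# i.e. the comma is not inside a quoted field.
pattern = r',(?=(?:[^"]*"[^"]*")*[^"]*$)'

result = [re.split(pattern, s) for s in data]
print(result)  # [['"ab,cd"', '9'], ['xyz', '6']]
```

This keeps the surrounding double quotes in the first field, though, whereas I would like them stripped as in my desired output above.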