
I have a string as below in a text file:

ar.txt has 'K1:v1,K2:v2, K3:v3'

I have read this into an RDD and am trying to convert it into MapType(StringType(), StringType()). When I try the below, it fails with a NullType error.

# Say data is in rdd called ar_rdd

ar_rdd1 = ar_rdd.map(lambda x: create_map(x.encode("ascii", "ignore").split(",")))

Please suggest how to convert it into a MapType() column.

msashish
  • Provide more info, including the code you used for reading, a preview of the data, and the full error. – mayank agrawal Sep 25 '18 at 15:39
  • Please refer to the post [How to make good reproducible Apache Spark Dataframe Examples](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-dataframe-examples), and [edit] your question to include a small sample of your data and the desired output. – pault Sep 25 '18 at 16:16
  • Hi pault, mayank agrawal, apologies for not being clear. I was able to solve it myself using a lambda expression, then rdd.toDF(), and then the create_map() function. – msashish Oct 07 '18 at 14:32

1 Answer


I was able to solve it as below.

Read it into an RDD and split the pairs (shown in separate steps, though they can be combined):

## File input format: 'k1:v1,k2:v2,k3:v3'
from pyspark.sql.functions import create_map, col

rdd1 = sc.textFile(file_path)
rdd2 = rdd1.map(lambda x: x.encode("ascii", "ignore").split(","))
rdd3 = rdd2.map(lambda x: (x[0].split(":"), x[1].split(":"), x[2].split(":")))
df = rdd3.toDF()
df = df.withColumn("map_column", create_map(col('_1')[0], col('_1')[1],
                                            col('_2')[0], col('_2')[1],
                                            col('_3')[0], col('_3')[1]))

If there is a better alternative, or a way to make it dynamic for any number of pairs, please suggest.
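One way to handle any number of pairs is to do the parsing in plain Python inside the `map`, building a dict per line, and then apply an explicit MapType schema. This is a sketch, not the original poster's code; `parse_pairs` is a helper name introduced here for illustration:

```python
def parse_pairs(line):
    """Split 'k1:v1,k2:v2,...' into a dict of string keys/values,
    regardless of how many pairs the line contains."""
    return dict(pair.split(":", 1) for pair in line.split(","))

# With Spark (assuming `sc` is an active SparkContext and the same file layout):
# from pyspark.sql.types import MapType, StringType, StructType, StructField
# schema = StructType([StructField("map_column", MapType(StringType(), StringType()))])
# df = sc.textFile(file_path).map(lambda line: (parse_pairs(line),)).toDF(schema)

print(parse_pairs("k1:v1,k2:v2,k3:v3"))
```

Depending on your Spark version, the built-in SQL function `str_to_map` (e.g. `F.expr("str_to_map(value, ',', ':')")`) may also do this without any RDD round trip.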

msashish