
I have a string as below in a text file:

ar.txt has 'K1:v1,K2:v2, K3:v3'

I have read this into an RDD and am trying to convert it into MapType(StringType(), StringType()). When I try the below, it fails with a NullType error.

# Say data is in rdd called ar_rdd

ar_rdd1 = ar_rdd.map(lambda x: create_map(x.encode("ascii", "ignore").split(",")))

Please suggest how to convert it into a MapType() column.

msashish
  • Provide more info, including the code you used for reading, a preview of the data, and the full error. – mayank agrawal Sep 25 '18 at 15:39
  • Please refer to the post [How to make good reproducible Apache Spark Dataframe Examples](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-dataframe-examples), and [edit] your question to include a small sample of your data and the desired output. – pault Sep 25 '18 at 16:16
  • Hi pault, mayank agrawal, apologies for not being clear. I was able to solve it myself using a lambda expression, then rdd.toDF(), and then the create_map() function. – msashish Oct 07 '18 at 14:32

1 Answer


I was able to solve it as below.

Read it into an RDD and split the pairs (shown in separate steps, though they can be combined):

## File input format: 'k1:v1,k2:v2,k3:v3'
from pyspark.sql.functions import create_map, col

rdd1 = sc.textFile(file_path)
rdd2 = rdd1.map(lambda x: x.encode("ascii", "ignore").split(","))
rdd3 = rdd2.map(lambda x: (x[0].split(":"), x[1].split(":"), x[2].split(":")))
df = rdd3.toDF()
df = df.withColumn("map_column", create_map(col('_1')[0], col('_1')[1],
                                            col('_2')[0], col('_2')[1],
                                            col('_3')[0], col('_3')[1]))

If there is a better alternative, or a way to make it dynamic for any number of pairs, please suggest.
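One way to handle any number of pairs is to do the parsing in plain Python inside the `map`, building a dict per line, and then apply an explicit MapType schema. This is a sketch, not the original poster's code; `parse_pairs` is a helper name introduced here for illustration:

```python
def parse_pairs(line):
    """Split 'k1:v1,k2:v2,...' into a dict of string keys/values,
    regardless of how many pairs the line contains."""
    return dict(pair.split(":", 1) for pair in line.split(","))

# With Spark (assuming `sc` is an active SparkContext and the same file layout):
# from pyspark.sql.types import MapType, StringType, StructType, StructField
# schema = StructType([StructField("map_column", MapType(StringType(), StringType()))])
# df = sc.textFile(file_path).map(lambda line: (parse_pairs(line),)).toDF(schema)

print(parse_pairs("k1:v1,k2:v2,k3:v3"))
```

Depending on your Spark version, the built-in SQL function `str_to_map` (e.g. `F.expr("str_to_map(value, ',', ':')")`) may also do this without any RDD round trip.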

msashish