 >>> from pyspark.sql import SQLContext
 >>> sqlContext = SQLContext(sc)
 >>> rdd = sqlContext.jsonFile("tmp.json")
 >>> rdd_new = rdd.map(lambda x: (x.name, x.age))

It works properly. But I have a list of field names, list1 = ["name", "age", "gene", "xyz", ...], and when I loop over it:

 for each_value in list1:
     rdd_new = rdd.map(lambda x: x.each_value)

I am getting an error.

1 Answer


I think what you need is to pass the names of the fields you want to select. In that case, see the following:

r1 = ssc.jsonFile("test.json")   # ssc here is a SQLContext
r1.printSchema()
r1.show()

l1 = ['number', 'string']        # field names to select
s1 = r1.select(*l1)              # unpack the list into select() arguments
s1.printSchema()
s1.show()

root
 |-- array: array (nullable = true)
 |    |-- element: long (containsNull = true)
 |-- boolean: boolean (nullable = true)
 |-- null: string (nullable = true)
 |-- number: long (nullable = true)
 |-- object: struct (nullable = true)
 |    |-- a: string (nullable = true)
 |    |-- c: string (nullable = true)
 |    |-- e: string (nullable = true)
 |-- string: string (nullable = true)

array                boolean null number object  string     
ArrayBuffer(1, 2, 3) true    null 123    [b,d,f] Hello World
root
 |-- number: long (nullable = true)
 |-- string: string (nullable = true)

number string     
123    Hello World

This is done through a DataFrame. Note the way the argument list is passed: *l1 unpacks the list into separate column-name arguments for select. For more, you can see this link.
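
If you would rather keep working on the RDD of Rows, as in the question, a minimal sketch (assuming the same field names as the question's tmp.json) is to look fields up by string with getattr instead of writing x.each_value, which looks for a field literally named "each_value":

df = sqlContext.jsonFile("tmp.json")    # hypothetical file from the question
fields = ["name", "age"]                # hypothetical list of field names
# getattr(row, f) resolves each string in `fields` to the Row attribute of that name
rdd_new = df.map(lambda row: tuple(getattr(row, f) for f in fields))

The select(*l1) approach above is usually preferable, though, since it stays within the DataFrame API.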

  • we want to use collect. Then: l1 = ['number','string']; s1 = r1.select(*l1); s1.collect() # I want a list of tuples: [(u'123', u'hello world'), (u'456', u'hello ayan'), ...] – Kumar Jun 08 '15 at 06:17
  • I am not sure I quite follow. – ayan guha Jun 08 '15 at 10:13
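
Regarding the follow-up comment about collect(): s1.collect() returns a list of Row objects, and since Row is a tuple subclass, one possible sketch for turning the result into a plain list of tuples (reusing s1 from the answer above) is:

rows = s1.collect()                 # e.g. [Row(number=123, string=u'Hello World')]
pairs = [tuple(r) for r in rows]    # e.g. [(123, u'Hello World')]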