
Spark 3.0

I ran the code df.select("Name").collect() and received the output below. I want to put the result into a plain Python list. I tried adding [0] to the end, but that didn't work.

Row(Name='Andy')
Row(Name='Brandon')
Row(Name='Carl')

expected outcome = ['Andy','Brandon','Carl']
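
For reference, collect() returns a list of Row objects, so adding [0] only selects the first Row instead of unwrapping the values. A minimal sketch of the behaviour (names are just illustrative):

rows = df.select("Name").collect()
rows        #[Row(Name='Andy'), Row(Name='Brandon'), Row(Name='Carl')]
rows[0]     #Row(Name='Andy'), still a Row rather than the string 'Andy'
rows[0][0]  #'Andy', only the first value, not the whole list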
user147271

2 Answers


You can use the DataFrame's underlying rdd and map each Row to its value.

df.select('Name').rdd.map(lambda x: x[0]).collect()

['Andy', 'Brandon', 'Carl']
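
Since a Row is a tuple subclass, a flatMap variant should give the same result without the positional index (a small sketch building on the line above):

#flatMap unpacks each single-column Row directly into its value
df.select('Name').rdd.flatMap(lambda x: x).collect()
#['Andy', 'Brandon', 'Carl']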
Lamanus

Use collect_list to aggregate the column into a single array, then pull the list out by indexing into the collected result and assign it to a variable.

Example:

df.show()
#+-------+
#|   Name|
#+-------+
#|   Andy|
#|Brandon|
#|   Carl|
#+-------+

from pyspark.sql.functions import col, collect_list

output = df.agg(collect_list(col("Name"))).collect()[0][0]

output
#['Andy', 'Brandon', 'Carl']
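
If pandas is available on the driver, going through toPandas is another common option (a sketch, not part of the example above):

#Bring the single column to the driver as a pandas Series, then to a plain list
output = df.select("Name").toPandas()["Name"].tolist()

output
#['Andy', 'Brandon', 'Carl']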

Another way is to use a list comprehension:

ss = df.select("Name").collect()

output = [i[0] for i in ss]

output
#['Andy', 'Brandon', 'Carl']
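
Row objects also support attribute access, so the comprehension can reference the column by name; this should produce the same list:

output = [row.Name for row in ss]

output
#['Andy', 'Brandon', 'Carl']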
notNull