
I want to select a few columns from a DataFrame.

The end user wants a fixed-width file, so I need to add a fixed number of spaces between the columns (the set of columns may change in the future). I then need to save the result as a fixed-width text file without a header.

My output string should look like this:

aaa   bbb   ccc   ddd

where aaa, bbb, ... are the column values selected from the DataFrame, with 3 spaces added between them.

Can anyone help with this?

Tigerjz32
katty

2 Answers


This question covers the PySpark case: "In pyspark, how do you add/concat a string to a column?"

But in Scala it is almost the same:

import org.apache.spark.sql.functions.{col, concat, lit}
df.select(concat(col("firstColumn"), lit("   "), col("secondColumn"), lit("   "), col("thirdColumn"))).show()

Vzzarr

I think it is better to work with RDDs if you are saving the output as a text file. Here is my solution for PySpark:

>>> data = sc.parallelize([
...     ('aaa','bbb','ccc','ddd'),
...     ('aaa','bbb','ccc','ddd'),
...     ('aaa','bbb','ccc','ddd')])
>>> columns = ['a','b','c','d']
>>> 
>>> df = spark.createDataFrame(data, columns)
>>> 
>>> df.show()
+---+---+---+---+
|  a|  b|  c|  d|
+---+---+---+---+
|aaa|bbb|ccc|ddd|
|aaa|bbb|ccc|ddd|
|aaa|bbb|ccc|ddd|
+---+---+---+---+

>>> 
>>> df.createOrReplaceTempView("table1")  # registerTempTable is deprecated since Spark 2.0
>>> 
>>> table1 = spark.sql("select concat(a,'   ', b,'   ',c, '   ', d) col from table1")
>>> 
>>> table1.show()
+--------------------+
|                 col|
+--------------------+
|aaa   bbb   ccc  ...|
|aaa   bbb   ccc  ...|
|aaa   bbb   ccc  ...|
+--------------------+

>>> 
>>> rdd = table1.rdd.map(lambda x: "".join([str(i) for i in x]))
>>> 
>>> rdd.collect()
['aaa   bbb   ccc   ddd', 'aaa   bbb   ccc   ddd', 'aaa   bbb   ccc   ddd']
>>> 
>>> rdd.saveAsTextFile("/yourpath")
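
Note that joining with a constant separator only lines the columns up when every value has the same length. If the values can vary in length, each one probably needs to be padded to a fixed width first. Below is a minimal plain-Python sketch of that idea; the helper name to_fixed_width, the widths list and the gap parameter are all hypothetical and not part of the answer above, and truncating over-long values is an assumption about what the end user wants.

```python
def to_fixed_width(row, widths, gap=3):
    """Pad each value to its column width and join the cells with `gap` spaces."""
    if len(row) != len(widths):
        raise ValueError("row and widths must have the same length")
    cells = []
    for value, width in zip(row, widths):
        # Treat missing values as empty strings (an assumption).
        text = "" if value is None else str(value)
        # Cut values that are too long, then left-justify with spaces.
        cells.append(text[:width].ljust(width))
    return (" " * gap).join(cells)


# Hypothetical widths, one per selected column.
widths = [5, 3, 4, 3]
line = to_fixed_width(("aaa", "bbb", "ccc", "ddd"), widths)
print(repr(line))
```

Since a PySpark Row behaves like a tuple, a function like this could presumably be used in place of the lambda above, for example df.rdd.map(lambda r: to_fixed_width(r, widths)), before calling saveAsTextFile.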
Ali Yesilli