I am trying this in Databricks. Please let me know which PySpark libraries need to be imported, and the code needed to get the output below in Azure Databricks PySpark.

Example input dataframe:

|     column1     |    column2    | column3  |  column4  |

| a               | bbbbb         | cc       | >dddddddd |
| >aaaaaaaaaaaaaa | bb            | c        | dddd      |
| aa              | >bbbbbbbbbbbb | >ccccccc | ddddd     |
| aaaaa           | bbbb          | ccc      | d         |

Output dataframe:

| column  | maxLength |

| column1 |        14 |
| column2 |        12 |
| column3 |         7 |
| column4 |         8 |
Ronak Jain
VIGNESH R

1 Answer

>>> from pyspark.sql import functions as sf
>>> df = sc.parallelize([['a','bbbbb','ccc','ddd'],['aaaa','bbb','ccccccc', 'dddd']]).toDF(["column1", "column2", "column3", "column4"])
>>> df1 = df.select([sf.length(col).alias(col) for col in df.columns])
>>> df1.groupby().max().show()
+------------+------------+------------+------------+
|max(column1)|max(column2)|max(column3)|max(column4)|
+------------+------------+------------+------------+
|           4|           5|           7|           4|
+------------+------------+------------+------------+

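Then melt (unpivot) the resulting one-row dataframe so that each column becomes its own row. The original answer linked to a melt recipe at this point.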
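Since the linked melt recipe is not reproduced here, below is a minimal sketch of one way to do the unpivot, using Spark SQL's `stack` generator. It is not necessarily what the link described. The helper names `stack_expr` and `max_lengths_as_rows` are made up for illustration, and the sketch assumes the column names contain no quotes, backticks, or dots.

```python
def stack_expr(columns, name_col="column", value_col="maxLength"):
    """Build a Spark SQL stack() expression that turns N columns into N rows.

    Hypothetical helper: each column contributes a ('name', value) pair.
    Assumes column names contain no single quotes or backticks.
    """
    pairs = ", ".join(f"'{c}', `{c}`" for c in columns)
    return f"stack({len(columns)}, {pairs}) as (`{name_col}`, `{value_col}`)"


def max_lengths_as_rows(df):
    """Return one (column, maxLength) row per column of df."""
    # Imported lazily so the string helper above can be used without Spark.
    from pyspark.sql import functions as F

    # Step 1: one-row dataframe holding the max string length of every column.
    wide = df.select([F.max(F.length(F.col(c))).alias(c) for c in df.columns])
    # Step 2: unpivot that single row into (column, maxLength) rows.
    return wide.selectExpr(stack_expr(wide.columns))
```

Calling `max_lengths_as_rows(df).show()` on the question's dataframe should produce the two-column layout asked for. Because every aggregated value is an integer, `stack` can place them all in one `maxLength` column without casts.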

Edit: (From Iterate through each column and find the max length)

Single line select

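# Note: this import shadows Python's built-in max in the current scope.
from pyspark.sql.functions import col, length, max

# One max-length aggregate per column, all in a single select.
df = df.select([max(length(col(name))).alias(name) for name in df.schema.names])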


As Rows

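from pyspark.sql import Row
# Aggregate into a separate one-row dataframe so the original df is preserved.
max_df = df.select([max(length(col(name))).alias(name) for name in df.schema.names])
row = max_df.first().asDict()
# Build one (col, length) row per original column.
df2 = spark.createDataFrame([Row(col=name, length=row[name]) for name in max_df.schema.names], ['col', 'length'])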


E.ZY.
  • 1
    Thanks, this worked and it also takes less time to execute :) :) – VIGNESH R Nov 04 '20 at 12:34
  • 1
    @VIGNESHR Glad to know that your issue has been resolved. You can accept it as an answer (click the check mark beside the answer to toggle it from greyed out to filled in). This can be beneficial to other community members. Thank you. – CHEEKATLAPRADEEP Nov 09 '20 at 06:43
  • 1
    @E.ZY. EXCELLENT solution. Thank you for sharing your knowledge. – nam Aug 07 '22 at 15:59