I am a beginner in Spark and am looking for a solution to my issue. I'm trying to sort the columns of a dataframe by the number of null values each column contains, in ascending order.

For example, given this data:

Column1    Column2     Column3
a          d           h
b          null        null
null       e           i
null       f           h
null       null        k
c          g           l

After sorting, the dataframe should be:

Column3     Column2     Column1

All I have managed so far is to count the null values in each column:

from pyspark.sql.functions import col, count, when
data.select([count(when(col(c).isNull(), c)).alias(c) for c in data.columns])
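For readers wondering why that expression counts nulls: `when(condition, value)` without an `otherwise` yields null for rows where the condition is false, and `count` skips nulls, so only the rows where the column is null get counted. A minimal plain-Python sketch of the same tally on the sample data (no Spark involved; `rows`, `columns`, and `null_counts` are illustrative names, and `None` stands in for null):

```python
# Sample data from the question; None stands in for null.
columns = ["Column1", "Column2", "Column3"]
rows = [
    ("a", "d", "h"),
    ("b", None, None),
    (None, "e", "i"),
    (None, "f", "h"),
    (None, None, "k"),
    ("c", "g", "l"),
]

# For each column, count the rows whose value in that column is None.
null_counts = {
    name: sum(1 for row in rows if row[i] is None)
    for i, name in enumerate(columns)
}

print(null_counts)  # {'Column1': 3, 'Column2': 2, 'Column3': 1}
```

These counts match the expected ordering: Column3 has the fewest nulls and Column1 the most.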

Now I have no idea how to continue. I hope you can help me.

Mus
  • Does this answer your question? [Python/pyspark data frame rearrange columns](https://stackoverflow.com/questions/42912156/python-pyspark-data-frame-rearrange-columns) – sergiomahi Jan 29 '20 at 18:58
  • Relevant: https://stackoverflow.com/q/44627386/11301900 – AMC Jan 29 '20 at 19:47

1 Answer

My solution, which works as you want:

# Imports needed by the expressions below
from pyspark.sql.functions import col, count, when

# Count the nulls in each column (based on your code); the original df stays intact
null_counts = df.select([count(when(col(c).isNull(), c)).alias(c) for c in df.columns])

# Convert the single result row to a dictionary: {column name: null count}
counts = null_counts.first().asDict()

# Sort the column names by their null count, ascending
sorted_cols = sorted(counts, key=counts.get)

# With the .select() method we re-order the columns of the original dataframe
df.select(sorted_cols).show()
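The ordering step itself does not need Spark. Here is a minimal sketch, assuming the null counts have already been collected into a plain dictionary (the `counts` values are the ones from the question's sample data, and `order_by_null_count` is a hypothetical helper name, not part of PySpark):

```python
def order_by_null_count(counts):
    """Return column names sorted by ascending null count.

    sorted() is stable, so columns with equal counts keep the order
    in which the dictionary yields them.
    """
    return sorted(counts, key=counts.get)


# Null counts for the sample data in the question.
counts = {"Column1": 3, "Column2": 2, "Column3": 1}
sorted_cols = order_by_null_count(counts)
print(sorted_cols)  # ['Column3', 'Column2', 'Column1']
```

The resulting list can then be passed to `.select()` on the original dataframe to re-order its columns.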
ggeop
  • Thanks very much for replying. However, it showed an error in the dictionary line 'Unsupported class file major version 55'. I'll try to fix it. Thank you so much – Mus Jan 29 '20 at 19:42
  • @Mus are you using Python 2.x? My implementation is for Python 3.x – ggeop Jan 29 '20 at 19:57
  • For Python 2.x take a look at this post: https://stackoverflow.com/questions/9001509/how-can-i-sort-a-dictionary-by-key – ggeop Jan 29 '20 at 19:58
  • If my answer works for you, you can accept it if you want :-) – ggeop Jan 29 '20 at 19:59
  • Yes I'm using Python2.7. I tried python3 and your answer works 100%. Thanks again – Mus Jan 29 '20 at 21:03
  • @Mus Perfect! :-) – ggeop Jan 29 '20 at 21:06