I have the following column, of type Array[Int], in a PySpark DataFrame:
+--------------------+
|     feature_indices|
+--------------------+
|                 [0]|
|[0, 1, 4, 10, 11,...|
|           [0, 1, 2]|
|                 [1]|
|                 [0]|
+--------------------+
I am trying to pad each array with zeros and then truncate it to a fixed length n, so that every row's array has the same length. For example, for n = 5, I expect:
+--------------------+
|     feature_indices|
+--------------------+
|     [0, 0, 0, 0, 0]|
|   [0, 1, 4, 10, 11]|
|     [0, 1, 2, 0, 0]|
|     [1, 0, 0, 0, 0]|
|     [0, 0, 0, 0, 0]|
+--------------------+
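To make the intended per-row behavior concrete, here is a rough plain-Python sketch (not Spark code). The helper name `pad_or_truncate` is made up for illustration, and the trailing 12 in the second row is a placeholder, since that row is cut off in the first display:

```python
def pad_or_truncate(values, n, fill=0):
    """Return a list of exactly n items.

    Longer lists are cut to their first n items; shorter lists are
    right-padded with `fill` (hypothetical helper, illustration only).
    """
    truncated = list(values[:n])
    return truncated + [fill] * (n - len(truncated))


# Sample rows mirroring the column above; the 12 is a made-up placeholder.
rows = [[0], [0, 1, 4, 10, 11, 12], [0, 1, 2], [1], [0]]

for row in rows:
    print(pad_or_truncate(row, 5))
```

What I am looking for is the equivalent of this applied to the array column.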
Any suggestions? I looked at PySpark's rpad function, but it only operates on string columns.