I have a DataFrame as shown below:
+------+--------------+
|   sid|first_term_gpa|
+------+--------------+
|100170|           2.0|
|100446|        3.8333|
|100884|           2.0|
|101055|           3.0|
|101094|        3.7333|
|101775|        3.7647|
|102524|        3.8235|
|102798|           3.5|
|102960|        2.8235|
|103357|           3.0|
|103747|        3.8571|
|103902|           3.8|
|104053|        3.1667|
|104064|        1.8462|
+------+--------------+
I have created a UDF:
def student_gpa(gpa):
    bins = ['[0,1)', '[1,2)', '[2,3)', '[3,4)']
    return bins[float(gpa)]
The parameter gpa is expected to be a float.
I apply the UDF to the first_term_gpa column to create a new column named gpa_bin, using the code below:
alumni_ft_gpa = first_term_gpa \
    .withColumn('gpa_bin', expr('student_gpa(first_term_gpa)')) \
    .show()
but it throws the following error:
An exception was thrown from a UDF: 'TypeError: list indices must be integers or slices, not float',
What am I missing here?
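
A likely cause, sketched in plain Python without Spark: a list can only be indexed with an integer, so bins[float(gpa)] fails no matter what value is passed in. The snippet below first reproduces that TypeError and then shows one possible fix that floors the GPA to an integer before indexing. The names BINS and student_gpa_fixed are hypothetical, and clamping out-of-range values (for example a GPA of exactly 4.0) into the nearest bin is an assumption, not something stated in the question.

```python
import math

# Bin labels copied from the question; BINS is a hypothetical module-level name.
BINS = ['[0,1)', '[1,2)', '[2,3)', '[3,4)']


def student_gpa_fixed(gpa):
    """Map a GPA to its bin label (hypothetical fixed variant of student_gpa)."""
    if gpa is None:
        return None  # pass missing values through unchanged
    # A list index must be an int, so floor the float before indexing.
    index = int(math.floor(float(gpa)))
    # Assumption: out-of-range values (e.g. exactly 4.0) are clamped
    # into the first or last bin instead of raising IndexError.
    index = max(0, min(index, len(BINS) - 1))
    return BINS[index]


# The original indexing fails the same way outside Spark:
try:
    BINS[float(2.0)]
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(student_gpa_fixed(3.8333))
```

In Spark, the function would presumably still need to be registered under the name used inside expr, for example with spark.udf.register('student_gpa', student_gpa_fixed), before the withColumn call.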