I am trying to randomly select 100 rows from my PySpark Dataframe. For that I would like to use the code as described in this post:

training_data= data.orderBy(F.rand()).limit(100)

However I get the error:

AttributeError: 'function' object has no attribute 'rand'

I imported rand() the following way:

from pyspark.sql.functions import rand as F

I tried to import rand the same way as described in the post, but I get the error:

ModuleNotFoundError: No module named 'org'

I also tried to use the function just as such:

training_data= data.orderBy(rand()).limit(100)

But then I get the following name error:

NameError: name 'rand' is not defined

Does anyone know how to fix it? I am new to PySpark and I think I am missing something obvious here. Note that I am working on Databricks.

Thank you


1 Answer

OK, I actually managed to achieve what I wanted by doing the following:

training_data, test_data = data.randomSplit([0.7, 0.3], seed = 100)