I want to generate a column with random numbers like this:
df=df.withColumn("random_col",random.randint(100000, 1000000))
The above gives me an error:
AssertionError: col should be Column
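First, make sure you are using the right functions. random.randint returns a plain Python int, while withColumn expects a Column, which is why you get the assertion error.
Try importing: from pyspark.sql.functions import rand, floor
rand() returns a uniform double in [0, 1) for each row, so scale it to your range and floor it, something like this line of code:
df1 = df.withColumn("random_col", (floor(rand() * 900001) + 100000).cast("int"))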
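If it helps, the arithmetic for turning a uniform value in [0, 1) (which is what rand() produces per row) into an integer range can be checked in plain Python before running it on a cluster. This is only a sketch: scale_to_int is a made-up helper name, not a PySpark function, and the bounds are the ones from the question, treated as inclusive on both ends like random.randint.

```python
import math
import random


def scale_to_int(u: float, low: int, high: int) -> int:
    """Map u in [0.0, 1.0) to an integer in [low, high], both ends inclusive."""
    # There are (high - low + 1) possible integers, so scale by that count,
    # floor to get an offset in 0..(high - low), then shift by low.
    return low + math.floor(u * (high - low + 1))


# The same arithmetic written as a PySpark column expression (not run here):
#   floor(rand() * (high - low + 1)) + low
rng = random.Random(42)  # fixed seed so the check is repeatable
samples = [scale_to_int(rng.random(), 100000, 1000000) for _ in range(1000)]

print(min(samples), max(samples))
```

Scaling by (high - low + 1) rather than (high - low) is what makes the upper bound reachable; with the smaller factor, 1000000 would never come out.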
You could also check out this resource. It looks like it may be helpful for what you are doing.
Hope this helps!
I ran into this issue and couldn't find anything concrete. I eventually figured it out, so hopefully this helps anyone who is stuck:
# To add a column with values drawn from a range of random integers,
# first build the values in a new Spark DataFrame.
# Import libraries
import random
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType, StructField, StructType
# Reuse the active session, or start one if none exists
spark = SparkSession.builder.getOrCreate()
# Define the new DataFrame schema (the id column is added afterwards)
schema = StructType(
    [
        StructField("random_value", IntegerType(), nullable=False)
    ]
)
# Build the rows on the driver
data = list()
for i in range(0, 200):  # adjust the row count as you wish
    data.append(
        {
            "random_value": random.randint(500, 10000)  # adjust the range as you wish
        }
    )
# Create the Spark DataFrame
df = spark.createDataFrame(data, schema)
# Add an id for ordering (unique and increasing, but not necessarily consecutive)
df1 = df.withColumn("id", F.monotonically_increasing_id())