Context: I'm trying to add a new column to a DataFrame that was initially created by reading a Parquet file with Spark and then converted to a pandas-on-Spark DataFrame using the pandas API on Spark, as follows:
import pyspark.pandas as ps
df = spark.read.parquet(file)
psdf = ps.DataFrame(df)
psdf['new column'] = list_with_values
But I keep getting a KeyError saying that "new column" doesn't exist. Indeed, it doesn't exist yet; I'm trying to create it the same way as in pandas, where df['new column'] = list_of_values simply adds a new column. I'm not trying to read "new column", since it hasn't been created; I only want to add a new column that holds this list of values.
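For reference, this is the plain pandas behaviour I mean. It is only a minimal sketch: the `id` column and the values are made up for illustration and are not my real data.

```python
import pandas as pd

# Hypothetical data, only to illustrate the pandas behaviour I'm after.
pdf = pd.DataFrame({"id": [1, 2, 3]})
list_with_values = [10, 20, 30]  # one value per row

# In plain pandas, assigning to a non-existent column name creates it.
pdf["new column"] = list_with_values

print(pdf.columns.tolist())  # ['id', 'new column']
```

I would like the equivalent assignment to work on the pandas-on-Spark DataFrame.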
How can I do this?
Note: these DataFrames are in different scripts (one in a notebook, the other in a Python script on Databricks).
Thanks