
Context: I'm trying to add a new column to a dataframe that was initially created by reading a parquet file with Spark and then converted to a dataframe using the pandas API on Spark, as follows:

import pyspark.pandas as ps

df = spark.read.parquet(file)
psdf = ps.DataFrame(df)

psdf['new column'] = list_with_values

But I keep getting a KeyError saying "new column" doesn't exist. Indeed, it doesn't exist, but I'm trying to create a new column like in pandas (in pandas you can simply do df['new column'] = list_of_values and that adds a new column). I don't want to access "new column" because it isn't created yet; I simply want to add a new column that holds this list of values.

How can I do this?
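
One possible workaround (untested here) is to wrap the list in a ps.Series and enable the compute.ops_on_diff_frames option, so pandas-on-Spark is allowed to combine the existing frame with the new Series and aligns them on their default 0..n-1 indexes. A minimal sketch, assuming list_with_values has exactly one entry per row of the DataFrame:

import pyspark.pandas as ps

# Allow operations that combine two different pandas-on-Spark objects
ps.set_option("compute.ops_on_diff_frames", True)

df = spark.read.parquet(file)
psdf = ps.DataFrame(df)

# Wrapping the plain Python list in a ps.Series lets the assignment
# align on the default index of both objects
psdf['new column'] = ps.Series(list_with_values)

ps.reset_option("compute.ops_on_diff_frames")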

Note: these dataframes are in different scripts (one in a notebook, the other in a Python script on Databricks)

Thanks

user139442
  • Does this question help? https://stackoverflow.com/questions/48164206/pyspark-adding-a-column-from-a-list-of-values – Nick ODell Nov 24 '22 at 18:19
  • Hey, not quite, I was looking for an approach similar to pandas. I feel like this should be a simple procedure that Spark just overcomplicates, but thank you for your input! – user139442 Nov 24 '22 at 18:21
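
The question linked in the comment above takes the plain PySpark route instead: turn the list into its own small DataFrame and join it back on a generated row number. A rough sketch of that idea, assuming the order produced by monotonically_increasing_id matches the order of list_with_values (Spark does not strictly guarantee this); the row_id and new_column names are just placeholders:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

df = spark.read.parquet(file)

# Give the existing data a sequential, 1-based row id
w = Window.orderBy(F.monotonically_increasing_id())
df_indexed = df.withColumn("row_id", F.row_number().over(w))

# Turn the list into a (row_id, value) DataFrame with matching 1-based ids
values_df = spark.createDataFrame(
    [(i + 1, v) for i, v in enumerate(list_with_values)],
    ["row_id", "new_column"],
)

# Join the values back and drop the helper column
result = df_indexed.join(values_df, on="row_id").drop("row_id")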

0 Answers