I am using pandas to read a CSV file on my machine, then I create a PySpark DataFrame from the pandas DataFrame:

df = spark.createDataFrame(pandas_df) 

I upgraded pandas from version 1.3.0 to 2.0.

Now, I am getting this error:

AttributeError: 'DataFrame' object has no attribute 'iteritems'
Talha Tayyab
  • This seems similar to the following issue, which may help: https://stackoverflow.com/questions/75926636/databricks-issue-while-creating-spark-data-frame-from-pandas – krish Jun 05 '23 at 08:52

1 Answer

I found an answer on GitHub: https://github.com/YosefLab/Compass/issues/92

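This is a known compatibility issue: DataFrame.iteritems was deprecated in pandas 1.5 in favor of DataFrame.items and removed in pandas 2.0, but older PySpark versions still call it inside createDataFrame.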
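If your own code also loops over columns with iteritems, the migration is a rename: DataFrame.items yields the same (column label, Series) pairs. A minimal sketch, where the small DataFrame is a made-up stand-in for the data read from CSV:

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame read from the CSV file
pandas_df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# pandas < 2.0:  for column, series in pandas_df.iteritems(): ...
# pandas >= 2.0: items() yields the same (label, Series) pairs
column_values = {}
for column, series in pandas_df.items():
    column_values[column] = series.tolist()

print(column_values)  # {'id': [1, 2, 3], 'name': ['a', 'b', 'c']}
```

This only fixes code you control; the call inside PySpark still needs one of the workarounds below.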

For now, I had to downgrade pandas back to version 1.5.3 (for example with pip install pandas==1.5.3).


Edit:

Other possible workarounds:

Upgrade to the latest Spark release (3.4.1 at the time of writing):

https://spark.apache.org/downloads.html
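Whichever route you take (downgrading pandas or upgrading Spark), a quick version check can tell you up front whether a given pandas/PySpark pair is likely to hit this error. This is a minimal pure-Python sketch; the helper names are hypothetical, and the Spark 3.4 cut-off is an assumption based on the suggestion above to move to Spark 3.4.x:

```python
def parse_version(version: str) -> tuple:
    """Turn '2.0.3' into (2, 0, 3), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def has_iteritems_conflict(pandas_version: str, spark_version: str) -> bool:
    """True when pandas no longer has iteritems but PySpark may still call it.

    Assumptions: pandas removed DataFrame.iteritems in 2.0, and Spark 3.4
    is treated as the first release that no longer relies on it.
    """
    pandas_too_new = parse_version(pandas_version) >= (2, 0)
    spark_too_old = parse_version(spark_version) < (3, 4)
    return pandas_too_new and spark_too_old


print(has_iteritems_conflict("2.0.0", "3.3.2"))  # True
print(has_iteritems_conflict("1.5.3", "3.3.2"))  # False
```

In practice you would pass the installed versions, e.g. has_iteritems_conflict(pandas.__version__, pyspark.__version__).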


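You can also assign DataFrame.items to DataFrame.iteritems. Patching the class rather than a single instance is more robust, since PySpark may call iteritems on slices or copies of your DataFrame rather than on the object you passed in:

pd.DataFrame.iteritems = pd.DataFrame.items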
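A slightly more defensive version of this alias, applied to the pandas class and only when the method is missing (so it does nothing on pandas 1.x), might look like this. It is a sketch: the sample DataFrame is made up, and the createDataFrame call is commented out because it needs a running SparkSession:

```python
import pandas as pd

# Re-add the alias only if this pandas version has removed it (pandas >= 2.0)
if not hasattr(pd.DataFrame, "iteritems"):
    pd.DataFrame.iteritems = pd.DataFrame.items

# Hypothetical sample data standing in for the CSV contents
pandas_df = pd.DataFrame({"id": [1, 2], "score": [0.5, 0.75]})

# The alias yields the same (label, Series) pairs as items()
labels = [label for label, _ in pandas_df.iteritems()]
print(labels)  # ['id', 'score']

# df = spark.createDataFrame(pandas_df)  # requires an active SparkSession
```

Run the patch once, before the first createDataFrame call; it is a stopgap until pandas and Spark versions are aligned.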

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.items.html?highlight=items#pandas.DataFrame.items

Talha Tayyab
  • It may require some additional workaround, but [DataFrame.items](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.items.html?highlight=items#pandas.DataFrame.items) seems to do what you need – Shorn Jun 05 '23 at 08:57