AttributeError: 'DataFrame' object has no attribute '_get_object_id'

[Image: screenshot from the question, not available]

user3837868

1 Answer

First of all: It is really important that you give us a reproducible example of your dataframe. Nobody likes to look at screenshots to identify an error.

Your code is not working because Spark can't determine how the rows of your groupby result and your initial dataframe should be merged. It isn't aware that NUM_TIERS is some kind of key. Therefore you have to tell Spark which column(s) to use to merge the groupby result with the initial dataframe.

import pyspark.sql.functions as F
from pyspark.sql import Window

# Sample data: (NUM_TIERS, MONTAN_TR)
l = [('OBAAAA7K2KBBO', 34),
     ('OBAAAA878000K', 138),
     ('OBAAAA878A2A0', 164),
     ('OBAAAA7K2KBBO', 496),
     ('OBAAAA878000K', 91)]

columns = ['NUM_TIERS', 'MONTAN_TR']

df = spark.createDataFrame(l, columns)

You have two options to do that. You can use a join:

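# Aggregate per key, name the result column SUM, then join it back on NUM_TIERS
df = df.join(df.groupby('NUM_TIERS').agg(F.sum('MONTAN_TR').alias('SUM')), 'NUM_TIERS')
df.show()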

Or a window function:

# One partition per NUM_TIERS; the sum is computed within each partition
w = Window.partitionBy('NUM_TIERS')

df = df.withColumn('SUM', F.sum('MONTAN_TR').over(w))
df.show()

The output is the same either way (the row order may differ):

+-------------+---------+---+ 
|    NUM_TIERS|MONTAN_TR|SUM| 
+-------------+---------+---+ 
|OBAAAA7K2KBBO|       34|530| 
|OBAAAA7K2KBBO|      496|530| 
|OBAAAA878000K|      138|229| 
|OBAAAA878000K|       91|229| 
|OBAAAA878A2A0|      164|164| 
+-------------+---------+---+
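
If you want to double-check the SUM values without starting a Spark session, the same per-key total can be sketched in plain Python. This is only an illustration of what both approaches compute, not how Spark executes them; the names rows, totals and with_sum are made up for this sketch.

```python
from collections import defaultdict

# Same sample rows as in the Spark example above: (NUM_TIERS, MONTAN_TR)
rows = [
    ('OBAAAA7K2KBBO', 34),
    ('OBAAAA878000K', 138),
    ('OBAAAA878A2A0', 164),
    ('OBAAAA7K2KBBO', 496),
    ('OBAAAA878000K', 91),
]

# Step 1: total MONTAN_TR per NUM_TIERS (the "groupby" part)
totals = defaultdict(int)
for num_tiers, montan_tr in rows:
    totals[num_tiers] += montan_tr

# Step 2: attach each key's total back to every row (the "join" part)
with_sum = [(num_tiers, montan_tr, totals[num_tiers])
            for num_tiers, montan_tr in rows]

for row in with_sum:
    print(row)
```

The key point is the same as in Spark: the total is looked up by NUM_TIERS, so every row with the same key gets the same SUM.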
cronoik