I want to create a new dataset based on an original dataset.
For example, my input is [1]
and my output should be [2]. I referred to other code and got this:
from pyspark.sql import Row

def duplicate_function(row):
    data = []  # list of rows to return
    to_duplicate = float(row["No_of_Occ"])  # number of copies to emit for this row
    i = 0
    while i < to_duplicate:
        row_dict = row.asDict()  # convert a Spark Row object to a Python dictionary
        row_dict["No_of_Occ"] = str(i)  # overwrite the count with the copy index
        new_row = Row(**row_dict)  # create a Spark Row object based on a Python dictionary
        data.append(new_row)  # add this Row to the list (was to_return, which is undefined)
        i += 1
    return data  # return the final list
But how can I get the No_of_Occ value here?
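
To make the question concrete, here is a minimal plain-Python sketch of the same duplication logic, with no Spark involved. It assumes each row behaves like a dictionary and that No_of_Occ holds a number stored as a string; the sample rows and the "name" column are made up for illustration. The point it shows is that No_of_Occ is read from the row passed into the function, once per row.

```python
from itertools import chain

def duplicate_row(row):
    """Return one copy of `row` per occurrence, numbering the copies from 0."""
    # The count comes from the row itself, so each row can have its own value.
    count = int(float(row["No_of_Occ"]))
    copies = []
    for i in range(count):
        new_row = dict(row)            # copy so the input row is not mutated
        new_row["No_of_Occ"] = str(i)  # replace the count with the copy index
        copies.append(new_row)
    return copies

# Hypothetical sample data; the "name" column is only for illustration.
rows = [
    {"name": "a", "No_of_Occ": "2"},
    {"name": "b", "No_of_Occ": "3"},
]

# Flatten the per-row lists into a single list, as a flat-map would.
expanded = list(chain.from_iterable(duplicate_row(r) for r in rows))
```

In Spark, a function that returns a list per row is typically passed to the RDD flatMap method (for example via df.rdd.flatMap(duplicate_function), assuming df is the input DataFrame), which performs the same flattening step across the cluster.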