I'd like to distribute a fixed number of items over multiple rows. Every row should get at least 1, and the remaining items should be allocated according to each row's required share (its factor) until all items have been distributed. Let's say 6 items are available; I'd like to get the result shown below.
Using max(1, factor * available) doesn't necessarily add up to the total number of available items.
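To illustrate the mismatch, here is a minimal NumPy sketch. The rounding step (`np.rint`) and the helper name `naive` are my own assumptions, since the formula has to be turned into whole items somehow. With 6 available the rounded values happen to add up, but with 7 available they sum to only 6.

```python
import numpy as np

# Factors from the example table (rows A..E).
factor = np.array([0.001, 0.2, 0.2, 0.2, 0.3])

def naive(available):
    # max(1, factor * available), rounded to the nearest whole item.
    # The rounding is an assumption; floor or ceil drift in other ways.
    return np.maximum(1, np.rint(factor * available)).astype(int)

naive_6 = naive(6)  # happens to sum to 6 for this data
naive_7 = naive(7)  # sums to 6, one item short of 7
```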
Is there a way to do this? I have the data in a Spark environment, so the ideal method would be a PySpark or even pandas/NumPy solution. It can obviously be done quite easily in a plain Python loop.
Input (total available: 6):
+---+-------------+
| c1|       factor|
+---+-------------+
|  A|        0.001|
|  B|          0.2|
|  C|          0.2|
|  D|          0.2|
|  E|          0.3|
+---+-------------+
Expected output:
+---+-------------+---------+
| c1|       factor|   result|
+---+-------------+---------+
|  A|        0.001|        1|
|  B|          0.2|        1|
|  C|          0.2|        1|
|  D|          0.2|        1|
|  E|          0.3|        2|
+---+-------------+---------+
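For reference, here is a minimal NumPy sketch of one way to read the requirement. It gives every row 1 item first, then splits the rest in proportion to the factors using a largest-remainder rule. The function name `allocate`, the normalisation by the factor sum, and the tie-breaking by row order are all assumptions on my part, not a definitive method. It also assumes the factors are positive and that there are at least as many items as rows.

```python
import numpy as np

def allocate(factors, available):
    """Give each row 1 item, then share the rest proportionally to factors."""
    factors = np.asarray(factors, dtype=float)
    n = len(factors)
    if available < n:
        raise ValueError("need at least one item per row")

    result = np.ones(n, dtype=int)          # guaranteed minimum of 1 per row
    remaining = available - n               # items still to hand out

    # Ideal (fractional) share of the remaining items per row.
    quotas = factors / factors.sum() * remaining
    base = np.floor(quotas).astype(int)     # whole items each row gets outright
    result += base

    # Hand the leftover items to the rows with the largest fractional parts.
    leftover = int(remaining - base.sum())
    order = np.argsort(-(quotas - base), kind="stable")
    result[order[:leftover]] += 1
    return result

result = allocate([0.001, 0.2, 0.2, 0.2, 0.3], 6)
```

For the example data this gives 1, 1, 1, 1, 2 for rows A to E, which matches the expected output. Under this rule the total always equals the number of available items, but other readings of "required share" would lead to a different split for other inputs.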