How would I create new rows in a dataframe based on conditions of the columns?

Question

Let's say I have a dataframe like so:

ID	Color	Type
AAA	Blue	1
BBB	Red	1
BBB	Red	2
CCC	Green	1
DDD	Yellow	2

I have a list of all possible Types. In this case, the list is just ["1", "2"]. I want to create new rows (or a new df) so that each ID has a row for every type. The color value would stay the same for each ID. So the result I would end up with would be:

ID	Color	Type
AAA	Blue	1
AAA	Blue	2
BBB	Red	1
BBB	Red	2
CCC	Green	1
CCC	Green	2
DDD	Yellow	1
DDD	Yellow	2

I put the rows in order for simplicity and readability, but they don't actually need to be in order. Is something like this possible?

The operation you are trying to perform is known as the "Cartesian product", and you can find an answer on how you would accomplish this [here](https://stackoverflow.com/a/13270110/11659881). — Kraigolas, Mar 30 '22 at 02:38
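To make the "Cartesian product" idea concrete, here is a minimal plain-Python sketch on the example data from the question. It is only an illustration of the operation, not a dataframe solution; the names `rows`, `all_types`, `id_colors`, and `expanded` are made up for this example.

```python
from itertools import product

# Example rows from the question: (ID, Color, Type).
rows = [
    ("AAA", "Blue", "1"),
    ("BBB", "Red", "1"),
    ("BBB", "Red", "2"),
    ("CCC", "Green", "1"),
    ("DDD", "Yellow", "2"),
]
all_types = ["1", "2"]

# Unique (ID, Color) pairs, keeping first-seen order.
id_colors = list(dict.fromkeys((id_, color) for id_, color, _ in rows))

# Cartesian product: pair every (ID, Color) with every possible type.
expanded = [(id_, color, t) for (id_, color), t in product(id_colors, all_types)]

for row in expanded:
    print(*row, sep="\t")
```

The key step is reducing the data to one entry per ID before taking the product; otherwise IDs that already have several rows (such as BBB) would be repeated.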

score 0 · Answer 1 · answered Mar 30 '22 at 03:41

You can create a column holding an array of the possible values and then explode it, e.g.:

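from pyspark.sql import functions as F
types_array = [1, 2]

# Attach the full list of types to every row, then explode it into one row per type.
df = df.withColumn("types", F.array([F.lit(x) for x in types_array]))
df = df.withColumn("new_type", F.explode("types"))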

greenie
