I'm new to Pandas but thanks to Add column with constant value to pandas dataframe I was able to add different columns at once with
c = {'new1': 'w', 'new2': 'y', 'new3': 'z'}
df.assign(**c)
However I'm trying to figure out what's the path to take when I want to add a new column to a dataframe (currently 1.2 million rows * 23 columns).
Let's simplify the df a bit and try to make it more clear:
Order Orderline Product
1 0 Laptop
1 1 Bag
1 2 Mouse
2 0 Keyboard
3 0 Laptop
3 1 Mouse
I would like to add a new column where depending if the Order has at least 1 product == Bag then it should be 1 (for all rows for that specific order), otherwise 0.
Result would become:
Order Orderline Product HasBag
1 0 Laptop 1
1 1 Bag 1
1 2 Mouse 1
2 0 Keyboard 0
3 0 Laptop 0
3 1 Mouse 0
What I could do is find all the unique order numbers, then filter out the subframe, check the Product column for Bag, if found then add 1 to a new column, otherwise 0, and then replace the original subframe with the result.
Likely there's a way better manner to accomplish this, and also way more performant.
The main reason I'm trying to do this, is to flatten things down later on. Every order should become 1 line with some values of product. I don't need the information for Bag anymore but I want to keep in my dataframe if the original order used to have a Bag (1) or no Bag (0).
Ultimately when the data is cleaned out it can be used as a base for scikit-learn (or that's what I hope).