I am trying to clean my csv file. As doing so I had to copy rows for the products that had 'Quantity'>1. I did that like this:
file_in = file_in.loc[file_in.index.repeat(file_in.´Quantity)].reset_index(drop=True)
But now I have another problem. In another column for some of my products I have SerialNumbers , which were ,at the begining all in one row added, and as I did the repeat they all got all of the Serial Numbers. Now I want to split those Serial Numbers so that each row has only one .
So from this:
Quantity SerialNumbers ProductName
1 3 AB1 AB2 AB3 PR
2 3 AB1 AB2 AB3 PR
3 3 AB1 AB2 AB3 PR
4 1 CD monitor
5 4 NAN mouse
6 4 NAN mouse
7 4 NAN mouse
8 4 NAN mouse
9 4 EF1 EF2 EF3 EF4 ace
10 4 EF1 EF2 EF3 EF4 ace
11 4 EF1 EF2 EF3 EF4 ace
12 4 EF1 EF2 EF3 EF4 ace
13 2 NAN screeen
14 2 NAN screeen
15 1 NAN flash
...
I need it to be like this:
Quantity SerialNumbers ProductName
1 3 AB1 PR
2 3 AB2 PR
3 3 AB3 PR
4 1 CD monitor
5 4 NAN mouse
6 4 NAN mouse
7 4 NAN mouse
8 4 NAN mouse
9 4 EF1 ace
10 4 EF2 ace
11 4 EF3 ace
12 4 EF4 ace
13 2 NAN screeen
14 2 NAN screeen
15 1 NAN flash
..
I tried adding a new column where I would first create the list from my Serial Numbers and then by use of groupby function edit. I tried that using this , but it doesn't work.
file_in['Temp_Split'] = numpy.where((~file_in['Temp_Split'].isna()) & (file_in['Quantity']>1),file_in['Temp_Serial'].str.split() & file_in.groupby(['Position','Quantity'])['Temp_Split'].rank(method='first'), file_in['Temp_split'])
I searched for similar questions, but they are all for copying the rows while spliting the values. Maybe I should change the first line of code so that as it does the repeat it checks for the SerialNumbers and edit that also. But nor really sure how should I do that.
EDIT: I am not trying to copy rows while spliting the value in one column, I am trying to remove what is not needed for one row and add to the next one. As shown in example. There are still same amount of rows as at the begining