I have a pandas DataFrame that contains multiple comma-separated values across 3 columns.
DataFrame:
import pandas as pd

df = pd.DataFrame({'City': ['Boston', 'Atlanta', 'Chicago', 'Chicago', 'Phoenix'],
                   'State': ['MA', 'GA', 'IL', 'IL', 'AZ'],
                   'Country': ['US', 'US', 'US', 'US', 'US'],
                   'Value1': ['a', 'a,b,c', 'a,b,c,d', 'a', 'a,b'],
                   'Value2': ['b', 'd,e,f', 'e,f,g', 'b,c', 'c,d,e'],
                   'Value3': ['c', 'g,h,i', 'h,i,j', 'd', 'f,g,h,i']
                   })
What I want:
I'd like to split it onto duplicate rows so that City, State, and Country are essentially duplicated but Value1, Value2, and Value3 are split by comma onto new rows.
As the image above shows, if the number of values doesn't match, I'd like to just put a blank or an N/A in the field instead. The matching is purely based on the position of the element, so position 1 of Value1 lines up with position 1 of Value2 and Value3.
The issue I'm having is that there's no guarantee that Value1, Value2, and Value3 will contain the same number of comma-separated values, so trying to use df.explode() gives errors.
A simpler solution might be to pad the cells with trailing commas before exploding, but I'm unsure how to do that. For example, make [a,b,c] [d,e] [f] become [a,b,c] [d,e,] [f,,]. I'm at my wits' end trying to do this. Any help would be super appreciated.
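The padding idea can be sketched roughly as follows, using the [a,b,c] [d,e] [f] example as a single row. This is only one possible approach, not a definitive one: it assumes pandas 1.3 or later for multi-column explode, pads with empty strings rather than N/A, and the names value_cols, split_cols, row_widths, and padded are made up for illustration:

```python
import pandas as pd

# Single-row example: Value1=[a,b,c], Value2=[d,e], Value3=[f]
df = pd.DataFrame({'City': ['Boston'],
                   'Value1': ['a,b,c'],
                   'Value2': ['d,e'],
                   'Value3': ['f']})

value_cols = ['Value1', 'Value2', 'Value3']  # illustrative name

# Split every value column into plain Python lists, one list per row.
split_cols = {col: df[col].str.split(',').tolist() for col in value_cols}

# For each row, find the longest list across the value columns.
row_widths = [max(len(vals) for vals in row)
              for row in zip(*split_cols.values())]

# Pad shorter lists with empty strings so each row's lists line up.
padded = df.copy()
for col in value_cols:
    padded[col] = pd.Series(
        [vals + [''] * (width - len(vals))
         for vals, width in zip(split_cols[col], row_widths)],
        index=df.index,
    )

# With equal-length lists per row, multi-column explode no longer fails.
result = padded.explode(value_cols, ignore_index=True)
print(result)
```

Padding the lists after splitting has the same effect as appending trailing commas to the strings, and swapping '' for another fill value (such as None) would give N/A-style cells instead of blanks.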