I want to rearrange a pandas DataFrame so that extra rows are added whenever the values in two (or more) columns contain a delimiter. Each extra row would be identical to the original, except in the columns that are searched for the delimiter. If a delimiter is found, I want an additional row holding the second elements of both columns (the parts after the delimiter). The following code works for a single column and illustrates my goal nicely:
import pandas as pd
df = pd.DataFrame([{'var1': 'a,b,c', 'var2': 1}, {'var1': 'd,e,f', 'var2': 2}])
df.assign(var1=df.var1.str.split(',')).explode('var1').reset_index(drop=True)
However, when I specify two columns, the chained explodes produce a row for every combination of the split values, as the following code shows:
df = pd.DataFrame([{'var1': 'a,b,c', 'var2': 1, 'var3': 'I, II, III'}, {'var1': 'd,e,f', 'var2': 2, 'var3': 'IV, V, VI'}])
df.assign(var1=df.var1.str.split(','), var3=df.var3.str.split(', ')).explode('var1').explode('var3').reset_index(drop=True)
I would like a single row for each pair of corresponding elements in both columns, like so:
df = pd.DataFrame([{'var1': 'a', 'var2': 1, 'var3': 'I'}, {'var1': 'b', 'var2': 1, 'var3': 'II'}, {'var1': 'c', 'var2': 1, 'var3': 'III'}, {'var1': 'd', 'var2': 2, 'var3': 'IV'}, {'var1': 'e', 'var2': 2, 'var3': 'V'}, {'var1': 'f', 'var2': 2, 'var3': 'VI'}])
I know that splitting and exploding on both columns and then subsetting the resultant dataframe would allow me to obtain the result I want, but I was looking for a potentially cleaner way to do this.
Note that for each row, both delimited columns will always contain the same number of delimiters.
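For illustration, here is a minimal sketch of one way this could be done without subsetting afterwards. It relies on the guarantee above: because the lists in both columns have the same length in every row, each column can be exploded separately and the results line up row by row. The names split_cols, keep_cols and split_and_explode are hypothetical, and stripping whitespace is an assumption made so that 'I, II, III' yields 'II' rather than ' II'.

```python
import pandas as pd

df = pd.DataFrame([
    {'var1': 'a,b,c', 'var2': 1, 'var3': 'I, II, III'},
    {'var1': 'd,e,f', 'var2': 2, 'var3': 'IV, V, VI'},
])

# Hypothetical names: columns to split, and the remaining columns to carry along.
split_cols = ['var1', 'var3']
keep_cols = [c for c in df.columns if c not in split_cols]


def split_and_explode(col):
    """Split one string column on commas, explode it, and strip stray spaces."""
    return col.str.split(',').explode().str.strip()


# Park the untouched columns in the index, explode each remaining column
# independently, then restore the original column order.
result = (
    df.set_index(keep_cols)
      .apply(split_and_explode)
      .reset_index()[df.columns]
)

print(result)
```

Each column is exploded on its own, so no cross product of var1 and var3 values is created. As far as I know, pandas 1.3 and later also accept a list of columns in DataFrame.explode, which may make the apply step unnecessary on newer versions.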
edit
The .explode() method is only available in pandas >= 0.25.