split a string list in one cell to row with python

Question

I have this table:

product	date	value
Prdct_1Prdct_2Prdct_3Prdct_4Prdct_5Prdct_6	23.7.2022	100

and the result should be

product	date	value
Prdct_1	23.7.2022	100
Prdct_2	23.7.2022	100
Prdct_3	23.7.2022	100
Prdct_4	23.7.2022	100
Prdct_5	23.7.2022	100
Prdct_6	23.7.2022	100

How should I solve it with python?

explode doesn't work in this case, since as I read the excel table in jupyternotebook, the product column looks like this Prdct_1\nPrdct_2\nPrdct_3\nPrdct_4\nPrdct_5\nPrdct_6. — anewone, Jun 28 '23 at 08:49
For the same of the question, can you confirm whether your `Prdct_1Prdct_2Prdct_3Prdct_4Prdct_5Prdct_6` are literals? How does it generalize with real data? — mozway, Jun 28 '23 at 08:49
If you have `\n`, then split on `\n`. Please provide the output of `df.head().to_dict('list')` for reproducibility. Also please try `df.assign(product=df['product'].str.split('\n')).explode('product')` — mozway, Jun 28 '23 at 08:50

score 2 · Accepted Answer · answered Jun 28 '23 at 08:46

You need to split and explode, the subtlety is how to determine the splitting points?

Assuming you want to split after the digits, use the (?<=\d)(?=\D) regex (split after a digit and before a non-digit):

out = (df.assign(product=df['product'].str.split(r'(?<=\d)(?=\D)'))
         .explode('product')
      )

Output:

   product       date  value
0  Prdct_1  23.7.2022    100
0  Prdct_2  23.7.2022    100
0  Prdct_3  23.7.2022    100
0  Prdct_4  23.7.2022    100
0  Prdct_5  23.7.2022    100
0  Prdct_6  23.7.2022    100

regex demo

score 0 · Answer 2 · answered Jun 28 '23 at 08:46

Maybe you can try splitting and exploding. For example,

data = {
    'product': ['Prdct_1Prdct_2Prdct_3Prdct_4Prdct_5Prdct_6'],
    'date': ['23.7.2022'],
    'value': [100]
}

df = pd.DataFrame(data)

df['product'] = df['product'].str.split('Prdct_')
df = df.explode('product')
df.reset_index(drop=True, inplace=True)

product date    value
0       23.7.2022   100
1   1   23.7.2022   100
2   2   23.7.2022   100
3   3   23.7.2022   100
4   4   23.7.2022   100
5   5   23.7.2022   100
6   6   23.7.2022   100

score 0 · Answer 3 · answered Jun 28 '23 at 08:55

Did you try this?

headers = ["product", "date", "value"]
row_values = ["Prdct_1Prdct_2Prdct_3Prdct_4Prdct_5Prdct_6" , "23.7.2022", "100"]

delimiter
split_values = row_values[0].split("_")

table_dict = dict(zip(headers, [split_values] + row_values[1:]))

print(table_dict)

split a string list in one cell to row with python

3 Answers3