1

I have this table:

product date value
Prdct_1Prdct_2Prdct_3Prdct_4Prdct_5Prdct_6 23.7.2022 100

and the result should be

product date value
Prdct_1 23.7.2022 100
Prdct_2 23.7.2022 100
Prdct_3 23.7.2022 100
Prdct_4 23.7.2022 100
Prdct_5 23.7.2022 100
Prdct_6 23.7.2022 100

How should I solve it with python?

anewone
  • 27
  • 4
  • 1
    split and `explode()`? – Beck Jun 28 '23 at 08:42
  • explode doesn't work in this case, since as I read the excel table in jupyternotebook, the product column looks like this Prdct_1\nPrdct_2\nPrdct_3\nPrdct_4\nPrdct_5\nPrdct_6. – anewone Jun 28 '23 at 08:49
  • For the same of the question, can you confirm whether your `Prdct_1Prdct_2Prdct_3Prdct_4Prdct_5Prdct_6` are literals? How does it generalize with real data? – mozway Jun 28 '23 at 08:49
  • 1
    If you have `\n`, then split on `\n`. Please provide the output of `df.head().to_dict('list')` for reproducibility. Also please try `df.assign(product=df['product'].str.split('\n')).explode('product')` – mozway Jun 28 '23 at 08:50
  • @mozway this works. Thanks. – anewone Jun 28 '23 at 08:55
  • OK, then it's a duplicate, I'll close the question ;) – mozway Jun 28 '23 at 09:04

3 Answers3

2

You need to split and explode, the subtlety is how to determine the splitting points?

Assuming you want to split after the digits, use the (?<=\d)(?=\D) regex (split after a digit and before a non-digit):

out = (df.assign(product=df['product'].str.split(r'(?<=\d)(?=\D)'))
         .explode('product')
      )

Output:

   product       date  value
0  Prdct_1  23.7.2022    100
0  Prdct_2  23.7.2022    100
0  Prdct_3  23.7.2022    100
0  Prdct_4  23.7.2022    100
0  Prdct_5  23.7.2022    100
0  Prdct_6  23.7.2022    100

regex demo

mozway
  • 194,879
  • 13
  • 39
  • 75
0

Maybe you can try splitting and exploding. For example,

data = {
    'product': ['Prdct_1Prdct_2Prdct_3Prdct_4Prdct_5Prdct_6'],
    'date': ['23.7.2022'],
    'value': [100]
}

df = pd.DataFrame(data)

df['product'] = df['product'].str.split('Prdct_')
df = df.explode('product')
df.reset_index(drop=True, inplace=True)

product date    value
0       23.7.2022   100
1   1   23.7.2022   100
2   2   23.7.2022   100
3   3   23.7.2022   100
4   4   23.7.2022   100
5   5   23.7.2022   100
6   6   23.7.2022   100
Beck
  • 115
  • 6
0

Did you try this?

headers = ["product", "date", "value"]
row_values = ["Prdct_1Prdct_2Prdct_3Prdct_4Prdct_5Prdct_6" , "23.7.2022", "100"]

delimiter
split_values = row_values[0].split("_")

table_dict = dict(zip(headers, [split_values] + row_values[1:]))

print(table_dict)

Ahmad Abdelbaset
  • 295
  • 2
  • 11