extract specific numbers and calcula

Question

I have an input column, Dimension, and I want an output column computed from the numbers in its text, as shown in the table below.

I have used

df['output'] = df['Dimension'].str.extractall(r'(\d*\.?\d+)').astype(float).unstack().prod(axis=1)

but I'm not able to get the desired output for Model E. How can I do this in Python?

Model	Dimension	output
A	4.31 m x 2 m x 3.222 m	27.77364
B	220m	220
C	'St 473m	473
D	rangeZng 2x250m	500
E	Original 250ml 2s 35%	500
F	Qstd 550ml 1+1	550
G	very good cream 250ml 2s 35%	500
H	very good cream 250ml 2s 45%	500

I get `ValueError: Index contains duplicate entries, cannot reshape`. How should I proceed? – Chayan Banerjee Jun 23 '23 at 11:00
The code fails to return 300 for the text `Niv.Shwr F/Blend Apct300m 2sPO`. Any idea how this can be rectified? @mozway – Chayan Banerjee Aug 07 '23 at 05:38

score 0 · Answer 1 · answered Jun 07 '23 at 06:30

You could maybe change your regex to:

df['output'] = (df['Dimension'].str.extractall(r'(?<![+])(\d*\.?\d+)(?![%+])(?=\D|$)')
                .astype(float).unstack().prod(axis=1)
               )

Output:

  Model                     Dimension     output
0     A        4.31 m x 2 m x 3.222 m   27.77364
1     B                          220m  220.00000
2     C                      'St 473m  473.00000
3     D               rangeZng 2x250m  500.00000
4     E         Original 250ml 2s 35%  500.00000
5     F                Qstd 550ml 1+1  550.00000
6     G  very good cream 250ml 2s 35%  500.00000

regex demo
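
To see which tokens the lookarounds keep, here is a minimal sketch that rebuilds the question's table and lists the matches before multiplying them. The `data` dict and the `tokens` helper column are introduced only for illustration. `(?<![+])` and `(?![%+])` drop numbers that touch a `+` or are followed by `%`, and `(?=\D|$)` prevents a match from stopping in the middle of a run of digits.

```python
import pandas as pd

# Sample data copied from the question's table.
data = {
    "Model": list("ABCDEFGH"),
    "Dimension": [
        "4.31 m x 2 m x 3.222 m",
        "220m",
        "'St 473m",
        "rangeZng 2x250m",
        "Original 250ml 2s 35%",
        "Qstd 550ml 1+1",
        "very good cream 250ml 2s 35%",
        "very good cream 250ml 2s 45%",
    ],
}
df = pd.DataFrame(data)

pattern = r"(?<![+])(\d*\.?\d+)(?![%+])(?=\D|$)"

# Illustrative helper column: the raw tokens the regex keeps for each row.
df["tokens"] = df["Dimension"].str.findall(pattern)

# Same computation as in the answer: one row per match, then a row-wise product.
df["output"] = (
    df["Dimension"].str.extractall(pattern)
    .astype(float).unstack().prod(axis=1)
)

print(df[["Model", "tokens", "output"]])
```

For Model E the kept tokens are `250` and `2` (the `35%` is skipped), and for Model F only `550` is kept because both `1`s sit next to a `+`.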

mozway

I ran this but it says ValueError: Index contains duplicate entries, cannot reshape. – Chayan Banerjee Jun 23 '23 at 10:59
@CBCB please provide a [minimal reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) that reproduces the error, the code works well with the current input. Very likely you first need to reset your index: `df = df.reset_index(drop=True)` – mozway Jun 23 '23 at 11:18
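
A minimal sketch of how a duplicated index can trigger that error, and how resetting the index avoids it. The three-row frame and the `product_of_numbers` helper are hypothetical; the asker's real data was never posted.

```python
import pandas as pd

pattern = r"(?<![+])(\d*\.?\d+)(?![%+])(?=\D|$)"

# Hypothetical frame whose index label 0 appears twice (e.g. after a concat or a filter).
df = pd.DataFrame(
    {"Dimension": ["rangeZng 2x250m", "220m", "Qstd 550ml 1+1"]},
    index=[0, 0, 1],
)

def product_of_numbers(frame):
    """Row-wise product of the numbers matched in 'Dimension' (the answer's code)."""
    return (
        frame["Dimension"].str.extractall(pattern)
        .astype(float).unstack().prod(axis=1)
    )

try:
    product_of_numbers(df)
    error = None
except ValueError as exc:
    # unstack() cannot pivot when two rows share the same (index label, match number) pair.
    error = str(exc)
print(error)

# With a fresh 0..n-1 index every row has its own label, so unstack() works.
df = df.reset_index(drop=True)
df["output"] = product_of_numbers(df)
print(df)
```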
You can also replace `.unstack().prod(axis=1)` by `.groupby(level=0).prod()` (this will aggregate by index, so be careful if you have duplicates!) – mozway Jun 23 '23 at 11:24
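
A small sketch of the `groupby(level=0)` variant, on made-up rows taken from the question's table. Selecting column `0` (the single unnamed capture group) is a choice made here so that the result is a Series. The second half illustrates the caveat: with a duplicated index, rows sharing a label are silently merged into one product.

```python
import pandas as pd

pattern = r"(?<![+])(\d*\.?\d+)(?![%+])(?=\D|$)"
texts = ["rangeZng 2x250m", "220m", "Qstd 550ml 1+1"]

df = pd.DataFrame({"Dimension": texts})

# One row per match, indexed by (original row label, match number).
matches = df["Dimension"].str.extractall(pattern)[0].astype(float)

# Multiply the matches that share the same original row label (index level 0).
df["output"] = matches.groupby(level=0).prod()
print(df)

# Caveat: here the first two rows share label 0, so their numbers
# end up in a single product (2 * 250 * 220) instead of two separate ones.
dup = pd.DataFrame({"Dimension": texts}, index=[0, 0, 1])
dup_matches = dup["Dimension"].str.extractall(pattern)[0].astype(float)
merged = dup_matches.groupby(level=0).prod()
print(merged)
```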
Yes, I have many duplicates in my dataset, and they are in different columns. How do I get the output if I have duplicates? @mozway – Chayan Banerjee Jul 03 '23 at 08:42
What matters is that your index is not duplicated (run `df = df.reset_index(drop=True)` before my code), but please focus your question on a specific issue. It looks to me that the core problem here (extracting the product of numbers) is solved, no? Also, why don't you provide a reproducible example (output of `df.to_dict('tight')` as [edit](https://stackoverflow.com/posts/76420393/edit) to your question) to avoid wasting time? – mozway Jul 03 '23 at 08:48
