1

I am trying to extract sizes from a string. I guess the pattern is a very common one: AxBxC, where A, B and C are the sizes (int or float), separated by x (possibly with spaces around it):

import re

s = 'zzz 3062 0.2 aaa 15.8x20.2x12.2875 mm'

I expect to obtain only three numbers: [15.8, 20.2, 12.2875]. The only working approach I have now is ugly:

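# r1: numbers followed by an x; r2: numbers preceded by an x
r1 = re.findall(r'(\d+\.?\d*) *x *', s)
r2 = re.findall(r' *x *(\d+\.?\d*)', s)
r1.extend(r2)
print(set(r1))  # the set drops the middle value, which is found twice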

{'15.8', '20.2', '12.2875'}

Is there a way to extract these numbers with a single, robust regexp? Thanks.

Alexey Trofimov

4 Answers

1

This solution works when you don't know in advance how many numbers you will get:

((?:\d+\.\d+)(?=x)|(?<=x)(?:\d+\.\d+))

It relies on the fact that each number is either followed by an x or preceded by one.
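
As a rough sketch of how this pattern could be used from Python (the names `pattern` and `sizes` are only illustrative, and the outer capture group is dropped because `re.findall` then returns the whole matches):

```python
import re

s = 'zzz 3062 0.2 aaa 15.8x20.2x12.2875 mm'

# A number followed by x, or a number preceded by x.
# Assumption carried over from the pattern above: every size
# contains a decimal point and there are no spaces around x.
pattern = re.compile(r'\d+\.\d+(?=x)|(?<=x)\d+\.\d+')

sizes = [float(m) for m in pattern.findall(s)]
print(sizes)  # [15.8, 20.2, 12.2875]
```

Note that as written the pattern skips integer sizes such as 15x20, and it would also pick up a decimal number that directly follows any other x in the text.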

Marco Luzzara
1

It seems you need to match two or three x-separated float values. You may use

r'(\d[\d.]*)x(\d[\d.]*)(?:x(\d[\d.]*))?'

See the regex demo

Details

  • (\d[\d.]*) - Group 1: a digit followed by zero or more digits and/or dots
  • x - a literal x
  • (\d[\d.]*) - Group 2: a digit followed by zero or more digits and/or dots
  • (?:x(\d[\d.]*))? - an optional sequence: an x followed by Group 3, which captures a digit followed by zero or more digits and/or dots

In Python, use

re.findall(r'(\d[\d.]*)x(\d[\d.]*)(?:x(\d[\d.]*))?', s)
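
Because the pattern has three capture groups, `re.findall` returns a list of tuples, one per match, with an empty string for a missing third group. A minimal sketch of flattening that into floats follows; the `\s*` around each x is an addition to cover the "x with spaces" case from the question, and the names `pattern`, `matches` and `sizes` are illustrative:

```python
import re

s = 'zzz 3062 0.2 aaa 15.8x20.2x12.2875 mm'

# Same groups as above, with optional whitespace around each x
# (an assumption added here, not part of the original pattern).
pattern = re.compile(r'(\d[\d.]*)\s*x\s*(\d[\d.]*)(?:\s*x\s*(\d[\d.]*))?')

matches = pattern.findall(s)
print(matches)  # [('15.8', '20.2', '12.2875')]

# Flatten the tuples and skip the empty third group of 2-value matches.
sizes = [float(g) for groups in matches for g in groups if g]
print(sizes)  # [15.8, 20.2, 12.2875]
```

Since `[\d.]*` accepts any run of digits and dots, a token such as 1.2.3 would match and then make `float` raise a `ValueError`.
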
Wiktor Stribiżew
1

Instead, you can compute r1 as follows:

r1 = re.split("x", re.findall(r'\d*\.\d*x\d*\.\d*x\d*\.\d*', s)[0])

Unfortunately, this still consists of two calls, and nesting them makes it a bit hard to read. If you want to use the results as numbers, you still have to convert them from strings, e.g. with float(r1[i]) for each element, or with NumPy to convert the whole array at once.
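
A minimal sketch of the same idea with the conversion included, using `re.search` so that a string without any sizes gives an empty list instead of an `IndexError` (the names `match` and `sizes` are illustrative):

```python
import re

s = 'zzz 3062 0.2 aaa 15.8x20.2x12.2875 mm'

# Find the whole AxBxC token first, then split it on x.
# Assumes exactly three sizes, each with a decimal point, and no spaces.
match = re.search(r'\d*\.\d*x\d*\.\d*x\d*\.\d*', s)

sizes = [float(part) for part in match.group(0).split('x')] if match else []
print(sizes)  # [15.8, 20.2, 12.2875]
```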

0

I hope this helps. It assumes the sizes are always the second-to-last whitespace-separated token in the string:

>>> s.split()[-2].split("x")
['15.8', '20.2', '12.2875']
Hugo Xia