0

I'm trying to extract/match data from a string using a regular expression, but I can't get it to work.

I want to extract i386 from the following string (the text between the last - and .iso):

/xubuntu/daily/current/lucid-alternate-i386.iso

This should also work in case of:

/xubuntu/daily/current/lucid-alternate-amd64.iso

The result should be either i386 or amd64, depending on the input.

Thanks a lot for your help.

badp
user175259

7 Answers

3

You could also use split in this case (instead of regex):

>>> path = "/xubuntu/daily/current/lucid-alternate-i386.iso"
>>> path.split(".iso")[0].split("-")[-1]
'i386'

split returns a list of the pieces the string was divided into. You can then use Python's indexing syntax ([0] for the first piece, [-1] for the last) to get the part you want.
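As a rough sketch of the same idea, the two split steps can be wrapped in a small helper and run against both paths from the question. The function name `arch_from_path` is hypothetical, and the sketch assumes every path ends in `.iso` and contains at least one dash.

```python
def arch_from_path(path):
    """Return the text between the last '-' and '.iso' (hypothetical helper)."""
    # Assumption: path ends in ".iso" and contains at least one "-".
    stem = path.split(".iso")[0]   # everything before ".iso"
    return stem.split("-")[-1]     # last dash-separated piece


print(arch_from_path("/xubuntu/daily/current/lucid-alternate-i386.iso"))
print(arch_from_path("/xubuntu/daily/current/lucid-alternate-amd64.iso"))
```

If a path has no dash at all, this returns the whole stem rather than raising an error, so it is worth validating the input if that case can occur.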

ChristopheD
1
r"/([^-]*)\.iso/"

The bit you want will be in the first capture group.

Amber
  • Were you trying to use `match()` or `search()`? Since this is a partial-match pattern, it should be used with `search()`, not `match()` (`match()` only matches at the beginning of the string, while `search()` scans the whole string for a match). – Amber May 27 '10 at 22:30
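To illustrate the `match()` versus `search()` difference, here is a minimal sketch using the pattern from this answer with the surrounding slashes dropped (Python patterns are not delimited by them). The variable names are illustrative only.

```python
import re

path = "/xubuntu/daily/current/lucid-alternate-i386.iso"
pattern = re.compile(r"([^-]*)\.iso")

# match() only tries at position 0. From there, [^-]* stops at the first
# dash, which is not followed by ".iso", so there is no match.
anchored = pattern.match(path)
print(anchored)

# search() tries every starting position and succeeds after the last dash.
found = pattern.search(path)
print(found.group(1))
```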
1

First off, let's make our life simpler and only get the file name.

>>> import os
>>> os.path.split("/xubuntu/daily/current/lucid-alternate-i386.iso")
('/xubuntu/daily/current', 'lucid-alternate-i386.iso')

Now it's just a matter of catching all the letters between the last dash and the '.iso'.
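One possible way to finish that last step, as a sketch rather than the only option: search the file name for a dash, a run of non-dash characters, and a trailing `.iso`. The pattern and variable names below are assumptions for illustration.

```python
import os
import re

path = "/xubuntu/daily/current/lucid-alternate-i386.iso"
filename = os.path.split(path)[1]  # 'lucid-alternate-i386.iso'

# Capture the characters after the last dash, up to a trailing ".iso".
m = re.search(r"-([^-]+)\.iso$", filename)
arch = m.group(1) if m else None
print(arch)
```

Checking `m` before calling `group()` avoids an AttributeError when a file name does not follow the expected pattern.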

badp
1

If you will be matching several of these lines, using re.compile() and saving the resulting regular expression object for reuse is more efficient.

import re

s1 = "/xubuntu/daily/current/lucid-alternate-i386.iso"
s2 = "/xubuntu/daily/current/lucid-alternate-amd64.iso"

# The greedy leading .+ runs to the last dash; the group stops at the last dot.
pattern = re.compile(r'^.+-(.+)\..+$')

m = pattern.match(s1)
m.group(1)  # 'i386'

m = pattern.match(s2)
m.group(1)  # 'amd64'
Peter McG
0

The expression should be written without the leading and trailing slashes.

import re

line = '/xubuntu/daily/current/lucid-alternate-i386.iso'
rex = re.compile(r"([^-]*)\.iso")
m = rex.search(line)
print(m.group(1))

This prints i386.

koblas
0
import re

# \w+ captures the run of word characters right before a trailing ".iso".
reobj = re.compile(r"(\w+)\.iso$")
match = reobj.search(subject)
if match:
    result = match.group(1)
else:
    result = ""

Here subject is the string containing the path and file name; result is an empty string when the pattern does not match.

Turtle
0
>>> import os
>>> path = "/xubuntu/daily/current/lucid-alternate-i386.iso"
>>> file, ext = os.path.splitext(os.path.split(path)[1])
>>> processor = file[file.rfind("-") + 1:]
>>> processor
'i386'
manifest