
I thought this would be rather simple, but I can't figure it out by myself. I have a range of non-contiguous items, like this:

farm011 - farm018, farm020, farm022 - farm033, farm041 - farm052, ......

which I want to put in a list(). What's the easiest way of doing that? To make it clearer, I think the list should look like this:

myItem = ['farm011','farm012','farm013','farm014',....,'farm018','farm020','farm022','farm023','farm024','farm025',....]

I'm sorry if it's already answered here and I didn't find it. Thanks in advance. cheers!!


Update 1: Error message from eyquem's code

I copied and pasted the code exactly as you wrote it, and this is the error I get:

File "./test.py", line 11
    gen = ( ("%s%03d"%(w1,i) for i in range(int(s),int(e)+1)) if w2
                               ^
SyntaxError: invalid syntax
MacUsers
  • What should the list look like? Is `ncitems.split(',')` enough? – Ocaso Protal Apr 10 '11 at 08:46
  • What form do you have the items in now? Are they arbitrary objects, a string, a dict, ...? – David Z Apr 10 '11 at 08:52
  • @Ocaso, @David: I don't think `split(',')` will do the job. When I say `farm011 - farm018`, I actually mean `farm011, farm012, farm013, farm014, ...` and so on. I've added an example of the list I'm trying to make. Does that answer your questions? – MacUsers Apr 10 '11 at 09:01
  • Sorry, I still don't see the origin of the error. Sometimes, a syntax error is indicated for a line but the real reason is in the previous lines – eyquem Apr 10 '11 at 23:27

4 Answers

for rng in ncitems.split(','):
  l = re.findall("(\w+\d+)", rng)
  if len(l) == 1:
    items.extend(l)
  elif len(l) == 2:
    w1,s,w2,e = re.findall("(\w+)(\d+)", rng) # w1 and w2 should be same...
    for i in range(s,e):
      items.append("%s%03d"%(w1,i))
vartec
  • Unfortunately, your code doesn't work as is. Must write ``(w1,s),(w2,e)`` instead of ``w1,s,w2,e`` and ``re.findall("(\w)(\d+)", rng)`` catches **('m', '011')** and **('m', '018')** . – eyquem Apr 10 '11 at 17:25
  • Now I get: ``for i in range(s,e): TypeError: range() integer end argument expected, got str.`` And with ``for i in range(int(s),int(e)):`` the result is ``['farm01001', 'farm01002', 'farm01003', 'farm01004', 'farm01005', 'farm01006', 'farm01007', 'farm020', 'farm02002', 'farm04001']`` – eyquem Apr 10 '11 at 21:09
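
Putting the corrections from the comments above together, a working variant of this approach might look like the sketch below. It is not vartec's original code: the letters-only prefix `[A-Za-z]+` is a substitution for `\w+` (which otherwise swallows the leading digits, as the `farm01001` output shows), and the function name `expand` is made up for illustration.

```python
import re

def expand(ncitems):
    """Expand 'farm011 - farm013, farm020' into a flat list of names."""
    items = []
    for rng in ncitems.split(','):
        # Each match is a (prefix, digits) pair; a letters-only prefix is an assumption.
        pairs = re.findall(r"([A-Za-z]+)(\d+)", rng)
        if len(pairs) == 1:
            # A single name: keep it as written, minus surrounding spaces.
            items.append(rng.strip())
        elif len(pairs) == 2:
            (w1, s), (w2, e) = pairs  # w1 and w2 should be the same
            # range() needs integers, and the end of the range must be included.
            for i in range(int(s), int(e) + 1):
                items.append("%s%03d" % (w1, i))
    return items
```

Called as `expand("farm011 - farm018, farm020")`, this should give the eight names from farm011 to farm018 followed by farm020.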

This is a simple solution:

#!/usr/bin/python
import re

inp = "farm011 - farm018, farm020, farm022 - farm033, farm041 - farm052"
range_re = re.compile("farm(\d+) - farm(\d+)")

items = [i.strip() for i in inp.split(",")]
op_list = []
for i in items:
    result = range_re.match(i)
    if result:
        start = int(result.group(1), 10)
        end = int(result.group(2), 10)
        for j in range(start, end + 1):
            op_list.append("farm%03d" % j)
    else:
        op_list.append(i)

print op_list
Rumple Stiltskin
  • @Rumple: Your code worked just fine. That's the result exactly I was looking for. Thank you. Cheers!! – MacUsers Apr 10 '11 at 19:15
  • @Rumple Stiltskin See the edit of my answer for how your code can be shortened, avoiding the creation of an **items** object and reducing the number of lines – eyquem Apr 10 '11 at 21:41
  • @eyquem: I'm probably doing something wrong; the modified version of Rumple's code also gives me exactly the same error on the 2nd `for` in this line: `for result in (range_re.match(i.strip()) for i in inp.split(",")):` Any idea why? Cheers!! – MacUsers Apr 10 '11 at 23:19
  • @eyquem: Yes, it can be shortened. But, I think, this is more readable. Agree? – Rumple Stiltskin Apr 11 '11 at 04:47
  • @Rumple Stiltskin I don't find ``items = [i.strip() for i in inp.split(",")]`` particularly readable. Defining **start** and **end** objects in two instructions instead of only one is a little heavy too. In fact I shouldn't have proposed a condensation of your code, because in my opinion it isn't a good one, sorry. See the second EDIT in my answer. – eyquem Apr 11 '11 at 11:03

Based on the link in WZeberaFFS's answer, modified to include numbers:

>>> import re
>>> s="farm011 - farm018, farm020, farm022 - farm033, farm041 - farm052"
>>> re.findall("[\w\d]+",s) #find the words instead of splitting them
['farm011', 'farm018', 'farm020', 'farm022', 'farm033', 'farm041', 'farm052']
>>> re.split(" *[-,] *",s) #another approach, using re.split
['farm011', 'farm018', 'farm020', 'farm022', 'farm033', 'farm041', 'farm052']
utdemir
  • That's not actually what I asked for. I wanted a list like `['farm011','farm012','farm013',....'farm018', 'farm020', 'farm022','farm023',.....` and so on. `farm011 - farm018` means all the items from farm011 to farm018, *not* literally `farm011 - farm018`. – MacUsers Apr 10 '11 at 18:17

I wanted to correct vartec's solution.

Then, from one correction to another, I ended up modifying the algorithm and obtained:

# first code
import re

ncitems = 'farm011 - farm018, farm020, farm022 - farm033, farm041 - farm052'
print 'ncitems :\n',ncitems,'\n\n'

items = []

pat = re.compile("(\w+)(?<!\d)(\d+)(?:[ -]+(\w+)(?<!\d)(\d+))* *(?:,|\Z)")

for w1,s,w2,e in pat.findall(ncitems):
    print '(w1,s,w2,e)==',(w1,s,w2,e)
    items.extend( ("%s%03d"%(w1,i) for i in range(int(s),int(e)+1))
                  if w2
                  else ("%s%s"%(w1,s),) )

print '\nitems :\n',items

result

ncitems :
farm011 - farm018, farm020, farm022 - farm033, farm041 - farm052 


(w1,s,w2,e)== ('farm', '011', 'farm', '018')
(w1,s,w2,e)== ('farm', '020', None, None)
(w1,s,w2,e)== ('farm', '022', 'farm', '033')
(w1,s,w2,e)== ('farm', '041', 'farm', '052')

items :
['farm011', 'farm012', 'farm013', 'farm014', 'farm015', 'farm016', 'farm017', 'farm018', 'farm020', 'farm022', 'farm023', 'farm024', 'farm025', 'farm026', 'farm027', 'farm028', 'farm029', 'farm030', 'farm031', 'farm032', 'farm033', 'farm041', 'farm042', 'farm043', 'farm044', 'farm045', 'farm046', 'farm047', 'farm048', 'farm049', 'farm050', 'farm051', 'farm052']

.

With itertools.chain() :

# second code
from itertools import chain
import re

ncitems = 'farm011 - farm018, farm020, farm022 - farm033, farm041 - farm052'
print 'ncitems :\n',ncitems,'\n\n'

pat = re.compile("(\w+)(?<!\d)(\d+)(?:[ -]+(\w+)(?<!\d)(\d+))* *(?:,|\Z)")

gen = ( ("%s%03d"%(w1,i) for i in range(int(s),int(e)+1)) if w2
        else ("%s%s"%(w1,s),)
        for w1,s,w2,e in pat.findall(ncitems) )

items = list(chain(*gen))

print 'items :\n',items

.

Note that if the elements look like far24idi2rm011, with digits inside the prefix, all of these snippets still work correctly.
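
To see why the lookbehind `(?<!\d)` matters for such names, here is a small comparison, offered as an illustration rather than as part of the answer's code. The variable names `plain`, `guarded`, `plain_split` and `guarded_split` are made up for this example.

```python
import re

# Without the lookbehind, the greedy \w+ keeps every digit except the last one.
plain = re.compile(r"(\w+)(\d+)")
# With (?<!\d), the prefix must end on a non-digit, so \d+ gets the whole trailing number.
guarded = re.compile(r"(\w+)(?<!\d)(\d+)")

name = "far24idi2rm011"
plain_split = plain.match(name).groups()      # ('far24idi2rm01', '1')
guarded_split = guarded.match(name).groups()  # ('far24idi2rm', '011')
```

The first split is the same effect that produced names like `farm01001` in the comments on vartec's answer.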

.

EDIT

I would write Rumple Stiltskin's code as follows:

import re

inp = "farm011 - farm018, farm020, farm022 - farm033, farm041 - farm052"
range_re = re.compile("farm(\d+) - farm(\d+)")

op_list = []
for item in (i.strip() for i in inp.split(",")):
    result = range_re.match(item)
    if result:
        start,end = map(int,result.groups())
        for j in range(start, end + 1):
            op_list.append("farm%03d" % j)
    else:
        # a generator's loop variable is not visible out here, so keep the item itself
        op_list.append(item)

print op_list

.

EDIT 2

In fact, I wouldn't write Rumple Stiltskin's code at all. In my opinion it's a poor approach: first a split(","), then a search with a regex. An appropriate regex can directly match what is needed, so why go through extra steps?

If readability is the aim, and I think it's a good aim, this code is the simplest and most readable:

import re

ncitems = 'farm011 - farm018, farm020, farm022 - farm033, farm041 - farm052'
print 'ncitems :\n', ncitems

pat = re.compile("(\w+)(?<!\d)(\d+)(?:[ -]+(\w+)(?<!\d)(\d+))* *(?:,|\Z)")

items = []
for w1,s,w2,e in pat.findall(ncitems):
    if w2:
        items.extend("%s%03d"%(w1,i) for i in xrange(int(s),int(e)+1))
    else:
        items.append("%s%s"%(w1,s))

print '\nitems :\n',items
eyquem
  • Looks like there is a typo in your 3rd code. I'm getting `SyntaxError: invalid syntax` where the first `for` is in the `items` line. Cheers!! – MacUsers Apr 10 '11 at 19:05
  • @MacUsers I don't understand the origin of the SyntaxError you observe. I verified the third code and it runs correctly on my machine. I also verified that chain() exists in Python 2.3 – eyquem Apr 10 '11 at 19:45
  • chain() is not the problem; the problem is with the for loop. I've added the error message to my original post. @Rumple's code, as you modified it, gives me a similar error as well. Not sure if I'm doing anything wrong. Cheers!! – MacUsers Apr 10 '11 at 23:24
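
A guess at the SyntaxError, since the thread never settles it: the caret sits on the `for` of a generator expression, and generator expressions only arrived in Python 2.4 (conditional expressions of the form `x if c else y` in 2.5). If the interpreter is older than that, which is an assumption because the version is never stated, a variant that sticks to plain loops and a list comprehension may avoid the problem. The sketch below reuses the pattern from eyquem's answer; the function name `expand_ranges` is hypothetical.

```python
import re

# Same pattern as in the answer above.
pat = re.compile(r"(\w+)(?<!\d)(\d+)(?:[ -]+(\w+)(?<!\d)(\d+))* *(?:,|\Z)")

def expand_ranges(ncitems):
    """Expand 'farm011 - farm013, farm020' without generator expressions."""
    items = []
    for w1, s, w2, e in pat.findall(ncitems):
        if w2:
            # A list comprehension (square brackets) instead of a generator expression.
            items.extend(["%s%03d" % (w1, i) for i in range(int(s), int(e) + 1)])
        else:
            # A single name: the second pair of groups did not match.
            items.append("%s%s" % (w1, s))
    return items
```

Running `python -V` would confirm or rule out the version assumption.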