I'm trying to write a function that deletes unwanted paths from a list of paths. The unwanted ones all follow the same pattern, for example c:/project1/main/Node/Accept/*something. This is my code:

def filtre(list):
    for i,item in enumerate(list):
        matchObject=re.search(r"(Accept/.*[/])", item) or re.search(r"(Integrate/.*[/])", item)
        if matchObject :
            list.remove(item)   
        else:
            i=i+1
    return list

And this is an example of my global list (input):

c:/project1/main/Node/Accept/testCase1/Browse.c
c:/project1/main/Node/Accept/testCase2/navigate.c
c:/project1/main/Node/Accept/testCase2/save.c
c:/project1/main/Node/Accept/testCase4/search.c
c:/project1/main/Node/Accept
c:/project1/main/Node/Integrate
c:/project1/main/Node/Accept/destroy/fullCoverage/remove.py
c:/project1/main/Tree/Integrate/testCase1/Browse.c
c:/project1/main/Tree/Integrate

desired output:

c:/project1/main/Node/Accept
c:/project1/main/Node/Integrate
c:/project1/main/Tree/Integrate

I hope that is clear: remove every path that has further child nodes after the Accept or Integrate node.

karthik manchala
MgMh
  • Why do you have a capture group in your regexp if you're not doing a replacement? And there's no need to put `[ ]` around `/`. – Barmar May 06 '15 at 21:09
  • Please do not call local variables `list`. You mask the builtin `list()` constructor. – dawg May 06 '15 at 21:48

3 Answers

You can use the following regex to match the desired output:

(^.*?\/(?:Accept|Integrate))$

If you want to match the unnecessary paths instead, so that you can remove them, you can use the following:

(^.*?\/(?:Accept|Integrate))(?!$).*

Python code:

def filtre(list):
    for i,item in enumerate(list):
        matchObject=re.search(r"(^.*?\/(?:Accept|Integrate))(?!$).*", item)
        if matchObject :
            list.remove(item)   
        else:
            i=i+1
    return list

See DEMO
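
One caveat with the loop above: calling `remove()` on a list while iterating over it makes the iterator skip the element that follows each removed one, so some unwanted paths can survive. A minimal sketch that avoids this by building a new list with the same regex (the name `filtre_safe` and the shortened sample list are only illustrative):

```python
import re

# Same pattern as above: matches paths that continue past Accept or Integrate.
UNWANTED = re.compile(r"(^.*?\/(?:Accept|Integrate))(?!$).*")

def filtre_safe(paths):
    """Return a new list without the paths that match UNWANTED."""
    return [p for p in paths if not UNWANTED.search(p)]

paths = [
    "c:/project1/main/Node/Accept/testCase1/Browse.c",
    "c:/project1/main/Node/Accept/testCase2/navigate.c",
    "c:/project1/main/Node/Accept",
    "c:/project1/main/Tree/Integrate",
]
kept = filtre_safe(paths)
print("\n".join(kept))
```

Because nothing is removed during the iteration, every path is tested exactly once, and the original list is left untouched.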

karthik manchala

Instead of matching what you don't want, you can match what you want with: ^.*/(Accept|Integrate)$

Now there is another problem with your loop: you shouldn't remove items from a list while you're iterating over it, because the iterator skips the element that follows each removal. You need to work on a temporary result list.

Since you seem to want to filter the list "in place", you can assign the result back with: paths[:] = tmp

Here's the code:

import re

def filtre(paths):
    # Collect the paths to keep, then overwrite the caller's list in place.
    tmp = []
    for item in paths:
        # Keep only paths that end at the Accept or Integrate node.
        if re.match(r"^.*/(Accept|Integrate)$", item):
            tmp.append(item)
    paths[:] = tmp


paths = ["c:/project1/main/Node/Accept/testCase1/Browse.c",
"c:/project1/main/Node/Accept/testCase2/navigate.c",
"c:/project1/main/Node/Accept/testCase2/save.c",
"c:/project1/main/Node/Accept/testCase4/search.c",
"c:/project1/main/Node/Accept",
"c:/project1/main/Node/Integrate",
"c:/project1/main/Node/Accept/destroy/fullCoverage/remove.py",
"c:/project1/main/Tree/Integrate/testCase1/Browse.c",
"c:/project1/main/Tree/Integrate"
]

filtre(paths)
print('\n'.join(paths))

Result:

c:/project1/main/Node/Accept
c:/project1/main/Node/Integrate
c:/project1/main/Tree/Integrate
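
If a temporary list is undesirable, one alternative (a sketch, not part of the code above) is to walk the indices backwards and delete in place. Deleting at index `i` then only shifts elements that have already been visited, so nothing is skipped. The name `filtre_in_place` is hypothetical:

```python
import re

# Paths to keep: those that end at the Accept or Integrate node.
KEEP = re.compile(r"^.*/(Accept|Integrate)$")

def filtre_in_place(paths):
    """Delete non-matching entries from paths without a temporary list."""
    # Iterate from the last index down to 0 so deletions never
    # move an element that is still waiting to be checked.
    for i in range(len(paths) - 1, -1, -1):
        if not KEEP.match(paths[i]):
            del paths[i]

paths = [
    "c:/project1/main/Node/Accept/testCase1/Browse.c",
    "c:/project1/main/Node/Accept",
    "c:/project1/main/Node/Integrate",
    "c:/project1/main/Tree/Integrate/testCase1/Browse.c",
    "c:/project1/main/Tree/Integrate",
]
filtre_in_place(paths)
print("\n".join(paths))
```

Each `del` still shifts the tail of the list, so this is not necessarily faster than rebuilding the list; it only avoids the extra list object.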
GCord
  • Good idea to work with a temp list, but deleting entries from the same list saves some processing, because the drive I'm working on is a server replica holding 430 GB of nodes. – MgMh May 07 '15 at 20:41
  • But you just can't delete entries while iterating over your list; it leads to unexpected behavior. On my laptop the solution you've accepted gives the wrong result, because the loop skips the item that follows each one you remove: `c:/project1/main/Node/Accept/testCase2/navigate.c c:/project1/main/Node/Accept/testCase4/search.c c:/project1/main/Node/Accept c:/project1/main/Node/Integrate c:/project1/main/Tree/Integrate/testCase1/Browse.c c:/project1/main/Tree/Integrate` See this post: http://stackoverflow.com/questions/6022764/python-removing-list-element-while-iterating-over-list – GCord May 08 '15 at 13:01
  • You were right. I debugged it, and it does skip some lines; I only got the expected result by chance :) I'll try yours and give feedback, thanks. – MgMh May 08 '15 at 20:44

You can use this regex:

^\S+?(?:Accept|Integrate)\s*$

Demo

In Python:

txt='''\
c:/project1/main/Node/Accept/testCase1/Browse.c
c:/project1/main/Node/Accept/testCase2/navigate.c
c:/project1/main/Node/Accept/testCase2/save.c
c:/project1/main/Node/Accept/testCase4/search.c
c:/project1/main/Node/Accept
c:/project1/main/Node/Integrate
c:/project1/main/Node/Accept/destroy/fullCoverage/remove.py
c:/project1/main/Tree/Integrate/testCase1/Browse.c
c:/project1/main/Tree/Integrate'''

>>> re.findall(r'^\S+?(?:Accept|Integrate)\s*$', txt, re.M)
['c:/project1/main/Node/Accept', 'c:/project1/main/Node/Integrate', 'c:/project1/main/Tree/Integrate']

If your source is a list of strings rather than a single string, use filter with the same regex:

>>> filter(lambda s: re.search(r'^\S+?(?:Accept|Integrate)\s*$', s), txt.splitlines())
['c:/project1/main/Node/Accept', 'c:/project1/main/Node/Integrate', 'c:/project1/main/Tree/Integrate']
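
Note that in Python 3 `filter` returns a lazy iterator rather than a list, so the call above would need wrapping in `list(...)` to print the same result. A list comprehension with the same regex behaves identically on both versions; a minimal sketch (the variable names and the shortened sample list are illustrative):

```python
import re

# Same regex as above, compiled once and reused for every line.
pattern = re.compile(r'^\S+?(?:Accept|Integrate)\s*$')

lines = [
    "c:/project1/main/Node/Accept/testCase1/Browse.c",
    "c:/project1/main/Node/Accept",
    "c:/project1/main/Node/Integrate",
    "c:/project1/main/Tree/Integrate/testCase1/Browse.c",
    "c:/project1/main/Tree/Integrate",
]

# Keep only the lines that end at Accept or Integrate.
wanted = [s for s in lines if pattern.search(s)]
print(wanted)
```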
dawg