I have to remove a dictionary of phrases from a list of strings using Python

I have a list of strings, L1. Example: L1 = ['Programmer New York', 'Programmer San Francisco']

I also have a dictionary of phrases, L2 (each of them is more than one word). Example: L2 = {'New York', 'San Francisco'}

The expected output is that, for each string in L1, every substring that exists in L2 is removed. So the output will be res = ['Programmer', 'Programmer'].

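def foo(L1, L2):
    res = []
    print(len(L1))  # call form of print works on both Python 2 and 3
    for i in L1:
        for j in L2:
            if j in i:
                i = i.replace(j, "")
        # strip the space left behind so the result matches the expected output
        res.append(i.strip())
    return res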

My current program is a brute-force double for loop. Is it possible to improve the performance, especially when L1 is very large?

xuanyue
  • 4
    I'm voting to close this question as off-topic: if your code works and you want review/optimizations, it belongs on codereview.stackexchange.com – Two-Bit Alchemist Aug 08 '16 at 02:08
  • You shouldn't need to iterate over L2 at all. – OneCricketeer Aug 08 '16 at 02:11
  • 2
    @Two-BitAlchemist This looks like example code and thus would be off-topic on Code Review. The real working code would be fine on there, though. – Phrancis Aug 08 '16 at 02:17
  • @Two-BitAlchemist I can't agree with you on this. My goal is to get help from stackoverflow in finding a better algorithm or a better way to improve the performance. You can't criticize me for providing runnable example code; I believe there is no better way to let other people know what you want than to show the code. Also see http://stackoverflow.com/questions/19560498/faster-way-to-remove-stop-words-in-python, why does that one have 15 upvotes and no one asking to close it? Mine is the same problem with just a slightly more complicated input. – xuanyue Aug 08 '16 at 19:54
  • @cricket_007 Yeah, I believe so; that's why I asked the question. There are too many if checks that don't need to run. Do you have a better data structure or algorithm that avoids iterating over L2? – xuanyue Aug 08 '16 at 19:59
  • I'm probably in the minority here (maybe not because this has 4 close votes) but I don't think "My code works, but can you rewrite my whole program to be better?" questions are on-topic here. I may have been mistaken when I referred you to code review. I don't see what the other question you linked has to do with anything. – Two-Bit Alchemist Aug 08 '16 at 20:57
  • @Two-BitAlchemist We should follow stackoverflow's guidelines, not just what you think. I believe my question follows the guideline of "practical, answerable questions based on actual problems that you face in software development." I have seen so many highly voted python questions asking "How to write ** in a pythonic way" or "Best way to write*." Asking a question to improve code is definitely on topic here. – xuanyue Aug 08 '16 at 21:30
  • 3
    @xuanyue I don't know why you think this is "just my opinion" (or why you don't think what you're saying is just yours), but here is a very relevant meta question: http://meta.stackoverflow.com/questions/277565/code-review-improvements-my-code-works-but-i-want-to-ask-if-there-is-a-better --> highly upvoted comment immediately pointing at CR. And from the linked dupe answer (emphasis mine): "**CodeReview: Your code works but you'd love to hear how it could work better**" – Two-Bit Alchemist Aug 09 '16 at 17:39
  • 2
    Also I'm not going to chase down Meta links right now but the rules have changed over the years as the site has evolved, so citing (or implying that there are) highly upvoted questions similar to yours that are not closed is showing a failure of moderation, not showing that the rules are different than they are. – Two-Bit Alchemist Aug 09 '16 at 17:40
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/120539/discussion-between-xuanyue-and-two-bit-alchemist). – xuanyue Aug 09 '16 at 17:47
  • 1
    @Two-BitAlchemist There's one major problem with you pointing to Code Review though. The function name is `foo`. I suspect this is a MCVE, those are [off-topic at Code Review](http://codereview.stackexchange.com/help/on-topic). If the actual code is provided it could work though. – Mast Aug 09 '16 at 17:52
  • @Mast I kind of already acknowledged that in response to Phrancis's comment. My main point is I don't think these types of questions are on-topic at StackOverflow. – Two-Bit Alchemist Aug 09 '16 at 17:54
  • 1
    And you may be right @Two-BitAlchemist, but citing "belongs on CR" as a close reason isn't valid. If it's not on topic here on SO, then VTC for one of those reasons. – RubberDuck Aug 09 '16 at 17:59
  • @RubberDuck I had already cast my close vote _before_ that discussion took place. I can't do anything about that now except learn for future questions. I could retract my close vote but that doesn't (1.) allow me to vote again or (2.) do anything about the 3 people who also VTC for that reason. – Two-Bit Alchemist Aug 09 '16 at 18:00
  • I'm sorry for stirring up this controversy. Anyway, I didn't ask a good question. I tried to delete it but wasn't able to. I will be more careful the next time I ask this kind of question. – xuanyue Aug 09 '16 at 18:04

2 Answers

1

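Try using map() and re:

import re
# map() with two iterables pairs them positionally, so this only works when
# L2[k] is the phrase to strip from L1[k] (L2 must be an ordered list, not a set).
# re.escape() keeps any regex metacharacters in a phrase literal.
res = map(lambda i, j: re.sub(" " + re.escape(i), '', j), L2, L1)

The " " prepended to i makes the pattern also match the space before the phrase, so no trailing space is left after "Programmer".

return list(res)

P.S. Wrapping the result in list() is only necessary if you are using Python 3, where map() returns an iterator instead of a list. Let me know if this improves your speed at all.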
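If every phrase has to be checked against every string, one possible variation on the regex idea is to compile all phrases into a single alternation and run it once per string. This is only a sketch: the function name `remove_phrases` is made up for illustration, matching is by plain substring (as with `replace`), and whether it is actually faster depends on the data, so it should be timed against the double loop.

```python
import re

def remove_phrases(strings, phrases):
    """Strip every phrase from every string using one compiled pattern."""
    if not phrases:
        # An empty alternation would match everywhere, so bail out early.
        return list(strings)
    # Longest first, so a longer phrase wins over a shorter one it contains.
    ordered = sorted(phrases, key=len, reverse=True)
    # \s* also consumes the whitespace that precedes a matched phrase.
    pattern = re.compile(r"\s*(?:" + "|".join(re.escape(p) for p in ordered) + ")")
    return [pattern.sub("", s).strip() for s in strings]

L1 = ['Programmer New York', 'Programmer San Francisco']
L2 = {'New York', 'San Francisco'}
res = remove_phrases(L1, L2)
print(res)  # ['Programmer', 'Programmer']
```

Unlike the positional map() call, this does not require L1 and L2 to line up element by element, and L2 can stay a set.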

Dan Temkin
0

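You can use a list comprehension to do so:

l1 = ['Programmer New York', 'Programmer San Francisco']
l2 = ['New York', 'San Francisco']
# Split each string on every phrase it contains.
# Note: a string containing no phrase is dropped, and a string containing
# several phrases appears once per phrase.
a = [x.split(y) for x in l1 for y in l2 if y in x]
# Re-join the pieces and strip the space left where the phrase was.
res = ["".join(x).strip() for x in a]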
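For inputs where a string may contain no phrase or more than one, a comprehension that yields exactly one result per input string could look like the sketch below. The name `strip_phrases` is hypothetical, and this is still the same brute-force work as the double loop, just written more compactly.

```python
from functools import reduce

def strip_phrases(strings, phrases):
    # Apply every replacement to the same string in turn, then collapse
    # any whitespace left behind, giving one output per input string.
    return [
        " ".join(reduce(lambda s, p: s.replace(p, ""), phrases, x).split())
        for x in strings
    ]

l1 = ['Programmer New York', 'Tester', 'Analyst New York San Francisco']
l2 = ['New York', 'San Francisco']
res = strip_phrases(l1, l2)
print(res)  # ['Programmer', 'Tester', 'Analyst']
```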
shiva
  • Yeah, to me this is just a cleaner way of writing it, but the complexity is still the same. Maybe I should stress that I'm looking for a better algorithm. Also, I doubt that using split will perform better than replace. – xuanyue Aug 08 '16 at 19:56
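One direction that avoids iterating over L2 for each string is to look up word n-grams in a set, so the per-string cost depends on the number of words and on the distinct phrase lengths rather than on the number of phrases. The sketch below is an assumption-laden illustration, not a drop-in replacement: `remove_phrases_by_words` is a made-up name, it matches whole words only (unlike the substring check in the question), and it normalises whitespace in the output.

```python
def remove_phrases_by_words(strings, phrases):
    # Index phrases as tuples of words; ignore empty phrases.
    phrase_set = {tuple(p.split()) for p in phrases if p.split()}
    # Distinct phrase lengths in words, longest first.
    lengths = sorted({len(p) for p in phrase_set}, reverse=True)
    result = []
    for s in strings:
        words = s.split()
        kept = []
        i = 0
        while i < len(words):
            for n in lengths:
                chunk = tuple(words[i:i + n])
                if chunk in phrase_set:
                    i += len(chunk)  # skip the matched phrase
                    break
            else:
                # No phrase starts at this word, so keep it.
                kept.append(words[i])
                i += 1
        result.append(" ".join(kept))
    return result

L1 = ['Programmer New York', 'Programmer San Francisco']
L2 = {'New York', 'San Francisco'}
res = remove_phrases_by_words(L1, L2)
print(res)  # ['Programmer', 'Programmer']
```

Because each lookup is a hash-set membership test, adding more phrases of the same lengths does not add more checks per word; whether that beats `replace` in practice would need to be measured on the real data.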