
So I have parsed a list from an HTML email, called br_list:

br_list: [<b>Sent:</b>, <b>To:</b>, <b>Subject:</b>, 'NEFS VII & VIII Manager', 'E-mail: ', 'Office:(508)984-0900 ', 'Cell:(508)965-0064']

And I have a list of sectors, sectors:

sectors = (
    'Fixed Gear Sector',
    'Maine Coast Community Sector',
    'Maine Permit Bank',
    'NCCS',
    'NEFS 2',
    'NEFS 3',
    'NEFS 4',
    'NEFS 5',
    'NEFS 6',
    'NEFS 7',
    'NEFS VII',
    'NEFS 8',
    'NEFS VIII',
    'NEFS 9',
    'NEFS 10',
    'NEFS X',
    'NEFS 11',
    'NEFS 12',
    'NEFS 13',
    'New Hampshire Permit Bank',
    'Port Clyde Community Groundfish Sector',
    'Sustainable Harvest Sector 1',
    'Sustainable Harvest Sector 2',
    'Sustainable Harvest Sector 3',
    'Tri-State Sector',
    )

And I would like to see if br_list contains any entries from sectors in it. It should be as easy as

if any(i in br_list for i in sectors):
    print("yup")

...but nothing gets printed. I assume it fails because it's looking for a single list entry that is exactly equal to a sector, which doesn't exist, even though a sector clearly does appear within one of the list entries.

So:

1) Is there a way to check if any of those sectors exists anywhere in br_list?

2) If a sector does exist in br_list, is there a way to capture just that sector string? In this case, "NEFS VII" ?

**EDIT:** As was pointed out, my code failed because NEFS VII is a substring of a list entry, not a list entry itself. I solved it with the accepted answer below.
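
To make the difference concrete, here is a minimal sketch with shortened stand-in data (the names `entries` and `names` are made up for illustration). `x in some_list` asks whether `x` equals a whole element, while `x in some_string` asks whether `x` occurs as a substring:

```python
# Hypothetical stand-in data, shortened from the lists in the question.
entries = ['NEFS VII & VIII Manager', 'E-mail: ']
names = ('NEFS 7', 'NEFS VII')

# List membership: is any name equal to a whole list element?
whole_match = any(name in entries for name in names)

# Substring test: does any name occur inside any element?
substring_match = any(name in entry for entry in entries for name in names)

print(whole_match)      # False
print(substring_match)  # True
```

The only change is which object sits on the right-hand side of `in`: the list itself, or each string inside it.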

theprowler
  • `matches = [sector for sector in sectors if sector in br_list]` – Jared Smith Aug 09 '17 at 14:10
  • Possible duplicate of [Common elements comparison between 2 lists](https://stackoverflow.com/questions/2864842/common-elements-comparison-between-2-lists) – Igl3 Aug 09 '17 at 14:10
  • nothing matches though... it is normal that nothing gets printed. `'NEFS VII'` exists as a substring in `br_list`, not as a whole entry. – Ma0 Aug 09 '17 at 14:11
  • @Igle This is not what OP is trying to do. See point 2 of his. – Ma0 Aug 09 '17 at 14:14
  • @JaredSmith that only prints out `matches: []`. I believe that's because `NEFS VII` isn't its own single list entry. But am I wrong? – theprowler Aug 09 '17 at 14:15
  • @Igle I tried that link's answer and it only printed out an empty list. I believe it's due to the same reason as @JaredSmith's answer; because `NEFS VII` is within a list entry instead of its own entry. Is there a way around that? Is RegEx the right approach? – theprowler Aug 09 '17 at 14:16
  • @Ev.Kounis Ohhhh. Is there a way to scan thru substrings as well as the list entries themselves? Should I be using RegEx? – theprowler Aug 09 '17 at 14:16
  • @theprowler next time chop it down to a **minimum complete** example of the problem with example inputs and desired output: I (like most others apparently) gave you an answer to the question you asked, not the question you had. – Jared Smith Aug 09 '17 at 14:22
  • @JaredSmith right, I usually am pretty concise, but I was unsure of the wording for this problem. I tried saying that I thought it failed because `NEFS VII` wasn't a list entry.....turns out I needed to search for substrings within `br_list`, I wasn't aware that's how it should've been worded. – theprowler Aug 09 '17 at 14:24
  • @theprowler it was your sample inputs that were too long and cluttered, not your phrasing of the question. If you'd said "I have `sectors = ("foo", "bar")` and `inputs = ["the foo is all fooed", "my bar is barred"]` and I need to extract any matches from `sectors` found in `inputs` like `["foo", "bar"]`", we'd have figured it out. Giving us a bunch of html tags that scroll off the side of the screen and a tuple of thirty-odd possible matches obscured what you wanted. – Jared Smith Aug 09 '17 at 14:26
  • @JaredSmith Trueeeee. Gotcha. I never think to change it up like that and simplify it that much. But that is clearly much more concise and easier to read and answer. I appreciate the feedback. – theprowler Aug 09 '17 at 14:30
  • @theprowler no problem. As an added bonus, stripping it down to a bare minimum often provides the solution: I wish I had a nickel for every time I answered my own question in the course of writing it. – Jared Smith Aug 09 '17 at 14:33

2 Answers


This is probably what you want, although your phrasing of the question threw many people (me included) off. I assume you want to check for substrings.

br_list = ['NEFS VII & VIII Manager', 'E-mail: ', 'Office:(508)984-0900 ', 'Cell:(508)965-0064']
sectors = (
    'Fixed Gear Sector',
    'Maine Coast Community Sector',
    'Maine Permit Bank',
    'NCCS',
    'NEFS 2',
    'NEFS 3',
    'NEFS 4',
    'NEFS 5',
    'NEFS 6',
    'NEFS 7',
    'NEFS VII',
    'NEFS 8',
    'NEFS VIII',
    'NEFS 9',
    'NEFS 10',
    'NEFS X',
    'NEFS 11',
    'NEFS 12',
    'NEFS 13',
    'New Hampshire Permit Bank',
    'Port Clyde Community Groundfish Sector',
    'Sustainable Harvest Sector 1',
    'Sustainable Harvest Sector 2',
    'Sustainable Harvest Sector 3',
    'Tri-State Sector',
    )

finds = []
for check in sectors:
    if any(check in item for item in br_list):
        finds.append(check)
print(finds)  # ['NEFS VII']

or

finds = []
for string in br_list:
    finds.extend([x for x in sectors if x in string])
print(finds)

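Depending on which of the two lists is bigger, the efficiency of the two proposed methods may vary. Note also that the first version stops at the first entry containing a given sector, so each sector is listed at most once, while the second version lists a sector once for every entry that contains it.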
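
Since regular expressions came up in the comments, here is a hedged sketch of that route as an alternative to the loops above. It assumes every entry is a plain string; the names `pattern` and `regex_finds` and the shortened data are illustrative only. The names are sorted longest-first because regex alternation takes the first alternative that matches, so 'NEFS VII' would otherwise shadow 'NEFS VIII':

```python
import re

# Shortened stand-in data; plain strings only (an assumption).
br_list = ['NEFS VII & VIII Manager', 'E-mail: ', 'Office:(508)984-0900 ']
sectors = ('NEFS 7', 'NEFS VII', 'NEFS 8', 'NEFS VIII')

# Escape each name and try longer names first so 'NEFS VIII' is not
# cut short by its prefix 'NEFS VII'.
longest_first = sorted(sectors, key=len, reverse=True)
pattern = re.compile('|'.join(re.escape(s) for s in longest_first))

regex_finds = []
for item in br_list:
    # findall returns every non-overlapping match in the string.
    regex_finds.extend(pattern.findall(item))
print(regex_finds)  # ['NEFS VII']
```

One behavioural difference from the loops: regex matches do not overlap, so a string such as 'NEFS VIII' yields only 'NEFS VIII' here, whereas the substring loops would report both 'NEFS VII' and 'NEFS VIII'.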

Ma0
  • Ohhhhh ok ok my bad for not wording the question correctly. I see now why my approach was failing. This worked perfectly btw, and thanks for explaining it was the substrings I needed to search. – theprowler Aug 09 '17 at 14:21

First of all, your sectors is not a list but a tuple. Your br_list, as posted, contains invalid elements (e.g. <b>Sent:</b> should probably be put in quotes).

As for your second question you could just do a nested list comprehension:

found_sectors = [sector for entry in br_list for sector in sectors if sector in entry]

Which gives the same result as:

found_sectors = []
for entry in br_list:
    for sector in sectors:
        if sector in entry:
            found_sectors.append(sector)
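
If only the first matching sector is needed, or repeated matches should be dropped, one possible variation is sketched below. It assumes plain-string entries; `first_sector` and `unique_sectors` are names invented here, and the second entry of `br_list` is made-up data added only to produce a duplicate:

```python
# Shortened stand-in data; 'Contact: NEFS VII office' is hypothetical.
br_list = ['NEFS VII & VIII Manager', 'Contact: NEFS VII office', 'E-mail: ']
sectors = ('NEFS 7', 'NEFS VII', 'NEFS 8')

found_sectors = [sector for entry in br_list for sector in sectors if sector in entry]

# First match, or None when nothing matches.
first_sector = next(iter(found_sectors), None)

# dict.fromkeys keeps first-seen order while dropping repeats.
unique_sectors = list(dict.fromkeys(found_sectors))

print(found_sectors)   # ['NEFS VII', 'NEFS VII']
print(first_sector)    # NEFS VII
print(unique_sectors)  # ['NEFS VII']
```

Using `dict.fromkeys` rather than `set` keeps the sectors in the order they were first found.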

Igl3