2

The objective is to create a list comprehension that outputs two values.

The for loops look like below

    paper_href_scopus = []
    paper_title = []
    for litag in all_td.find_all('a', {'class': 'ddmDocTitle'}):
        paper_href_scopus.append(litag['href'])
        paper_title.append(litag.text)

As suggested by OP, this can be achieved by

    paper_href_scopus, paper_title = zip(*[(litag['href'], litag.text) for litag in all_td.find_all('a', {'class': 'ddmDocTitle'})])

However, there is an instances where the all_td.find_all('a', {'class': 'ddmDocTitle'}) returns empty and the compiler returns an error:

ValueError: not enough values to unpack (expected 2, got 0)

Based on the discussion in this thread, it seems the above code can be modified as

     paper_href_scopus, paper_title = zip(
                        *((litag['href'], litag.text) for litag in all_td.find_all('a', {'class': 'ddmDocTitle'}) \
                          if all_td.find_all('a', {'class': 'ddmDocTitle'}
                          ))

But still, the compiler returns an error

ValueError: not enough values to unpack (expected 2, got 0)

Nevertheless, the following code works despite the fact that on some occasions the all_td.find_all('a', {'class': 'ddmDocTitle'}) returns empty

    [(paper_href_scopus.append(litag['href']), paper_title.append(litag.text)) \
                     for litag in all_td.find_all('a', {'class': 'ddmDocTitle'})]

But, I would like to avoid using append as there is requirement to initialize paper_href_scopus=[] and paper_title=[] beforehand.

May I know, what can I do to fix the code?

    paper_href_scopus, paper_title = zip(
                        *((litag['href'], litag.text) for litag in all_td.find_all('a', {'class': 'ddmDocTitle'}) \
                          if all_td.find_all('a', {'class': 'ddmDocTitle'}
                          ))
a_jelly_fish
  • 478
  • 6
  • 21
mpx
  • 3,081
  • 2
  • 26
  • 56
  • Can you add minimal code to reproduce? – Greg Aug 01 '20 at 10:43
  • I would suggest, you assign this to a name like `a_tds = all_td.find_all('a', {'class': 'ddmDocTitle'})` then pass that to `zip` only if it's not empty. – han solo Aug 01 '20 at 10:47
  • Thanks @han, I thinks this will lead to extra line of code, but I think the number of execution is still the same either ways. I believe you are proposing something like `a_tds = all_td.find_all('a', {'class': 'ddmDocTitle'}) if a_tds: paper_href_scopus, paper_title = zip( *[(litag['href'], litag.text) for litag in a_tds])` – mpx Aug 01 '20 at 10:57
  • 1
    Yes. Don't worry about lines. "Lines are free, clarity is priceless" and in this case required :) – han solo Aug 01 '20 at 11:13

1 Answers1

1

Firstly, the version with the if is basically equivalent to:

tmp = []
for litag in all_td.find_all('a', {'class': 'ddmDocTitle'}):
    if all_td.find_all('a', {'class': 'ddmDocTitle'}):
        tmp.append((litag['href'], litag.text))

paper_href_scopus, paper_title = zip(*tmp)

which, if your document has 100 matching elements, it does 101 searches.

Here's my proposal: forget about the zip. Instead, split out the code a bit:

litags = all_td.find_all('a', {'class': 'ddmDocTitle'})
paper_href_scopus = [litag['href'] for litag in litags]
paper_title = [litag.text for litag in litags]
Jasmijn
  • 9,370
  • 2
  • 29
  • 43
  • While this suggestion improve overall readability, but I would like to keep the number of lines minimal as possible. But somehow, this suggestion omit the use of 'zip' which I think may improved the overall execution speed maybe. But I think this approach will required further tweak to handle if 'litgas' returns empty. – mpx Aug 01 '20 at 11:20
  • Have you tried it? Because my suggestion should handle the empty case just fine. (And I don't know why you would want to reduce the number of lines to the extent that your single logical line has to span multiple physical lines. – Jasmijn Aug 01 '20 at 12:40