
I have more than a thousand files with remarks in HTML format. Some remarks have leading spaces, some have extra spaces between words, and one specific remark shows up often that I want to exclude.

I have created a function to strip the HTML tags (strip_tags()). These two lines accomplish what I want:

stripped_remarks = [" ".join(strip_tags(rem).split()) for rem in remarks]  # removes extra spaces and HTML tags
stripped_remarks = [rem for rem in stripped_remarks if rem != r'garbage text ***']  # removes the garbage remark from the list

I can make this one line by changing the "if rem" part so it strips the spaces and HTML tags the same way the expression before "for" does, but that does the same work twice for every remark. Is it possible to do something like this?

stripped_remarks = [" ".join(strip_tags(rem).split()) as strip_rem for rem in remarks if strip_rem != r'garbage text ***']

By defining strip_rem within the comprehension and reusing it in my conditional, I could make this one line without stripping the extra spaces or HTML tags twice. But is it possible?

MattDMo
  • From Python 3.8 you can use the "walrus" operator: `stripped_remarks = [strip_rem for rem in remarks if (strip_rem := " ".join(strip_tags(rem).split())) != r'garbage text ***']` – Swifty Dec 08 '22 at 17:51
  • Why not move the splitting and joining into strip_tags and make it a generator, then do `[rem for rem in strip_tags(remarks) if rem != r'garbage text ***']`? – Kurt Dec 08 '22 at 17:55
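
The generator idea in the last comment could look roughly like the sketch below. The name `clean_remarks` and the regex-based tag stripping are assumptions made for illustration; the asker's real strip_tags() is not shown in the question and may work differently.

```python
import re

def clean_remarks(remarks):
    """Hypothetical generator: strip tags and collapse whitespace, one remark at a time."""
    for rem in remarks:
        # Crude stand-in for the asker's strip_tags(); not a real HTML parser.
        text = re.sub(r"<[^>]+>", "", rem)
        # split() with no arguments drops leading, trailing, and repeated whitespace.
        yield " ".join(text.split())

remarks = ["  <p>keep   me</p>", "<p>garbage text ***</p>"]
stripped_remarks = [rem for rem in clean_remarks(remarks) if rem != "garbage text ***"]
print(stripped_remarks)
```

Each remark is cleaned exactly once, and the comprehension only has to do the filtering.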

1 Answer


Using the 'walrus operator' introduced in Python 3.8, this should work:

stripped_remarks = [strip_rem for rem in remarks if (strip_rem := " ".join(strip_tags(rem).split())) != r'garbage text ***']
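
For anyone who wants to try this without the original files, here is a small self-contained sketch. The `strip_tags` below is only a hypothetical regex stand-in for the asker's function, and the sample remarks are made up.

```python
import re

def strip_tags(html):
    # Hypothetical stand-in for the asker's strip_tags(); a crude regex, not a real HTML parser.
    return re.sub(r"<[^>]+>", "", html)

# Made-up sample data for illustration.
remarks = [
    "  <p>First   remark</p>",
    "<b>garbage text ***</b>",
    "<div>Second <i>remark</i>  here</div> ",
]

# The condition runs first for each item, so strip_rem is already
# assigned by the time the output expression reads it.
stripped_remarks = [
    strip_rem
    for rem in remarks
    if (strip_rem := " ".join(strip_tags(rem).split())) != "garbage text ***"
]
print(stripped_remarks)  # ['First remark', 'Second remark here']
```

Note that a name bound with `:=` inside a comprehension is assigned in the enclosing scope, so `strip_rem` still exists (holding the last cleaned value) after the comprehension finishes.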
Swifty
  • Yep! This does the trick! I also tried a nested comprehension, which worked and might solve this issue for previous versions of Python, but the walrus operator is much cleaner! Thank you! `stripped_remarks = [stripped_rem for stripped_rem in [" ".join(strip_tags(rem).split()) for rem in remarks] if stripped_rem != r'garbage text ***']` – PVT Compyle Dec 08 '22 at 18:13
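
A possible variation on that nested approach for Python versions before 3.8: use a generator expression for the inner step so no intermediate list is built. This is a sketch only; `strip_tags` here is a hypothetical regex stand-in for the asker's function, and the sample data is invented.

```python
import re

def strip_tags(html):
    # Hypothetical stand-in for the asker's strip_tags(); not a real HTML parser.
    return re.sub(r"<[^>]+>", "", html)

remarks = ["<p>  spaced   out </p>", "<i>garbage text ***</i>", "plain"]

# Parentheses make this a lazy generator: each remark is cleaned once, on demand.
cleaned = (" ".join(strip_tags(rem).split()) for rem in remarks)
stripped_remarks = [rem for rem in cleaned if rem != "garbage text ***"]
print(stripped_remarks)  # ['spaced out', 'plain']
```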