
I just need to delete the email disclaimer from the following text. Seems like an easy task but I'm not getting there.

string:

Hello. Please see attached. Thanks. CONFIDENTIAL AND PROPRIETARY * The content of this email, including all header and footer information, is the “CONFIDENTIAL AND PROPRIETARY INFORMATION” of The Cleaning house, LLC. (“Franchisor”) and is protected under the applicable franchise agreement (“applicable Franchise Agreement”) between Franchisor and each of its franchisees (“Franchisee(s)”). Accordingly, each Franchisee who receives this email must, both during and after the term of the applicable Franchise Agreement, maintain the absolute confidentiality of the content of this email and may disclose the content of this email only to its employees and agents and only to the extent necessary for the operation of its Franchised Business (as defined in the applicable Franchise Agreement) in accordance with the applicable Franchise Agreement. None of the Franchisees who receive this email may use (or permit any other natural or legal person to use) the content of this email in any other business or in any way not authorized by Franchisor in writing. © 2009 The Cleaning house, LLC. All rights reserved.{}

Desired output:

Hello. Please see attached. Thanks. .{}

My attempts:

CONFIDENTIAL AND PROPRIETARY[\n|.]*(?=reserved)

CONFIDENTIAL AND PROPRIETARY.*All rights reserved

edit: There may be newlines and all types of weird stuff in the string also. I was hoping the .* would handle this.

wolf7687
  • Your second regex [works](https://regex101.com/r/6AwIWc/1). Has your text got line breaks between the sentences? Then see [How do I match any character across multiple lines in a regular expression?](https://stackoverflow.com/a/45981809/3832970) – Wiktor Stribiżew Apr 20 '20 at 21:41
  • Are you parsing HTML or plain text? – Robo Robok Apr 20 '20 at 21:58

1 Answer

You are probably not using the regex engine the right way.

import re

email = "Hello. Please see attached. Thanks. CONFIDENTIAL AND PROPRIETARY * The content of this email, including all header and footer information, is the CONFIDENTIAL AND PROPRIETARY INFORMATION of The Cleaning house, LLC. (Franchisor) and is protected under the applicable franchise agreement (applicable Franchise Agreement) between Franchisor and each of its franchisees (Franchisee(s)). Accordingly, each Franchisee who receives this email must, both during and after the term of the applicable Franchise Agreement, maintain the absolute confidentiality of the content of this email and may disclose the content of this email only to its employees and agents and only to the extent necessary for the operation of its Franchised Business (as defined in the applicable Franchise Agreement) in accordance with the applicable Franchise Agreement. None of the Franchisees who receive this email may use (or permit any other natural or legal person to use) the content of this email in any other business or in any way not authorized by Franchisor in writing. © 2009 The Cleaning house, LLC. All rights reserved.{}"

cleaned_email = re.sub(r'CONFIDENTIAL AND PROPRIETARY[\s\S]*All rights reserved', '', email)

Note the change from .* to [\s\S]*

The dot (.) matches any character except a newline. \s matches Unicode whitespace characters (newlines included) and \S matches any character that is not whitespace, so the class [\s\S] matches any character at all, including line breaks.
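To make the difference concrete, here is a minimal sketch that compares the plain .* pattern with two ways of letting it cross line breaks. The shortened disclaimer text is a made-up stand-in for the real email (an assumption for illustration), and re.DOTALL is a standard-library alternative to the [\s\S] class, not something the question used.

```python
import re

# Hypothetical sample: a shortened disclaimer with a line break inside it.
email = (
    "Hello. Please see attached. Thanks. CONFIDENTIAL AND PROPRIETARY * The content\n"
    "of this email is protected. All rights reserved.{}"
)

pattern = r"CONFIDENTIAL AND PROPRIETARY.*All rights reserved"

# Without DOTALL, '.' stops at the newline, so the pattern never reaches
# "All rights reserved" and the string comes back untouched.
unchanged = re.sub(pattern, "", email)

# With DOTALL, '.' also matches '\n', so the whole disclaimer is removed.
cleaned_dotall = re.sub(pattern, "", email, flags=re.DOTALL)

# The character-class form needs no flag and gives the same result.
cleaned_class = re.sub(
    r"CONFIDENTIAL AND PROPRIETARY[\s\S]*All rights reserved", "", email
)

print(repr(unchanged == email))   # True
print(repr(cleaned_dotall))       # 'Hello. Please see attached. Thanks. .{}'
print(repr(cleaned_class))        # 'Hello. Please see attached. Thanks. .{}'
```

Both forms are greedy, so if "All rights reserved" can appear more than once in a message, they remove everything up to the last occurrence. Using a lazy quantifier (.*? or [\s\S]*?) would stop at the first one instead.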

Lucas Vazquez