
\W matches non-word characters. Python string literals recognize the following escape sequences:

\\          Backslash (\)
\'          Single quote (')
\"          Double quote (")
\a          ASCII Bell (BEL)
\b          ASCII Backspace (BS)
\f          ASCII Formfeed (FF)
\n          ASCII Linefeed (LF)
\r          ASCII Carriage Return (CR)
\t          ASCII Horizontal Tab (TAB)
\v          ASCII Vertical Tab (VT)
\ooo        Character with octal value ooo
\xhh        Character with hex value hh
\newline    Backslash and newline ignored

Below are two logical lines. The first starts with # and is a pure comment; the second is a multi-line string continued with backslashes, with comment lines in between.

# abc                                                    # def
1.3.6.1.4.1.555.2.12.6.102                 0x04444001    1.3.6.1.4.1.75.2.12.90.901(1,0)\
                                                         # xyz
                                                         1.3.6.1.4.1.75.2.12.90.902(2,0)\
                                                         # ddd
                                                         1.3.6.1.4.1.75.2.12.90.903(3,0)

Some of the above lines have \ as the last non-word character.

The goal is to collapse the above syntax into a single string: 1.3.6.1.4.1.555.2.12.6.102 0x04444001 1.3.6.1.4.1.75.2.12.90.901(1,0) 1.3.6.1.4.1.75.2.12.90.902(2,0) 1.3.6.1.4.1.75.2.12.90.903(3,0)


How can I detect a backslash \ at the end of each line? I ask because:

import re

print(re.search(r'\\', 'hello\there'))      # None: in a normal literal, \t is a tab escape, so the string contains no backslash
print(re.search(r'\\', r'hello\there'))     # match with span (5, 6): the raw string keeps the backslash as a backslash
print(re.search(r'\\$', 'hellothere\'))     # SyntaxError: \' (like \") is an escape sequence, so Python cannot find the end of the string literal
print(re.search(r'\\', r'hellothere\'))     # SyntaxError as well: I expected the raw string to treat the backslash as a plain backslash, but Python still cannot find the end of the string literal. No clue why.
overexchange

1 Answer


To get the desired output:

  1. Read the file line by line.
  2. Remove the last character if it is '\'.
  3. Join the modified lines.

The above operations should provide the required result. I think using regex would just complicate the solution without any added benefits.
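The three steps above can be sketched as follows. This is a minimal illustration, not the answerer's code: the function name `join_continued_lines` is hypothetical, the sample is inlined as a string rather than read from a file, and skipping lines that start with `#` plus collapsing runs of whitespace are assumptions added to match the output the question asks for.

```python
def join_continued_lines(text):
    """Collapse backslash-continued lines into one space-separated string."""
    parts = []
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: blank lines and lines starting with '#' are pure comments.
        if not stripped or stripped.startswith("#"):
            continue
        # Step 2: remove the last character if it is a backslash.
        if stripped.endswith("\\"):
            stripped = stripped[:-1].rstrip()
        # Assumption: runs of whitespace inside a line collapse to one space.
        parts.append(" ".join(stripped.split()))
    # Step 3: join the modified lines.
    return " ".join(parts)


# A raw triple-quoted literal keeps each backslash-newline pair as two characters.
sample = r"""# abc                                  # def
1.3.6.1.4.1.555.2.12.6.102    0x04444001    1.3.6.1.4.1.75.2.12.90.901(1,0)\
                                            # xyz
                                            1.3.6.1.4.1.75.2.12.90.902(2,0)\
                                            # ddd
                                            1.3.6.1.4.1.75.2.12.90.903(3,0)
"""

result = join_continued_lines(sample)
print(result)
```

No regex is needed here: `str.endswith("\\")` checks for the trailing backslash directly, and the pattern never has to be written as a string literal that ends in a backslash.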

Quoting the Python documentation on lexical analysis:

When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase 'n'. String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.
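The quoted rules can be checked with a short script. This is only a sketch: the helper `is_valid_literal` is a hypothetical name, and it uses the built-in `compile()` so that the invalid literal can be observed as a `SyntaxError` without breaking the script itself.

```python
import re

# A raw literal keeps every backslash, so r"\n" is two characters.
assert len(r"\n") == 2
# A quote may follow a backslash in a raw literal, but the backslash stays.
assert r"\"" == '\\"'


def is_valid_literal(source):
    """Return True if `source` parses as a Python expression."""
    try:
        compile(source, "<literal>", "eval")
        return True
    except SyntaxError:
        return False


raw_single_ok = is_valid_literal('r"\\"')    # source text is: r"\"
raw_double_ok = is_valid_literal('r"\\\\"')  # source text is: r"\\"

# Two ways to write a string that ends in exactly one backslash.
s1 = "hellothere\\"
s2 = r"hellothere" + "\\"

# The pattern r'\\$' means: one literal backslash, then end of string.
match = re.search(r"\\$", s1)
print(raw_single_ok, raw_double_ok, s1 == s2, match.span())
```

Here `raw_single_ok` is `False` and `raw_double_ok` is `True`, which is the "odd number of backslashes" rule in action, and the search finds the trailing backslash at span `(10, 11)`.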

gaganso
  • But the question still remains... `print(re.search(r'\\$', 'hellothere\'))` should not work, but `print(re.search(r'\$', r'hellothere\'))` should work with a raw string... because there the backslash is just a backslash and not part of an escape sequence – overexchange Jul 13 '18 at 18:33
  • read https://stackoverflow.com/questions/647769/why-cant-pythons-raw-string-literals-end-with-a-single-backslash – i alarmed alien Jul 13 '18 at 18:41
  • @ialarmedalien The answer you refer to says: *Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character)*. This defeats the purpose of using raw strings... because [here](http://www.interfaceware.com/manual/python_escape_sequences_raw_strings.html) it says: *If you want to create a string that contains backslashes, and you do not want Python to try to interpret these backslashes as escape sequences, you can create a raw string.* – overexchange Jul 13 '18 at 18:49
  • from the [python docs](https://docs.python.org/3/reference/lexical_analysis.html#string-and-byte-literals): Even in a raw literal, quotes can be escaped with a backslash, but the backslash remains in the result; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw literal cannot end in a single backslash (since the backslash would escape the following quote character). – i alarmed alien Jul 13 '18 at 19:02
  • Okay, I just added the same content to the answer and saw the comment now. Thanks, @ialarmedalien. – gaganso Jul 13 '18 at 19:13
  • @overexchange, `print(re.search(r'\$', r'hellothere\'))` doesn't work because Python parses raw string literals the way the doc describes. – gaganso Jul 13 '18 at 19:14
  • @ialarmedalien What does *raw string cannot end in an odd number of backslashes* mean for `r'\'` vs `r'\$'`? – overexchange Jul 17 '18 at 14:17
  • @overexchange In a regex, `\$` is a literal dollar sign and `$` is the end of a string. "cannot end in an odd number of backslashes" means the number of trailing backslashes must be divisible by two. – i alarmed alien Jul 19 '18 at 10:00