
I want to use re.sub to replace a part of a string whose exact contents I know. Relevant part of the code:

print "Regex statement: ", foundStatements[iterator]
print "string to replace with : \n", latexPreparedString
print "string to search&replace in: \n", fileAsString
processedString = re.sub(foundStatements[iterator], latexPreparedString, fileAsString)
print "processed string: \n", processedString

In my test case, foundStatements[iterator] is "%@import script_example.py ( *out =(.|\n)*?return out)". But even though fileAsString contains foundStatements[iterator], processedString looks exactly like fileAsString, so re.sub hasn't replaced anything. What am I doing wrong?

EDIT: OK, it definitely has something to do with the string I'm searching for containing regex code. Is there a way to make re.sub interpret foundStatements[iterator] as a raw string to search for? The only solution I can think of is to write a function that replaces every regex symbol in a string with \regexsymbol (e.g. * -> \*), but it would make sense for there to be a built-in way to do this. Writing it myself would also be overkill, since I'd have to make sure it works with every single regex symbol, of which there are quite a few :/
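A minimal sketch of why the unescaped statement never matches itself. The surrounding file contents are made up for illustration; only the statement comes from the question.

```python
import re

# The statement exactly as it appears in the file (backslash + "n", not a newline).
statement = r"%@import script_example.py ( *out =(.|\n)*?return out)"
# Hypothetical file contents, for illustration only.
file_as_string = "Some text\n" + statement + "\nMore text"

# As a pattern, "(" and ")" delimit groups and "*" is a quantifier, so the
# pattern no longer describes the literal parentheses present in the file.
unescaped_match = re.search(statement, file_as_string)

# re.escape() backslash-escapes the metacharacters, giving a pattern that
# matches the statement character for character.
escaped_match = re.search(re.escape(statement), file_as_string)
```

Here the unescaped search finds nothing, while the escaped one finds the whole statement.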

EDIT2: Well, just changing it to re.sub(re.escape(foundStatements[iterator]), latexPreparedString, fileAsString) seems to work, except when the regex statement doesn't hit anything in the original file. To explain, latexPreparedString is generated by using the regex part of foundStatements[iterator]. While it's logical that latexPreparedString can't be set to anything when the regex statement doesn't hit anything, I set latexPreparedString = "" by default, so in that case re.sub should replace the statement with a blank string. Here's how the code looks at the moment: pastebin.com/wUedK3LN
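One way to check what happens in the empty-replacement case is re.subn, which returns the number of replacements alongside the new string. This is only a sketch with made-up inputs (the file name `missing.py` and the surrounding text are hypothetical):

```python
import re

# Hypothetical inputs: the statement's regex part found nothing in the script,
# so the replacement keeps its default value of "".
file_as_string = "intro\n%@import missing.py (nothing)\noutro"
statement = "%@import missing.py (nothing)"
latex_prepared_string = ""

# re.subn() returns the new string together with the number of replacements,
# which shows whether the escaped statement was actually found.
processed_string, replacements = re.subn(
    re.escape(statement), latex_prepared_string, file_as_string
)
```

If `replacements` comes back as 0, the statement passed to re.escape is not the same text that appears in the file, which narrows the problem down to how the statement was extracted.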

Arithmomaniac
user2875994

1 Answer


First, for replacing an exact match in a string, you should use str.replace():

processedString = fileAsString.replace(foundStatements[iterator], latexPreparedString)

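However, this will still fail if foundStatements[iterator] is written as an ordinary string literal, because the \n in it is then turned into a newline character, while the file contains a backslash followed by n. To keep those two characters as they are, use the r prefix when declaring foundStatements[iterator].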
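A small sketch of the difference the r prefix makes for an exact-match replacement. The one-line file contents and the `"REPLACED"` text are placeholders, not taken from the question:

```python
# Hypothetical one-line file that contains the statement as typed by its author.
file_as_string = r"%@import script_example.py ( *out =(.|\n)*?return out)"

plain = "( *out =(.|\n)*?return out)"  # "\n" becomes one newline character
raw = r"( *out =(.|\n)*?return out)"   # "\n" stays a backslash followed by "n"

# Only the raw literal is a substring of the file contents.
plain_found = plain in file_as_string
raw_found = raw in file_as_string

# str.replace() does a literal replacement, so no regex escaping is involved.
processed_string = file_as_string.replace(raw, "REPLACED")
```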

If you still want to use re.sub, you have to pass re.escape(foundStatements[iterator]) instead of foundStatements[iterator], and use the r prefix if you write the statement as a string literal. You can read more about re.escape in the documentation of Python's re module.
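A hedged sketch of the re.sub variant. The file contents and the LaTeX text are invented for illustration. One extra assumption is worth flagging: re.sub also processes backslash escapes in a replacement string, and LaTeX output tends to contain backslashes, so the sketch passes a function as the replacement, whose return value is inserted as is.

```python
import re

statement = r"%@import script_example.py ( *out =(.|\n)*?return out)"
# Hypothetical file contents and replacement text, for illustration only.
file_as_string = "Before\n" + statement + "\nAfter"
latex_prepared_string = r"\begin{verbatim}out = 1\end{verbatim}"

processed_string = re.sub(
    re.escape(statement),               # match the statement literally
    lambda match: latex_prepared_string,  # insert the LaTeX text verbatim
    file_as_string,
)
```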

Arithmomaniac
  • foundStatements is created through re.findall, so I'm not sure how to make that a raw string. I've sidestepped the problem in my solution, but I'd still like an answer on how to do this. In my current solution I loop through all the hits twice: once to get all the information out of them, and a second time to replace them. Here's my code so far if that was hard to understand: [link](http://pastebin.com/Q10gqcXj). You can see, commented out, where I was trying to insert the solution I'm looking for. – user2875994 Oct 19 '14 at 04:04
  • When strings are created by other functions (like `re.findall`), any backslashes in them are already literal characters; you only need the `r` prefix when writing a string literal explicitly. You do still need to use `re.escape`, though. – Arithmomaniac Oct 19 '14 at 20:10
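One possible way to avoid looping over the hits twice is to hand re.sub a function, so each statement is read and replaced in the same pass and never has to be escaped. This is only a sketch: the statement layout, the `scripts` dictionary, and the names `IMPORT_STATEMENT` and `expand` are assumptions made for illustration, not the code from the pastebin.

```python
import re

# Hypothetical stand-ins for the real files.
scripts = {"script_example.py": "def f():\n    out = 1\n    return out\n"}
file_as_string = (
    "Intro\n"
    r"%@import script_example.py ( *out =(.|\n)*?return out)" "\n"
    "Outro\n"
)

# Assumed statement layout: "%@import <file> (<regex>)" on a line of its own.
IMPORT_STATEMENT = re.compile(r"^%@import (\S+) \((.*)\)$", re.MULTILINE)

def expand(match):
    """Return the replacement for one import statement ("" if nothing is found)."""
    source = scripts.get(match.group(1), "")
    hit = re.search(match.group(2), source)
    return hit.group(0) if hit else ""

# Each matched statement is replaced by whatever expand() returns for it.
processed_string = IMPORT_STATEMENT.sub(expand, file_as_string)
```

Because the function receives the match object, the text captured from the file is used directly as the inner pattern, which is the situation the comment above describes: no `r` prefix is involved for strings that were read at run time.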