I'm trying to write a script that updates a text file by replacing certain single characters (e.g. 'a', 'w') with words (e.g. 'airplane', 'worm').

If a single line of the text was something like this:

a.function(); a.CallMethod(w); E.aa(w);

I'd want it to become this:

airplane.function(); airplane.CallMethod(worm); E.aa(worm);

The difference is subtle but important: I'm only changing 'a' and 'w' where they are used as variables, not where they are just characters inside some other word. There are many lines like this in the file. Here's what I've done so far:

import re

original = open('original.js', 'r')
modified = open('modified.js', 'w')
# iterate through each line of the file
for line in original:
    # Search for the character 'a' when not part of a word of some sort
    line = re.sub(r'\W(a)\W', 'airplane', line)
    modified.write(line)

original.close()
modified.close()

I think my RE pattern is wrong, and I think I'm using re.sub() incorrectly as well. Any help would be greatly appreciated.

Asif
  • You should include the result you're getting, so we can see what it is that you're not happy with. – OzW Aug 25 '15 at 07:53

2 Answers

If you're concerned about the semantic meaning of the text you're changing, you'd likely be better served by parsing it than by using a regular expression. Luckily, Python has good modules for parsing Python source: look at the ast (Abstract Syntax Tree) module and the parser module (note that parser was removed in Python 3.10). There are probably others for JavaScript, if that's what you're working with, such as slimit.
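For the Python case, a minimal sketch with the standard ast module could look like this. It assumes Python 3.9+ (for ast.unparse) and a hypothetical RENAMES mapping. Because it rewrites only Name nodes, attribute names such as aa in E.aa and text inside string literals are never touched. It works on Python source only; for JavaScript you would need a JS parser such as slimit.

```python
import ast

# Hypothetical mapping of one-letter variable names to their replacements.
RENAMES = {"a": "airplane", "w": "worm"}


class RenameVariables(ast.NodeTransformer):
    """Rename variable references; attributes (E.aa) and strings are untouched."""

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Name nodes are bare identifiers such as a, w, E or print.
        if node.id in RENAMES:
            node.id = RENAMES[node.id]
        return node


def rename_python_source(source: str) -> str:
    tree = ast.parse(source)
    tree = RenameVariables().visit(tree)
    # ast.unparse (Python 3.9+) regenerates source; comments and layout are lost.
    return ast.unparse(tree)


if __name__ == "__main__":
    print(rename_python_source('a.function(); a.CallMethod(w); E.aa(w); print("a is a")'))
```

Each statement comes back on its own line (airplane.function(), airplane.CallMethod(worm), E.aa(worm)), and the 'a' inside the string literal survives unchanged. The trade-off, as with slimit's to_ecma, is that the output is regenerated from the tree rather than edited in place.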

For future reference on regular expression questions, there's a lot of helpful information here:

It took me 30 minutes to go from never having used this JavaScript parser in Python (including some installation issues: note the specific ply version) to a basic solution for your example. You can do it too.

# Note: sudo pip3 install ply==3.4 && sudo pip3 install slimit

from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor

data = 'a.funktion(); a.CallMethod(w); E.aa(w);'

tree = Parser().parse(data)
# Rename only identifiers whose entire name is 'a' or 'w';
# 'aa' and 'CallMethod' don't match and are left alone.
for node in nodevisitor.visit(tree):
    if isinstance(node, ast.Identifier):
        if node.value == 'a':
            node.value = 'airplane'
        elif node.value == 'w':
            node.value = 'worm'

print(tree.to_ecma())

Running it gives this output:

$ python3 src/python_renames_js_test.py
airplane.funktion();
airplane.CallMethod(worm);
E.aa(worm);

Caveats:

  1. function is a reserved word, so I used funktion.
  2. The to_ecma method pretty-prints; there is likely another way to get output closer to the original input.
dlamblin
  • I don't think I need a parser, because I'm pretty sure that if I can match 'a' surrounded by anything except another letter, I'll be fine. I'm just struggling with the correct RE pattern. – Asif Aug 25 '15 at 00:31
  • @Asif Okay then, you've got this; no problem. Sorry I tried to help. How do you know if the line isn't actually inside quotes that start and end on other lines? – dlamblin Aug 25 '15 at 00:36
line = re.sub(r'\ba\b', 'airplane', line)

should get you closer. However, note that you will also turn a.CallMethod("That is a house") into airplane.CallMethod("That is airplane house"), and open("file.txt", "a") into open("file.txt", "airplane"). Getting it right in a complex syntax environment using regular expressions is hard-to-impossible.
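A minimal sketch of the word-boundary approach applied to a whole file, assuming the question's 'a'/'w' mapping. The question's \W(a)\W consumes the characters on either side of the 'a' (so " a." becomes "airplane", losing the space and the dot) and cannot match at the very start of a line; \b is zero-width, so those characters are kept. A callable replacement handles both letters in one pass. It deliberately does nothing about string literals, so the caveat above still applies.

```python
import re

# Mapping from the question; extend it with more single-letter names as needed.
REPLACEMENTS = {"a": "airplane", "w": "worm"}

# \b is a zero-width word boundary: 'a' in 'aa' or 'CallMethod' is skipped,
# and the surrounding '.', '(' and ')' are not consumed by the match.
PATTERN = re.compile(r"\b(a|w)\b")


def replace_vars(line: str) -> str:
    # re.sub accepts a function; it is called with each match object.
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(1)], line)


def rewrite_file(src_path: str, dst_path: str) -> None:
    # 'with' closes both files even if an exception is raised mid-way.
    with open(src_path) as original, open(dst_path, "w") as modified:
        for line in original:
            modified.write(replace_vars(line))


if __name__ == "__main__":
    print(replace_vars("a.function(); a.CallMethod(w); E.aa(w);"))
    # → airplane.function(); airplane.CallMethod(worm); E.aa(worm);
    print(replace_vars('a.CallMethod("That is a house")'))
    # → airplane.CallMethod("That is airplane house")  (string contents changed too)
```

For the question's files, call rewrite_file('original.js', 'modified.js').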

Amadan
  • Does hard-to-impossible mean impossible? – Asif Aug 25 '15 at 00:41
  • Depends on your text, and on your RegExp dialect. E.g. Ruby can write whole parsers in RegExp; Python's is quite a bit less powerful. But either way it is very easy to get airplane case that you just didn't expect. – Amadan Aug 25 '15 at 00:44