I am looking to remove the last statement in a rule used for parsing. The statements are encapsulated with @ characters, and the rule itself is encapsulated with pattern tags.

What I want to do is just remove the last rule statement.

My current idea to achieve this goes like this:

  1. Open the rules file and save each line as an element of a list.
  2. Select the line that contains the correct rule-id, then save the rule pattern as a new string.
  3. Reverse the saved rule pattern.
  4. Remove the last rule statement.
  5. Reverse the rule pattern again.
  6. Add the trailing pattern tag back.

So the input will look like:

<pattern>@this is a statement@ @this is also a statement@</pattern>

Output will look like:

<pattern>@this is a statement@ </pattern>

My current attempt goes like this:

with open(rules) as f:
    lines = f.readlines()
string = ""
for position, line in enumerate(lines):
    if ruleid in line:
        # The rule pattern is two lines below the line with the rule-id,
        # hence position + 2.
        string = lines[position + 2]

import re

def reversed_string(a_string):  # reverses the string
    return a_string[::-1]

def remove_at(x):  # removes everything up to and including the first @
    return re.sub('^.*?@', '', x)

print(reversed_string(remove_at(remove_at(reversed_string(string)))))

This reverses the string, but the last rule statement is not removed once it has been reversed.

Running just the reversed_string() function successfully reverses the string, but passing that reversed string through the remove_at() function has no effect.

However, if I define the input string by hand (with the same rule pattern) instead of opening the file and grabbing the rule pattern from it, the trailing rule statement is removed successfully.

The successful code looks like this:

import re

string = '<pattern>@this is a statement@ @this is also a statement@</pattern>'

def reversed_string(a_string):  # reverses the string
    return a_string[::-1]

def remove_at(x):  # removes everything up to and including the first @
    return re.sub('^.*?@', '', x)

print(reversed_string(remove_at(remove_at(reversed_string(string)))))

Also, how would I add the closing pattern tag back after the removal is complete?

martineau

1 Answer

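The lines you are reading probably have a \n at the end, and that is why your replacement is not working: once the string is reversed, the \n is its first character, and since . does not match a newline by default, ^.*?@ can never reach the first @. There are several ways to read the file without the trailing newlines.

One option is to remove the \n using rstrip() like this: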

string = lines[position + 2].rstrip("\n")
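
The effect of the trailing newline can be checked with a small sketch. The sample line below is an assumption (it copies the pattern from the question and adds the \n that readlines() would keep), and remove_at mirrors the function from the question:

```python
import re

# Hypothetical line as readlines() would return it, trailing newline included.
line = "<pattern>@this is a statement@ @this is also a statement@</pattern>\n"

def remove_at(text):
    # Same pattern as in the question: drop everything up to the first '@'.
    return re.sub(r"^.*?@", "", text)

reversed_raw = line[::-1]                 # begins with "\n"
reversed_clean = line.rstrip("\n")[::-1]  # begins with ">nrettap/<"

unchanged = remove_at(reversed_raw)   # '.' cannot cross the leading "\n"
trimmed = remove_at(reversed_clean)   # everything up to the first '@' is removed

print(unchanged == reversed_raw)  # True: the substitution did nothing
print(trimmed)
```

Passing re.DOTALL would also let . cross the newline, but stripping the newline keeps the rest of the code unchanged.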

Now, about the replacement, I think you could simplify it by using this regular expression:

@[^@]+@(?!.*@)

It consists of the following parts:

  • @[^@]+@ matches one @ followed by one or more characters that are not an @ and then another @.
  • (?!.*@) is a negative lookahead to check that no @ is found ahead, preceded by zero or more occurrences of any other character.
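
To see what each part contributes, here is a short sketch that runs the expression with and without the lookahead on the sample pattern from the question:

```python
import re

sample = "<pattern>@this is a statement@ @this is also a statement@</pattern>"

# Without the lookahead, the first '@...@' pair in the string is matched.
first = re.search(r"@[^@]+@", sample)

# With the lookahead, only a pair that has no '@' after it can match.
last = re.search(r"@[^@]+@(?!.*@)", sample)

print(first.group())  # @this is a statement@
print(last.group())   # @this is also a statement@
```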

This expression should match the last statement and you would not need to reverse the string:

re.sub("@[^@]+@(?!.*@)", "", string)
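
Putting the pieces together, a minimal sketch of the whole lookup could look like the following. The function name remove_last_statement, the offset parameter, and the sample rule layout are assumptions made for illustration; only the "pattern is two lines below the rule-id" convention comes from the question.

```python
import re

# Matches the last '@...@' statement on a line.
LAST_STATEMENT = re.compile(r"@[^@]+@(?!.*@)")

def remove_last_statement(lines, ruleid, offset=2):
    """Return a copy of lines with the last statement removed from the
    pattern line located `offset` lines below the first line containing ruleid."""
    result = list(lines)
    for i, line in enumerate(result):
        if ruleid in line:
            target = i + offset
            if target < len(result):
                result[target] = LAST_STATEMENT.sub("", result[target])
            break
    return result

# Hypothetical rules file content, as readlines() would return it.
sample_lines = [
    '<rule id="example-rule">\n',
    "  <description>hypothetical</description>\n",
    "<pattern>@this is a statement@ @this is also a statement@</pattern>\n",
    "</rule>\n",
]

updated = remove_last_statement(sample_lines, "example-rule")
print(updated[2], end="")  # <pattern>@this is a statement@ </pattern>
```

Because the expression removes only the statement itself, the closing </pattern> tag is left in place and does not need to be added back afterwards.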
Hernán Alarcón