I have a Selenium/Python project that uses a regex match to find HTML elements. The element attributes sometimes include the Danish/Norwegian characters ÆØÅ. The problem is in the snippet below:
if (re.match(regexp_expression, compare_string)):
    result = True
else:
    result = False
Both regexp_expression and compare_string are manipulated before the regex match is executed. If I print them just before the snippet above runs, and also print the result, I get the following output:
Regex_expression: [^log på$]
compare string: [log på]
result = false
I added the brackets to make sure there was no stray whitespace. They are only part of the print statement, not part of the string variables.
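Brackets rule out stray whitespace, but they cannot show differences between characters that render identically. Below is a minimal sketch of a stricter check, assuming Python 3 and assuming (the printed output neither confirms nor rules this out) that the two strings might encode å differently. The helper `describe` and the two sample strings are hypothetical and only serve as illustration:

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for every character in text."""
    return [("U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN")) for ch in text]


# Hypothetical sample values: both render as "log på" on screen.
composed = "log p\u00e5"      # å as a single precomposed code point
decomposed = "log pa\u030a"   # plain a followed by COMBINING RING ABOVE

# ascii() escapes every non-ASCII character, so hidden differences show up.
print(ascii(composed))         # 'log p\xe5'
print(ascii(decomposed))       # 'log pa\u030a'
print(composed == decomposed)  # False

# Character-by-character view of the second string.
for code_point, name in describe(decomposed):
    print(code_point, name)
```

Printing `ascii(regexp_expression)` and `ascii(compare_string)` right before the `re.match` call would show whether the real values differ in this way.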
However, if I try to reproduce the problem in a separate script, like this:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re
regexp_expression = "^log på$"
compare_string = "log på"
if (re.match(regexp_expression, compare_string)):
    print("result true")
    result = True
else:
    print("result = false")
    result = False
Then the result is true.
How can this be? To make it even stranger, it worked earlier, and I am not sure what I edited that made it go boom...
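One way a standalone script can succeed while the real run fails is that literals typed into a source file and text read from a web page may use different Unicode forms of the same visible character. This is only a possible explanation, not something the output above proves. The sketch below, assuming Python 3, reproduces such a mismatch and shows that normalizing both strings to NFC removes it. The helper `nfc` and the sample values are hypothetical:

```python
import re
import unicodedata


def nfc(text):
    """Normalize text to Unicode NFC (precomposed characters where possible)."""
    return unicodedata.normalize("NFC", text)


# Hypothetical values: the pattern uses precomposed å, the candidate uses
# a + COMBINING RING ABOVE. Both print as "log på".
pattern = "^log p\u00e5$"
candidate = "log pa\u030a"

raw_match = bool(re.match(pattern, candidate))
normalized_match = bool(re.match(nfc(pattern), nfc(candidate)))

print(raw_match)         # False
print(normalized_match)  # True
```

If the values in the failing run turn out to be identical code point for code point, this explanation does not apply and the cause lies elsewhere.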
The full module containing the regex compare method is below. I did not write this code myself, so I am not 100% familiar with the reason for all the replace statements and string manipulation, but I would think it shouldn't matter, since I can check the strings just before the failing match call at the bottom...
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re
def regexp_compare(regexp_expression, compare_string):
    #final int DOTALL
    #try: // include try catch for "PatternSyntaxException" while testing/including a new symbol in this method..
    #catch(PatternSyntaxException e):
    # System.out.println("Regexp>>"+regexp_expression)
    # e.printStackTrace()
    #*/
    if(not compare_string.strip() and (not regexp_expression.strip() or regexp_expression.strip().lower() == "*".lower()) or (regexp_expression.strip().lower() == ".*".lower())):
        print("return 1")
        return True
    if(not compare_string or not regexp_expression):
        print("return 2")
        return False
    regexp_expression = regexp_expression.lower()
    compare_string = compare_string.lower()
    if(not regexp_expression.strip()):
        regexp_expression = ""
    if(not compare_string.strip() and (not regexp_expression.strip() or regexp_expression.strip().lower() == "*".lower()) or (regexp_expression.strip().lower() == ".*".lower())):
        regexp_expression = ""
    else:
        regexp_expression = regexp_expression.replace("\\","\\\\")
        regexp_expression = regexp_expression.replace("\\.","\\\\.")
        regexp_expression = regexp_expression.replace("\\*", ".*")
        regexp_expression = regexp_expression.replace("\\(", "\\\\(")
        regexp_expression = regexp_expression.replace("\\)", "\\\\)")
    regexp_expression_arr = regexp_expression.split("|")
    regexp_expression = ""
    for i in range(0, len(regexp_expression_arr)):
        if(not(regexp_expression_arr[i].startswith("^"))):
            regexp_expression_arr[i] = "^"+regexp_expression_arr[i]
        if(not(regexp_expression_arr[i].endswith("$"))):
            regexp_expression_arr[i] = regexp_expression_arr[i]+"$"
        regexp_expression = regexp_expression_arr[i] if regexp_expression == "" else regexp_expression+"|"+regexp_expression_arr[i]
    result = None
    print("Regex_expression: [" + regexp_expression+"]")
    print("compare string: [" + compare_string+"]")
    if (re.match(regexp_expression, compare_string)):
        print("result true")
        result = True
    else:
        print("result = false")
        result = False
    print("return result")
    return result
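To see what the replace statements and the anchoring loop actually do to a pattern, the rewriting steps can be pulled out and run on their own. This is a minimal sketch assuming Python 3. The function `preprocess` is a hypothetical extraction of the steps in the module above, not part of the original code, and it joins the alternatives with `"|".join` instead of the original string concatenation:

```python
def preprocess(pattern):
    """Repeat the pattern rewriting from regexp_compare, without the matching."""
    pattern = pattern.lower()
    # Same replace chain as in the module, in the same order.
    pattern = pattern.replace("\\", "\\\\")
    pattern = pattern.replace("\\.", "\\\\.")
    pattern = pattern.replace("\\*", ".*")
    pattern = pattern.replace("\\(", "\\\\(")
    pattern = pattern.replace("\\)", "\\\\)")
    # Anchor every alternative with ^ and $ unless it already has them.
    parts = []
    for part in pattern.split("|"):
        if not part.startswith("^"):
            part = "^" + part
        if not part.endswith("$"):
            part = part + "$"
        parts.append(part)
    return "|".join(parts)


print(preprocess("Log på"))  # ^log på$
print(preprocess("a|b"))     # ^a$|^b$

# None of the steps normalizes Unicode: a decomposed å stays decomposed.
print(ascii(preprocess("log pa\u030a")))  # '^log pa\u030a$'
```

For a pattern without backslashes, such as the one in the question, the replace chain changes nothing, so only the lowercasing and the added anchors affect the final expression.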