I would like to compare a string A with a regex R.

A = u'Hi my friend, my name is Julio'
R = r'Hi\s+my\s+friend,\s+my\s+name\s+is\s+([A-Za-z]+)'

At this point I can easily tell whether the syntax is good thanks to re.match and re.search. Now I would like to study the differences between A and R when the match doesn't work.

My first case is simple. I replace the regex group ([A-Za-z]+) with (.+) to find out whether the issue is just in the group matching. In this case, I can easily report the issue by saying that the string syntax is good except for the group defined for the name.

Now, in the case where both step 1 and step 2 fail, I would like to produce a diff, like an HTML diff but driven by the regex, to identify where the regex failed.

I studied difflib and the find_longest_match function, but it seems that this function works only character by character and not on substrings.

Do you have any idea/suggestion to identify the diff based on a regex comparison and potentially compute the ratio measuring the similarity?

Julio
  • You need an engine that will do partial matching, or just use cascading optional constructs. Like: `Hi(\s+(my(\s+(friend(,(\s+(my(\s+(name(\s+(is([A-Za-z]+)?)?)?)?)?)?)?)?)?)?)?)?` –  Oct 07 '14 at 16:31
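
The cascading idea in the comment above can be sketched roughly as follows. This is only an illustration, not a definitive implementation: the split of the pattern into `PIECES`, the `p<index>` group names, and the helpers `build_cascade` and `first_failure` are all assumptions made for the example. Each piece is wrapped in a named group that also contains the optional rest of the pattern, so counting the groups that took part in the match tells you how far the regex got.

```python
import re

# Hypothetical split of the question's pattern into consecutive pieces.
PIECES = [r'Hi', r'\s+', r'my', r'\s+', r'friend', r',', r'\s+', r'my',
          r'\s+', r'name', r'\s+', r'is', r'\s+', r'[A-Za-z]+']


def build_cascade(pieces):
    """Nest the pieces so that everything after the first one is optional."""
    pattern = ''
    for index in reversed(range(len(pieces))):
        # Level <index> is a named group holding its piece plus the optional tail.
        pattern = '(?P<p%d>%s%s)' % (index, pieces[index], pattern)
        if index > 0:
            pattern += '?'
    return re.compile(pattern)


CASCADE = build_cascade(PIECES)


def first_failure(text):
    """Return (number of pieces matched, unmatched remainder of text)."""
    match = CASCADE.match(text)
    if match is None:
        return 0, text
    matched = sum(1 for index in range(len(PIECES))
                  if match.group('p%d' % index) is not None)
    return matched, text[match.end():]
```

For example, `first_failure('Hi my friend, my nme is Julio')` returns `(9, 'nme is Julio')`, so the piece that failed is `PIECES[9]`, i.e. `name`.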

1 Answer

What you need exactly is not 100% clear from your question, since the answer will depend on the nature of the general case and you've only given one example. However, assuming your example is typical, I have a couple of suggestions.

Your regex is mostly literal string matching, with only a little regex at the end. So it might help to split the literal string match from the regex match. Something like:

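import re

intro = u'Hi my friend, my name is '
name_r = r'([A-Za-z]+)'

def check(test_string):
    # Literal prefix first: a plain string diff can explain a mismatch here.
    if not test_string.startswith(intro):
        return do_string_diff(test_string)  # placeholder, to be written

    # Only the remainder after the intro has to satisfy the regex.
    name = test_string[len(intro):]
    if not re.match(name_r, name):
        return do_re_diff(test_string)  # placeholder, to be written

    return True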

You may find something in difflib that does the string comparison you need, or you may have to roll your own. It depends on your specific needs.
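
As a rough sketch of what the string comparison could look like with difflib: `difflib.SequenceMatcher` accepts any sequence of hashable items, so it can compare lists of tokens instead of single characters. The tokenizing rule and the helper names `tokenize` and `token_diff` below are assumptions for illustration only, and you may want a different split for your data.

```python
import difflib
import re


def tokenize(text):
    """Split text into words, whitespace runs, and single punctuation marks."""
    return re.findall(r'\w+|\s+|[^\w\s]', text)


def token_diff(expected, actual):
    """Return (similarity ratio, list of non-equal opcodes as token text)."""
    a, b = tokenize(expected), tokenize(actual)
    # autojunk=False switches off the popularity heuristic for long sequences.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    changes = [(tag, ''.join(a[i1:i2]), ''.join(b[j1:j2]))
               for tag, i1, i2, j1, j2 in matcher.get_opcodes()
               if tag != 'equal']
    return matcher.ratio(), changes
```

Calling `token_diff(u'Hi my friend, my name is ', u'Hi my frend, my name is ')` reports a single `('replace', 'friend', 'frend')` change, and the ratio is computed over tokens rather than characters.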

You may find something useful in this question: https://stackoverflow.com/questions/682367/good-python-modules-for-fuzzy-string-comparison

You could also search for fuzzy string matching or Levenshtein distance.
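
If you end up rolling your own, Levenshtein distance is short to write. The following is a minimal sketch of the standard dynamic-programming version. The `similarity` helper, which normalizes the distance by the longer length, is just one possible way to get a ratio and is an assumption here, not a standard definition.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b)  # substitute if different
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Hypothetical ratio in [0, 1]: 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / float(longest)
```

Since the function only indexes and compares items, it also works on lists of tokens if you prefer a word-level distance.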

jisaacstone