
I'm reading information from a .tsv file, and each line contains a string representing a regex. For example, to detect "remix" or "re-mix", I read in '\bre-?mix\b' and have to convert it. I searched a bit and found a question along the same lines, but I've tested the answers and none of them work for me.
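For reference, a string read from a file is not a Python source literal, so the backslashes in it arrive as ordinary characters and no conversion should be needed. A minimal Python 3 sketch, with `io.StringIO` standing in (as an assumption) for the real .tsv file:

```python
import io
import re

# Simulate a file whose single line is: \bre-?mix\b
# (io.StringIO stands in for the real .tsv file.)
fake_file = io.StringIO("\\bre-?mix\\b\n")
pattern_text = fake_file.readline().rstrip("\n")

# The text read from the file is identical to the raw-string literal.
same_as_literal = pattern_text == r"\bre-?mix\b"

# So it can be compiled directly, with no escaping or conversion.
compiled = re.compile(pattern_text)
print(same_as_literal)                   # True
print(bool(compiled.search("remix")))    # True
print(bool(compiled.search("re-mix")))   # True
print(bool(compiled.search("premix")))   # False: no word boundary before "re"
```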

When I use re.escape() on the pattern, it ends up like this: '\bre-\?mix\b', and after compiling it with re.compile() and calling re.search() on "remix", it fails. I've also tried passing raw_regex.replace('\\b', '\\\\b') into re.compile(); when I check the pattern it looks as it's supposed to, yet it still fails the simple `if compiled_regex.search("remix")` check.
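Both attempts fail for the same reason: each turns `\b` into something that matches a literal backslash followed by `b`, instead of a word boundary. A small Python 3 sketch (the question uses Python 2, but `re` should behave the same way for these patterns):

```python
import re

raw = r"\bre-?mix\b"  # the pattern text as it appears in the file

# re.escape() turns every special character into a literal, so the
# escaped pattern only matches the exact text "\bre-?mix\b".
escaped = re.escape(raw)
print(re.search(escaped, "remix"))             # None
print(bool(re.search(escaped, raw)))           # True: matches the literal text

# Doubling the backslashes has the same effect on \b: "\\b" in a regex
# means "a literal backslash followed by b", not a word boundary.
doubled = raw.replace("\\b", "\\\\b")
print(re.search(doubled, "remix"))             # None
print(bool(re.search(doubled, r"\bremix\b")))  # True: literal backslashes in the text

# The unmodified text is already the pattern that was wanted.
print(bool(re.search(raw, "remix")))           # True
```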

What am I doing wrong here? All I need to do is read in raw text regexes, convert them, and compile them. If something needs to be changed on the input end, that can be done as well. Thanks!

Asked by Befall (edited by Community)
  • Don't forget to escape the `-` character too `\-` –  Sep 08 '14 at 20:16
  • `re.escape()` creates a regex that matches _only_ the exact string entered, so if you want to be able to match two things, it's never appropriate to use unmodified. – Charles Duffy Sep 08 '14 at 20:17
  • Anyhow -- I'd start with this by creating an SSCCE / MCVE. See http://sscce.org/ or http://stackoverflow.com/help/mcve. My expectation, reading this, is that any code I wrote to reproduce your problem probably wouldn't, because it's probably something specific to the input file or such. – Charles Duffy Sep 08 '14 at 20:19
  • @Allendar, pardon? `-` doesn't need to be escaped unless it's inside a character class definition. – Charles Duffy Sep 08 '14 at 20:30
  • "*and have to convert it*" - Tell us more about why you think you have to convert it. What is the precise string in the tsv file? What is the value of the Python string after you've read it in? – Robᵩ Sep 08 '14 at 20:34
  • Sorry, that's likely not the proper wording. I just meant taking the raw string and changing it into something fit for the regex compile function, achieving the same result as manually entering `re.compile(r'\bre-?mix\b')`. – Befall Sep 08 '14 at 20:38
  • Tell us more about why you think the string you read in is unfit for `re.compile()`. Does the string that you read in have the wrong value? Does it generate an error message? – Robᵩ Sep 08 '14 at 20:40
  • Because currently I have a .tsv line with information, and the pattern string is read in as row[8]. If I call re.compile(row[8]) directly, it gives me the error `raise error, v # invalid expression` followed by `sre_constants.error: nothing to repeat`, whether I use row[8] or str(row[8]). – Befall Sep 08 '14 at 20:42
  • Excellent. Let's solve **that** problem. What do you see if you `print row[8]`? (We appear to have an [XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem)). – Robᵩ Sep 08 '14 at 20:43
  • `\bre-?mix\b`, with or without the str() conversion. – Befall Sep 08 '14 at 20:45
  • Respectfully, I don't believe you. That message might mean that your string begins with a regex operator that requires a preceding operand. Is it possible that your regex begins with any of `+*{?`? – Robᵩ Sep 08 '14 at 20:48
  • It seems that looking into the error revealed that my issue isn't related to my string, but rather to the regex not liking some of my input (see http://stackoverflow.com/questions/3675144/regex-error-nothing-to-repeat). So I'm assuming your solution is correct; I'll just need to find a fix for some of my input, like "\bintro(duction)?\b", which is the first to spark that error. EDIT: All of this was caused by an accidental '\b?' that made everything fail. Fixed now and it works. Will delete; this was pathetically stupid of me. – Befall Sep 08 '14 at 20:52
  • I don't think that problem can happen with `\bre-?mix\b`. Is that really the regex that you are compiling? – Robᵩ Sep 08 '14 at 20:55
  • I misunderstood an issue and further checking revealed it was a bug. The question has no useful answers that haven't already been answered, and thus it should be closed. Thanks! – Befall Sep 08 '14 at 20:56
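The eventual culprit was a stray `\b?` in one row: `?` cannot be applied to the zero-width `\b` assertion, so Python's `re` rejects it with "nothing to repeat". A minimal sketch of compiling every row and reporting which ones fail, so a single bad pattern is easy to locate; the pattern list and the helper name `compile_all` are hypothetical:

```python
import re

# Hypothetical rows standing in for the pattern column of the .tsv file;
# the third one contains a stray "\b?" like the one that caused the error.
patterns = [
    r"\bre-?mix\b",
    r"\bintro(duction)?\b",
    r"\bedit\b?",
]

def compile_all(pattern_texts):
    """Compile each pattern, collecting failures instead of stopping at the first."""
    compiled, errors = [], []
    for line_no, text in enumerate(pattern_texts, start=1):
        try:
            compiled.append(re.compile(text))
        except re.error as exc:
            # A quantifier such as ? applied to \b (a zero-width assertion)
            # is rejected with "nothing to repeat".
            errors.append((line_no, text, str(exc)))
    return compiled, errors

compiled, errors = compile_all(patterns)
for line_no, text, message in errors:
    print("line %d: %r -> %s" % (line_no, text, message))
```

Note that `\bintro(duction)?\b` compiles without complaint; the `?` there applies to a group, which is something that can be repeated.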

1 Answer


This program reads each row of a TSV file, compiles the first field as a regex, and tests it against 'remix' and 're-mix'. No "convert" step is required:

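```python
#!/usr/bin/python2.7
import csv
import re

# Each row of x.tsv holds a regex in its first (tab-separated) field.
with open('x.tsv', 'rb') as input_file:
    reader = csv.reader(input_file, delimiter='\t')
    for row in reader:
        # The text read from the file is already a valid pattern,
        # so it is passed to re.compile() unchanged.
        compiled_regex = re.compile(row[0])
        print row[0], bool(compiled_regex.search('remix')), bool(compiled_regex.search('re-mix'))
```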

Input:

```
remix
re-?mix
\bre-?mix\b
.*
this line should not match
```

Output:

```
remix True False
re-?mix True True
\bre-?mix\b True True
.* True True
this line should not match False False
```
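The program above is Python 2. A sketch of the same check for Python 3, assuming an in-memory string in place of `x.tsv` so it runs on its own; it should print the same five lines as the output shown. The helper name `check_patterns` is hypothetical.

```python
import csv
import io
import re

# In-memory stand-in for x.tsv: one regex per line, in the first column.
TSV_TEXT = "remix\nre-?mix\n\\bre-?mix\\b\n.*\nthis line should not match\n"

def check_patterns(tsv_file):
    """Return (pattern, matches 'remix', matches 're-mix') for each row."""
    results = []
    # QUOTE_NONE keeps csv from treating '"' inside a regex as a quote character.
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        compiled_regex = re.compile(row[0])  # no conversion step needed
        results.append((row[0],
                        bool(compiled_regex.search("remix")),
                        bool(compiled_regex.search("re-mix"))))
    return results

results = check_patterns(io.StringIO(TSV_TEXT))
for pattern, plain, hyphenated in results:
    print(pattern, plain, hyphenated)
```

To read the real file instead, pass `open('x.tsv', newline='')` to `check_patterns`. The `quoting=csv.QUOTE_NONE` argument is a choice made in this sketch, not in the program above; without it, a pattern that starts with a double quote would be parsed as a quoted CSV field.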
Answered by Robᵩ