What I have looked at already: how to use a variable inside a regular expression

Here is the code I have:

import re

#take user input as an argument
print('Enter 1st Argument: value to strip.')

user_input = input()

#take value to strip off as another argument
print('Enter 2nd Argument: The value to strip off the 1st value.')

strip_value = input()

#Recreate Strip Function
def regex_strip(value,what_to_strip):

    thing2 = 'L'
    what_to_strip = re.compile(r + re.escape(thing2))
    print(what_to_strip)
    #fv = what_to_strip.search('tigers named L')
    #print(fv.group())

regex_strip(user_input, strip_value)

I am expecting the user to submit two values. The first value is the value that will be subject to the stripping. The 2nd value is what is being stripped.

In my function, I am hard-coding values in order to test my regular expression.

Error message I am getting:

name 'r' is not defined

What am I doing wrong?

Edit #1: This is what I have tried:

thing2 = '\d'
what_to_strip = re.compile(re.escape(thing2))
print(what_to_strip)
fv = what_to_strip.search('123')
print(fv.group())

Result:

'NoneType' object has no attribute 'group'

My thoughts: Something is wrong with thing2 = '\d'. I want just '\d', but I am getting '\\\\d'.

  • `re.compile(r + re.escape(thing2))`: did you expect that `r` to be a variable? Or perhaps you wanted to make it a "raw" string for Python? The `r` doesn't work that way, but you don't need it, because input is not subject to Python's internal escaping rules (those apply to strings written in a Python script). – tdelaney Apr 12 '18 at 04:02
  • I wanted to make it a raw string. – Chicken Sandwich No Pickles Apr 12 '18 at 04:03
  • @tdelaney are you telling me that if the user gave the string that said "*", without the quotation marks, I would not have to do any escaping? – Chicken Sandwich No Pickles Apr 12 '18 at 04:05
  • If the user's input is your entire regular expression, you don't need a regular expression at all: `re.compile(re.escape(needle)).search(haystack)` is really just an overcomplicated (and slow) way to write `haystack.find(needle)`. The only reason you need to do this kind of thing is if you want to build a more complex search involving a user string. For example, it might make sense to search for `'spam and ({}|{})'.format(re.escape(thing1), re.escape(thing2))`, but not just for `re.escape(thing1)`. – abarnert Apr 12 '18 at 04:05
  • @abarnert I am reading a book about python for self-learning and the assignment was to take user input and replicate the strip method :) – Chicken Sandwich No Pickles Apr 12 '18 at 04:06
  • A "raw string" isn't a kind of string. It's a kind of _string literal_: a way of entering a string into your Python source code without normal Python source code rules being applied. You don't "make a string into a raw string". You _do_ escape strings, using `re.escape`, but that's a separate thing. – abarnert Apr 12 '18 at 04:06
  • Does the assignment tell you to do it using regular expressions? If so, it's a silly assignment that's not doing a great job teaching you what regular expressions are for, but that's not your fault (you didn't write the book), and I guess you might still learn something from it. – abarnert Apr 12 '18 at 04:09
  • @abarnert This is the assignment from the book: "Write a function that takes a string and does the same thing as the strip() string method. If no other arguments are passed other than the string to strip, then whitespace characters will be removed from the beginning and end of the string. Otherwise, the characters specified in the second argument to the function will be removed from the string." – Chicken Sandwich No Pickles Apr 12 '18 at 04:10
  • @LunchBox I think they expected you to do it by either looping over all the characters with `for ch in s:`, or using other string methods like `find` and `rfind` (or just `startswith` and `endswith`). There's obviously nothing at all wrong with thinking outside the box to come up with a different solution, but I don't think using `re` is going to help you here. (Still, as I said, you may learn something from trying to get it to work anyway.) – abarnert Apr 12 '18 at 04:15
  • I think if I can figure this out, it will be an invaluable learning experience. I am going to edit my original post with things that I have tried and my results for now. – Chicken Sandwich No Pickles Apr 12 '18 at 04:19

2 Answers

You can skip the escape function:

what_to_strip = re.compile(thing2)

:)

crestniraz
  • `thing2 = '\d'` `what_to_strip = re.compile(re.escape(thing2))` `print(what_to_strip)` `fv = what_to_strip.search('123')` `print(fv.group())` – Chicken Sandwich No Pickles Apr 12 '18 at 04:14
  • @LunchBox When you call `re.escape` on `'\d'`, it turns that into `'\\d'`, which is a search for a literal backslash followed by a `d`, instead of a search for a digit. That's the whole point of `re.escape`. If you want users to give you regex patterns instead of strings, and to interpret them as regex patterns, just drop the `re.escape` entirely. (Although then you aren't actually implementing the same functionality as `strip`, which is what the exercise asked for.) – abarnert Apr 12 '18 at 04:17
  • @abarnert Right now I am not worried about stripping values. Right now I am trying to get it to recognize my regex at all. Once that works, I'll worry about the strip part. – Chicken Sandwich No Pickles Apr 12 '18 at 04:20
  • Wait a minute, I just had a thought after reading your post for the 2nd time . . . – Chicken Sandwich No Pickles Apr 12 '18 at 04:21
  • The users aren't going to give me regex to strip off, they will give actual values . . . so I don't need escape at all. – Chicken Sandwich No Pickles Apr 12 '18 at 04:21
  • @LunchBox But it _is_ recognizing your regex. It's just recognizing it as `'\\d'`, which doesn't match anything in `'123'`. And it's only doing that because you asked it to by calling `escape`. – abarnert Apr 12 '18 at 04:22
  • @LunchBox No, you've got that backward. You _don't_ want escape if you want to interpret the input as a regex. You _do_ want escape if you want to interpret it as a plain string (even if it happens to look like a regex). For example, if the user wants to strip off `'.'`, you want to escape that, so it only strips literal periods, not every character. – abarnert Apr 12 '18 at 04:23
  • @abarnert I understand now. I only want to use escape if I am using special characters aka stuff like '.' or '*'. Otherwise, the re.escape() part is not needed. – Chicken Sandwich No Pickles Apr 12 '18 at 04:27
  • let's see ...brb doing some experimenting. – Chicken Sandwich No Pickles Apr 12 '18 at 04:28
  • @LunchBox Exactly! And with user input, you never know _what_ you're going to get. – abarnert Apr 12 '18 at 04:28
  • Although crestniraz gave me the solution, @abarnert helped me really understand; however, I can't mark a comment as the answer. abarnert, if you post something as an answer and not as a comment, I'll mark it as the answer to my question. Everything works now :) – Chicken Sandwich No Pickles Apr 12 '18 at 04:33

The first problem is that you're confusing raw string literals with strings. A string literal is the way you enter a string in your Python source code, like "abc". You can use an r prefix to make this a raw string literal, like r"a\b\c". That doesn't change what kind of string it is, it just prevents the usual Python source code rules from being applied, so you get actual backslashes and letters instead of special characters like a backspace. So, you can't turn user input into a raw string, but you don't have to—the string is already exactly the letters the user typed.
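To make the difference between the two kinds of literal concrete, here is a small sketch. The variable names are only illustrative, and `typed` stands in for what `input()` would hand back if the user typed a backslash followed by a d (it is written as an escaped literal so the snippet runs without prompting):

```python
# A normal literal: Python turns \t into a single tab character.
normal = "a\tb"
# A raw literal: the backslash and the t stay as two separate characters.
raw = r"a\tb"

print(len(normal))      # 3
print(len(raw))         # 4
print(raw == "a\\tb")   # True: same string, just written differently

# Hypothetical stand-in for input(): the user typed the two characters \ and d.
typed = "\\d"
print(len(typed))       # 2
print(typed == r"\d")   # True: nothing needs to be "made raw"
```

Both spellings produce ordinary `str` objects; the r prefix only affects how the source code is read.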

(This can be a bit confusing, because when you print out a regular expression, you see something like re.compile(r'\.', re.UNICODE). That r isn't really part of the object; it's showing you how you could create exactly the same regular expression object in your source code.)


The re.escape function is sort of similar, but it's not the same thing. What it does is take a string and turn it into a regex pattern with all the regex special characters escaped. So, for example, re.escape('.') gives you \., meaning it will only match an actual . character, rather than matching anything. Since user input can easily contain characters like ., and the user probably isn't asking you to strip every character, you were right to use re.escape here.
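A quick way to see the effect is to compile the same one-character string with and without `re.escape` and compare what each pattern finds. This is only a sketch; the names `unescaped` and `escaped` are made up for the illustration:

```python
import re

unescaped = re.compile('.')              # '.' means "any character"
escaped = re.compile(re.escape('.'))     # '\.' means "a literal period"

print(re.escape('.'))                    # \.
print(unescaped.search('abc').group())   # a
print(escaped.search('abc'))             # None
print(escaped.search('a.c').group())     # .
```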

So:

re.compile(re.escape(thing2))

When you tested this code with the input \d and tried to search the string 123, it didn't find anything. But that's exactly what you want. If the user types in \d, they're not asking to strip off any digit, they're asking to strip off \ and d.
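The same test can be run side by side with and without the escape step. In this sketch `user_text` is a hypothetical stand-in for what the user typed, and the other names are illustrative only:

```python
import re

user_text = r'\d'   # the two characters \ and d, as a user would type them

literal = re.compile(re.escape(user_text))   # looks for a backslash followed by d
as_regex = re.compile(user_text)             # looks for any digit

print(literal.search('123'))             # None: there is no backslash in '123'
print(literal.search(r'12\d').group())   # \d
print(as_regex.search('123').group())    # 1
```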

Of course for some programs, you really do want to take regular expressions from the user. (For example, you might want to write something similar to grep.) In that case, you wouldn't call re.escape.


One last thing: When you call '1234'.strip('14'), that doesn't strip off the string '14' from both sides, it strips off any characters that are in the string '14'; in other words, you'll get back '23'. To make this work with a regular expression, you want to turn that '14' into '1|4'. In other words, you want to escape each character, and then join those characters up with '|', to get the pattern.
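Putting the pieces together, one possible sketch of the exercise looks like this. It is not the only way to do it: the parameter name `chars`, the use of `re.sub`, and the `\A` and `\Z` anchors are my own choices, and the whitespace default relies on `\s`, which I am assuming is close enough to what `str.strip()` treats as whitespace for this exercise.

```python
import re

def regex_strip(value, chars=None):
    """Rough regex-based imitation of str.strip (a sketch, not a drop-in replacement)."""
    if chars is None:
        # No second argument: remove leading and trailing whitespace.
        pattern = r'\A\s+|\s+\Z'
    elif chars == '':
        # Nothing to strip, so return the value unchanged.
        return value
    else:
        # Escape each character separately, then join with '|': '14' becomes '1|4'.
        alternatives = '|'.join(re.escape(ch) for ch in chars)
        # One or more of those characters at the very start or the very end.
        pattern = r'\A(?:{0})+|(?:{0})+\Z'.format(alternatives)
    return re.sub(pattern, '', value)

print(regex_strip('1234', '14'))      # 23
print(regex_strip('  hi  '))          # hi
print(regex_strip('...a.b...', '.'))  # a.b
```

Because every character goes through `re.escape`, a request to strip '.' removes only literal periods at the ends and leaves the one in the middle alone.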

abarnert