How do I use regular expressions in Python with placeholder text?

Question

I am doing a project in Python where I require a user to input text. If the text matches a format supported by the program, it will output a response that includes a user's key word (it is a simple chat bot). The format is stored in a text file as a user input format and an answer format.

For example, the text file looks like this, with user input on the left and output on the right:

my name is <-name> | Hi there, <-name>

So if the user writes my name is johnny, I want the program to know that johnny is the <-name> variable, and then to print the response Hi there, johnny.

Some prodding me in the right direction would be great! I have never used regular expressions before and I read an article on how to use them, but unfortunately it didn't really help me since it mainly went over how to match specific words.

Not totally sure what you're asking here. REGEXP is about matching specific string structures (e.g. words); what else? Any kind of semantic analysis is a totally different thing. – hasienda, Feb 04 '12 at 13:22

Rob Wouters · Accepted Answer · 2012-02-04T14:37:47.443

9

Here's an example:

import re

io = [
    (r'my name is (?P<name>\w+)', 'Hi there, {name}'),
]

string = input('> ')
for regex, output in io:
    match = re.match(regex, string)
    if match:
        print(output.format(**match.groupdict()))
        break

I'll take you through it:

r'my name is (?P<name>\w+)'

(?P<name>...) stores the following part (\w+) under the name name in the match object which we're going to use later on.

match = re.match(regex, string)

This looks for the regex in the input given. Note that re.match only matches at the beginning of the input; if you don't want that restriction, use re.search here instead.
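As a small illustration of that difference (the sample inputs below are made up for this sketch, not taken from the question):

```python
import re

pattern = r'my name is (?P<name>\w+)'

# re.match anchors at the start of the string, so leading text prevents a match.
at_start = re.match(pattern, 'my name is johnny')
not_at_start = re.match(pattern, 'well, my name is johnny')

# re.search scans the string for the first position where the pattern matches.
found_anywhere = re.search(pattern, 'well, my name is johnny')

print(at_start.group('name'))        # johnny
print(not_at_start)                  # None
print(found_anywhere.group('name'))  # johnny
```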

If it matches:

output.format(**match.groupdict())

match.groupdict() returns a dictionary whose keys are the group names defined by (?P<name>...) and whose values are the matched text. ** passes those key/value pairs to .format as keyword arguments; in this case Python will translate it to output.format(name='matchedname').
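A short sketch of what groupdict and ** do here, using the example input from the question:

```python
import re

match = re.match(r'my name is (?P<name>\w+)', 'my name is johnny')
groups = match.groupdict()
print(groups)  # {'name': 'johnny'}

template = 'Hi there, {name}'
# These two calls are equivalent: ** unpacks the dict into keyword arguments.
unpacked = template.format(**groups)
explicit = template.format(name='johnny')
print(unpacked)  # Hi there, johnny
```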

To construct the io list from a file do something like this:

io = []
with open('input.txt') as file_:
    for line in file_:
        # Drop the trailing newline, then split on the last ' | '.
        key, value = line.rstrip('\n').rsplit(' | ', 1)
        io.append((key, value))
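The file in the question uses <-name> placeholders rather than regex syntax, so lines read verbatim will only match literally. One possible way to bridge that gap is sketched below. The helper name rule_from_line and the choice of \w+ for every placeholder are assumptions made for this sketch, not part of the original answer, and a placeholder repeated on the input side would need extra handling because a group name can only be defined once per pattern.

```python
import re

# Matches the question's placeholder syntax, e.g. <-name> (assumed format).
PLACEHOLDER = re.compile(r'<-(\w+)>')

def rule_from_line(line):
    """Turn 'my name is <-name> | Hi there, <-name>' into (regex, template)."""
    user_text, reply_text = line.rstrip('\n').rsplit(' | ', 1)

    # Splitting on a pattern with one capturing group yields alternating
    # literal text (even indexes) and placeholder names (odd indexes).
    pieces = PLACEHOLDER.split(user_text)
    regex = ''
    for index, piece in enumerate(pieces):
        if index % 2 == 0:
            regex += re.escape(piece)          # keep literal text literal
        else:
            regex += r'(?P<%s>\w+)' % piece    # placeholder becomes a named group

    # Escape literal braces so str.format leaves them alone, then turn
    # each placeholder into a format field such as {name}.
    reply_text = reply_text.replace('{', '{{').replace('}', '}}')
    template = PLACEHOLDER.sub(r'{\1}', reply_text)
    return regex, template

regex, template = rule_from_line('my name is <-name> | Hi there, <-name>\n')
match = re.match(regex, 'my name is johnny')
print(template.format(**match.groupdict()))  # Hi there, johnny
```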

edited Feb 04 '12 at 14:37

answered Feb 04 '12 at 13:22


+1, I like this. It's clean and elegant. Although it doesn't really address the file input thing. – Niklas B. Feb 04 '12 at 13:24
I am sure that my re.sub version, and this (as I didn't address the multiple subs thing) could be combined to simplify it. The string format and match group could become a sub - although I also see that the re.sub version would not indicate that a sub had actually been made. – Danny Staple Feb 04 '12 at 13:29
@NiklasBaumstark, true. Didn't consider that as relevant part of the question. It's quite straightforward though, so I'll add it to my answer for the sake of completeness. – Rob Wouters Feb 04 '12 at 13:30
@Danny: Sorry, I don't think misusing `re.sub` like this is necessary or advisable here. It's pattern matching, then string creating, not text replacement. For me, this should be the accepted answer. – Niklas B. Feb 04 '12 at 13:31
@DannyStaple, I don't think using `re.sub` here will simplify things. Using named matches and `format` gives you a ton of flexibility. – Rob Wouters Feb 04 '12 at 13:33
I don't understand why `io` is a dictionary. It's not as if you really look up anything using the key. I'd say it's really a tuple. – hughdbrown Feb 04 '12 at 14:30
@hughdbrown, Yeah you're absolutely right. That's not very pythonic of me, fixing it right away. Thanks for pointing it out. – Rob Wouters Feb 04 '12 at 14:37
Thank you everyone for all your help! I have fiddled around with this code, and what I have found is that it does not work when an 'else' case is added (it just executes the else case when it normally would work) and it does not work for cases where replacement text is required (only works for a response where user text exactly matches the user text in the text file). I printed the io dictionary out, and I realised it looks the same as the text file, so I think some replacement of the etc needs to be done to make them regular expressions. Maybe..? – user1189336 Feb 04 '12 at 14:50
@user1189336, I have no idea what you're trying to do with your `else` clause. If you want to reply to a user saying "foo" with "bar" just add `foo | bar` to your file. – Rob Wouters Feb 04 '12 at 15:08
@RobWouters with the 'else' clause, I just put a phrase in for the chat bot to default to if the users text doesn't match any of the rules. Eg if the user types "dfhafjdks" it will have a default message because this text doesn't match anything in the dictionary. I could put this in the text file though. – user1189336 Feb 04 '12 at 15:45
Also, it works for "I feel happy blah" but not for just "I feel happy". Not sure why this is, I tried replacing the '+' with a '*' and this didn't fix it. OH I think I know what it is, it's the ending /w...I'll try and change it to /^w. edit: this didn't work either. Consulting the regex dictionary. – user1189336 Feb 04 '12 at 15:55
@user1189336, you need to put the `else` statement on the same indentation as the for loop, not as the `if`. Then it will work. Regarding your second statement you need to make the space after "happy" optional too. Try typing `"I feel happy "` (trailing space). – Rob Wouters Feb 04 '12 at 16:04
@RobWouters I fixed the issue with the second statement with a workaround (just adding an extra word to the user's raw input) and the program works fine. Thanks for the pointer on the else statement :) – user1189336 Feb 04 '12 at 16:31
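For readers following the else discussion in the comments above, here is a minimal sketch of a default reply using for/else. The function name respond and the default message are hypothetical, chosen only for illustration.

```python
import re

io = [
    (r'my name is (?P<name>\w+)', 'Hi there, {name}'),
]

def respond(text, default="Sorry, I don't understand."):
    for regex, output in io:
        match = re.match(regex, text)
        if match:
            reply = output.format(**match.groupdict())
            break
    else:
        # The else belongs to the for loop, not the if: it runs only when
        # the loop finished without hitting break, i.e. no rule matched.
        reply = default
    return reply

print(respond('my name is johnny'))  # Hi there, johnny
print(respond('dfhafjdks'))          # Sorry, I don't understand.
```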

score 6 · Answer 2 · answered Feb 04 '12 at 13:22

You are going to want to do a group match and then pull out the search groups.

First you would want to import re - re is the Python regex module. Let's say that user_input is the variable holding the input string. You then want to use the re.sub method to match your string and substitute something else for it.

output = re.sub(input_regex, output_regex, user_input)

So the regex, first you can put the absolute stuff you want:

input_regex = 'my name is '

If you want it to match explicitly from the start of the line, you should precede it with the caret:

input_regex = '^my name is '

You then want a group to match any string .+ (. is anything, + is 1 or more of the preceding item) until the end of the line '$'.

input_regex = '^my name is .+$'

Now you'll want to put that into a named group. Named groups take the form "(?P<name>regex)" - note that those angle brackets are literal.

input_regex = '^my name is (?P<name>.+)$'

You now have a regex that will match and give a match group named "name" with the user's name in it. The output string will need to reference the match group with "\g<name>".

output_regex = r'Hi there, \g<name>'

Putting it all together you can do it in a one liner (and the import):

import re
output = re.sub(r'^my name is (?P<name>.+)$', r'Hi there, \g<name>', user_input)
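One caveat raised in the comments on the accepted answer is that re.sub returns the input unchanged when nothing matches, so the result alone does not tell you whether a rule applied. A sketch of one way to detect that, swapping in re.subn (which also returns the number of substitutions made); the sample inputs are illustrative:

```python
import re

pattern = r'^my name is (?P<name>.+)$'
reply = r'Hi there, \g<name>'

# re.subn returns a (new_string, number_of_substitutions) tuple.
matched, count_hit = re.subn(pattern, reply, 'my name is johnny')
unmatched, count_miss = re.subn(pattern, reply, 'dfhafjdks')

print(matched, count_hit)     # Hi there, johnny 1
print(unmatched, count_miss)  # dfhafjdks 0
```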

hasienda · Answer 3 · 2012-02-04T13:53:22.937

Asking for REGEXP inevitably leads to answers like the ones you're getting right now: demonstrations of basic REGEXP operations: how to split sentences, search for some term combination like 'my' + 'name' + 'is' within, etc.

In fact, you could learn all this from reading existing documentation and open source programs. REGEXP is not exactly easy. Still, you'll need to understand a bit on your own to really know what's going on if you want to change and extend your program. Don't just copy from the recipes here.

But you may even want to have something more comprehensive. Because you mentioned building a "chat bot", you may want to see how others are approaching that task - way beyond REGEXP. See:

So if the user writes 'my name is johnny', I want the program to know that 'johnny' is the '<-name>' variable, ...

From your question it's unclear how complex this program should become. What if the user types

'Johnny is my name.'

or

'Hey, my name is John X., but call me johnny.'

?
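To make the concern concrete, here is how a simple pattern of the kind used in the other answers behaves on those two inputs (a sketch; the pattern is borrowed from the accepted answer and re.search is used so that leading text is allowed):

```python
import re

pattern = r'my name is (?P<name>\w+)'

reordered = re.search(pattern, 'Johnny is my name.')
embedded = re.search(pattern, 'Hey, my name is John X., but call me johnny.')

print(reordered)               # None: the words are in a different order
print(embedded.group('name'))  # John: not the name the user asked to be called
```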

score 0 · Answer 4 · answered Feb 04 '12 at 13:20

Take a look at the re module and pay attention to capturing groups.

For example, you can assume that the name will be a single word, so it matches \w+. Then you have to construct a regular expression with a \w+ capturing group where the name should be (capturing groups are delimited by parentheses):

r'my name is (\w+)'

and then match it against the input (hint: look for match in the re module docs).

Once you get the match, you have to get the contents of capturing group (in this case at index 1, index 0 is reserved for the whole match) and use it to construct your response.
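Put together, those steps might look like the following sketch (the input string and the response wording are taken from the question):

```python
import re

match = re.match(r'my name is (\w+)', 'my name is johnny')
if match:
    whole = match.group(0)  # the entire matched text
    name = match.group(1)   # contents of the first capturing group
    response = 'Hi there, ' + name
    print(response)  # Hi there, johnny
```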
