Is it possible to do multiple complex regex substitutions, where the substitution is a function, in a single line in python?

Question

I'm writing a Python script that replaces email addresses and IP addresses in a log file with random-looking characters (truncated hashes).

Right now each line is processed twice: the email address substitution runs first, and then the IP address substitution runs on the result.

I want to do both in one go, but cannot find a way to do so.

I read the post here, but that seems to work only for simple substitutions. In this case, I'm substituting a (relatively) complex regex with a function, and can't figure out a way to do it in one line.

This is the original code:

import re
import hashlib


def hashing_func(to_hash):
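    ''' Calculates the SHA-256 hex digest of a text string '''
    # sha256 needs bytes on Python 3, so encode the text first.
    return hashlib.sha256(to_hash.encode('utf-8')).hexdigest()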


def hashing_func_email(username, domain):
    ''' Creates a separate hash for username and
        domain. Prepends 'EM_' to show it's an email.
        Truncates each hash to make it more readable.'''
    username_hash = hashing_func(username)
    domain_hash = hashing_func(domain)
    return 'EM_' + username_hash[:13] + '@' + domain_hash[:10]


def hashing_func_ipaddr(ipaddr):
    ''' Creates a hash for IP address, and prepends 'IP_'.
        Truncates the hash to make it more readable.'''
    ipaddr_hash = hashing_func(ipaddr)
    return 'IP_' + ipaddr_hash[:11]


def main():
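    email_regex = re.compile(r'''
        ([a-zA-Z0-9._+-]+)        # group 1: username
        (@|%40)                   # group 2: at sign, plain or URL-encoded
        ([a-zA-Z0-9.-]+           # group 3: full domain, which ends in
         (\.[a-zA-Z0-9]{2,4}))    # group 4: a dot plus 2 to 4 characters
        ''', re.VERBOSE)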

    ipaddr_regex = re.compile(r'''\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
                              (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
                              ''', re.VERBOSE)

    old_file = ["10.38.1.2, user@example.com, was here",
                "This user2@example.com tried from 84.12.41.53 again"]

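    for line in old_file:
        # First pass: hash the username and domain of every email address.
        new_line = email_regex.sub(
            lambda x: hashing_func_email(x.group(1), x.group(3)), line)
        # Second pass: hash every IP address in the already-substituted line.
        new_line_ip = ipaddr_regex.sub(
            lambda x: hashing_func_ipaddr(x.group(0)), new_line)

        print(new_line_ip)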

if __name__ == '__main__':
    main()

Please see the answer below, if it helps you can mark it as answered. — bhansa, Sep 09 '17 at 20:05

Colonder · Accepted Answer · 2017-09-13T08:39:51.563


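You can nest the two calls, I believe: pass the result of the IP address substitution directly as the input string of the email address substitution.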

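for line in old_file:
    # The inner re.sub hashes IP addresses; its result is the string
    # that the outer re.sub then scans for email addresses.
    new_line = re.sub(
        email_regex,
        lambda x: hashing_func_email(x.group(1), x.group(3)),
        re.sub(ipaddr_regex, lambda x: hashing_func_ipaddr(x.group(0)), line))

    print(new_line)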

EDIT
Don't reinvent the wheel, check these out:
How can I do multiple substitutions using regex in python?
Python Regex sub() with multiple patterns
https://www.safaribooksonline.com/library/view/python-cookbook-2nd/0596007973/ch01s19.html
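
If the goal is a single pass over each line rather than nested calls, one option is to join both patterns into one alternation and let a single replacement function decide what to return. The sketch below is not taken from the answer above. It assumes Python 3, and the names short_hash, COMBINED_REGEX, anonymize and anonymize_line are made up for illustration. Named groups (user, domain, ip) stand in for the numbered groups of the original patterns.

```python
import hashlib
import re


def short_hash(text, length):
    """Return the first `length` hex characters of the SHA-256 of `text`."""
    return hashlib.sha256(text.encode('utf-8')).hexdigest()[:length]


# One pattern with two alternatives: an email address or an IPv4 address.
COMBINED_REGEX = re.compile(r'''
    (?P<user>[a-zA-Z0-9._+-]+)(?:@|%40)(?P<domain>[a-zA-Z0-9.-]+\.[a-zA-Z0-9]{2,4})
    |
    (?P<ip>\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
       (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b)
    ''', re.VERBOSE)


def anonymize(match):
    """Pick the replacement based on which alternative matched."""
    if match.group('ip') is not None:
        return 'IP_' + short_hash(match.group('ip'), 11)
    return ('EM_' + short_hash(match.group('user'), 13)
            + '@' + short_hash(match.group('domain'), 10))


def anonymize_line(line):
    """Replace emails and IP addresses in one scan of the line."""
    return COMBINED_REGEX.sub(anonymize, line)


if __name__ == '__main__':
    for line in ["10.38.1.2, user@example.com, was here",
                 "This user2@example.com tried from 84.12.41.53 again"]:
        print(anonymize_line(line))
```

Because each position in the line is matched only once, text produced by one replacement is never rescanned by the other pattern, which the two-pass versions cannot guarantee in general.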

edited Sep 13 '17 at 08:39

answered Sep 09 '17 at 19:58

Colonder


Indentation looks awkward in the Stack code snippet, though. – bhansa Sep 09 '17 at 20:00
I guess it's just my laziness. I will try to beautify it. – Colonder Sep 09 '17 at 20:01
Please don't forget to accept the answer when you're able to :) – Colonder Sep 09 '17 at 20:04
Wow thank you so much! This is perfect! If I understand correctly, re.sub(ipaddr_regex) is an input to the re.sub(email_regex), which then applies both to the 'line'. Is that correct? I've also got a couple of other regexes I want to run along with these two, and simply tacking them on just works. – ShanxT Sep 09 '17 at 20:22
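
That reading matches how the nested call works: the inner re.sub returns a string, and that string is the input of the outer one. Once more than two or three patterns are involved, the nesting gets hard to read, so a loop over (pattern, replacement) pairs may be easier to maintain. This is only a sketch: apply_substitutions and SUBSTITUTIONS are hypothetical names, and the two simplified patterns and placeholder replacements stand in for the real regexes and hashing functions.

```python
import re


def apply_substitutions(line, substitutions):
    """Apply each (compiled_pattern, replacement) pair in order.

    The output of one substitution is the input of the next, exactly
    like nesting the re.sub calls, but without the nesting.
    """
    for pattern, replacement in substitutions:
        line = pattern.sub(replacement, line)
    return line


# Hypothetical, simplified rules; each replacement is a function of the match.
SUBSTITUTIONS = [
    (re.compile(r'\b\d{1,3}(?:\.\d{1,3}){3}\b'),
     lambda m: '<ip>'),
    (re.compile(r'[\w.+-]+@([\w-]+(?:\.[\w-]+)+)'),
     lambda m: '<user>@' + m.group(1).upper()),
]

if __name__ == '__main__':
    print(apply_substitutions(
        "This user2@example.com tried from 84.12.41.53 again", SUBSTITUTIONS))
    # prints: This <user>@EXAMPLE.COM tried from <ip> again
```

Order matters here, just as it does with nesting: a later pattern sees whatever the earlier replacements wrote into the line.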
I had seen those links before posting. The problem is that they don't seem to accept complex regexes. So direct word substitution works, but if I use something like [abcd], it'll either ignore it or will give an error. For example, in the safaribooksonline link, I added this to the dictionary: `"[is]" : "x"` The expected result would be "Guido van Rossum xx the..", but it just ignored it. – ShanxT Sep 10 '17 at 01:35
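
A likely explanation for the ignored "[is]" key, assuming the cookbook recipe builds its pattern by passing every dictionary key through re.escape (as I believe it does): escaped keys only match as literal text, so "[is]" would have to appear verbatim in the input. The sketch below contrasts that behaviour with a variant that treats each key as a real regex and also accepts a function as the replacement. Both function names are made up for illustration, and the second one is not part of the recipe.

```python
import re


def multiple_replace_literal(text, mapping):
    """Cookbook-style: keys are escaped, so they match as literal strings."""
    pattern = re.compile('|'.join(map(re.escape, mapping)))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


def multiple_replace_regex(text, rules):
    """rules is a list of (regex_source, replacement) pairs.

    Each source is wrapped in its own named group (r0, r1, ...) so the
    dispatcher can tell which rule matched. A replacement may be a plain
    string or a function that receives the match object.
    """
    combined = re.compile('|'.join(
        '(?P<r%d>%s)' % (i, source) for i, (source, _) in enumerate(rules)))

    def dispatch(match):
        index = int(match.lastgroup[1:])  # 'r3' -> 3
        replacement = rules[index][1]
        return replacement(match) if callable(replacement) else replacement

    return combined.sub(dispatch, text)


if __name__ == '__main__':
    text = "Guido van Rossum is the creator"
    print(multiple_replace_literal(text, {"[is]": "x"}))
    # prints: Guido van Rossum is the creator
    print(multiple_replace_regex(text, [(r"[is]", "x")]))
    # prints: Guxdo van Roxxum xx the creator
```

Note that the character class [is] matches every single "i" or "s", not just the word "is"; to replace only the whole word, a pattern such as \bis\b would be needed.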
