
I'm pretty new to Python, but we are working on cleaning up some text files, and, among other things, I need to do the following: replace spaces with underscores, but only in some cases. The start of each such span is marked with /2, and the end is marked with /1.

E.g.:

Here is some text, /2This is an example/1 only.

I would like to turn this into:

Here is some text, This_is_an_example only.

I know how to do a global replace (either with plain Python or with regex), and I also know how to write a regex search that matches all the /2 ... /1 expressions. But I cannot figure out how to combine the two: replace ONLY inside the matched expressions and leave the rest of the text alone. I would be very grateful for any suggestions!


People keep asking for the code I have and/or pointing me to basic Python documentation. It is a relatively long program, since we have to do a lot of things with our input, and this is just one of them. It would be one of a series of find-and-replace steps; here are some others:

```python
import re

# handle (input) and s (output) are file objects opened earlier in the program
for x in handle:
    for r in (("^009", ""), ("/c", ""), ("#", ""), (r"\@", "")):
        x = x.replace(*r)
    # get rid of all remaining latex commands
    x = re.sub(r"\\[a-z]+", "", x)
    x = re.sub(r"\.h/.*?//", "", x)
    # get rid of punctuation
    x = re.sub(r'\.', '', x)
    x = re.sub(r',', '', x)
    x = re.sub(r';', '', x)
    x = re.sub(r'\n', ' \n', x)
    x = re.sub(r'\|.*?\|', '', x)
    x = re.sub(r"'", '', x)
    x = re.sub(r'"', '', x)
    # Here's an initial attempt
    y = re.findall(r'/2.*?/1', x)
    for item in y:
        title = re.sub(r'\s', '_', item)
    # but the question is how do I place these things back into x?
    s.write(x)
s.close()
handle.close()
```
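As an aside on the cleanup code above: the separate `re.sub` calls that each delete one of `.`, `,`, `;`, `'` and `"` could be collapsed into a single pass with `str.translate`, applied at the same point in the loop. A minimal sketch, assuming those five characters are the only ones to delete this way (`strip_punctuation` and `PUNCT_TABLE` are hypothetical names):

```python
# Translation table that maps each listed character to None, i.e. deletes it.
PUNCT_TABLE = str.maketrans('', '', ".,;'\"")

def strip_punctuation(line):
    """Remove . , ; ' and " from line in one pass (hypothetical helper)."""
    return line.translate(PUNCT_TABLE)

print(strip_punctuation('He said, "wait; stop."'))  # He said wait stop
```

Inside the loop this would be `x = strip_punctuation(x)`; the newline and `|...|` substitutions would stay as separate `re.sub` calls.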

Edit 2: Here is a(nother) thing that does NOT work:

```python
for item in re.findall(r'/2.*?/1', x):
    item = re.sub(r'\s', '_', item)
```
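The loop above fails because `item = ...` only rebinds the loop variable; Python strings are immutable, so `x` itself never changes. A minimal sketch of one way to write each result back with `str.replace`, assuming the markers are always exactly `/2` and `/1` (the slice `item[2:-2]` strips them so the output matches the expected text in the question):

```python
import re

x = "Here is some text, /2This is an example/1 only."

# Attempt from Edit 2: this rebinds the loop variable only, so x is unchanged.
for item in re.findall(r'/2.*?/1', x):
    item = re.sub(r'\s', '_', item)
print(x)  # Here is some text, /2This is an example/1 only.

# Working variant: put each underscored span (markers sliced off) back into x.
for item in re.findall(r'/2.*?/1', x):
    x = x.replace(item, re.sub(r'\s', '_', item[2:-2]))
print(x)  # Here is some text, This_is_an_example only.
```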
ZVT
  • Maybe this [link](https://stackoverflow.com/questions/16159969/replace-all-text-between-2-strings-python) can help you. – alim91 Jul 06 '20 at 18:01
  • [This](https://docs.python.org/3/howto/regex.html#search-and-replace) will show you how to replace text using the `re.sub()` function of Python's `re` module. – theX Jul 06 '20 at 18:03
  • It seems you know everything you need. If you cannot make it work, then please add code to your question so we can see where you may have gone wrong. – trincot Jul 06 '20 at 18:06
  • I don't have code for this yet, only for the million other things we need to do with the text. The links above don't help much, because they don't show how to combine the two things I need to combine. – ZVT Jul 06 '20 at 18:09

1 Answer


Use re.sub with a lambda:

```python
x = re.sub(r'/2.*?/1', lambda x: re.sub(r'\s+', '_', x.group()), x)
```

This matches every substring from /2 to the next /1 and, only inside those matches, replaces each run of whitespace with an underscore using the nested re.sub.
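Note that the pattern above keeps the `/2` and `/1` markers, so the example line becomes `Here is some text, /2This_is_an_example/1 only.` A minimal sketch of a variant that also drops the markers, to match the expected output in the question, assuming markers are never nested: capture only the text between them and return the underscored group.

```python
import re

x = "Here is some text, /2This is an example/1 only."

# Capture the text between the markers; the callable returns only the
# underscored group, so /2 and /1 are dropped from the output.
x = re.sub(r'/2(.*?)/1', lambda m: re.sub(r'\s+', '_', m.group(1)), x)
print(x)  # Here is some text, This_is_an_example only.
```

If a marked span could contain a newline, `.` would need `flags=re.DOTALL`; with line-by-line processing as in the question (`for x in handle`), such a span would not be matched at all.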

Ryszard Czech
  • I've always wondered how the lambda parts work. Can you explain the lambda? – theX Jul 06 '20 at 18:59
  • Thank you so much. This is indeed it. I need to learn how to use this lambda operator thing. Many thanks! – ZVT Jul 06 '20 at 18:59
  • Well, I'm very obviously not the expert here, but it seems to work like this: you specify the condition in the first bit, and then the lambda says that for whatever that meets that first condition, do the thing that you specify next. This is exactly what I needed, I just did not know how the lambda worked either... :/ – ZVT Jul 06 '20 at 19:01
  • @theX [Lambda](https://www.w3schools.com/python/python_lambda.asp) is a kind of callable, an anonymous function. Here, the `x` passed to the lambda as its argument is a match object, and whatever string the lambda returns is used as the replacement for that match. – Ryszard Czech Jul 06 '20 at 19:02
  • @RyszardCzech I know lambda, but I just don't get how it gets its arguments from the re.sub and the x.group() part – theX Jul 06 '20 at 19:15
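On how the lambda gets its argument: when the replacement passed to `re.sub` is a function instead of a string, `re.sub` calls it once for every non-overlapping match, passing an `re.Match` object, and inserts the returned string in place of that match. `m.group()` (same as `m.group(0)`) is the whole matched text. The sketch below swaps the lambda for a named function (`underscore_spaces` and `seen` are hypothetical names) and records what each call receives:

```python
import re

seen = []  # records the matched text each call receives, for illustration

def underscore_spaces(m):
    # m is an re.Match object for one /2.../1 occurrence.
    seen.append(m.group())
    # The returned string replaces the whole match in the output.
    return re.sub(r'\s+', '_', m.group())

text = "a /2b c/1 d /2e f g/1"
result = re.sub(r'/2.*?/1', underscore_spaces, text)
print(seen)    # ['/2b c/1', '/2e f g/1']
print(result)  # a /2b_c/1 d /2e_f_g/1
```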