how can i achieve below in python

Question

I have a text file that looks similar to the below example:

# Build 74
* [52e3e](https://gitlab.com/jenkins-code/nancy-jenkins-helm/commit/52e3e) [abc] **Testing Change 1*
* [52e3e](https://gitlab.com/jenkins-code/nancy-jenkins-helm/commit/52e3e) [abc] **Testing Change 2*

I want to change these lines as shown in the following example using python:

# Build 74
<p>* <a href=\"https://gitlab.com/jenkins-code/nancy-jenkins-helm/commit/52e3e\">52e3e</a> [abc] **Testing Change 1* </p>
<p>* <a href=\"https://gitlab.com/jenkins-code/nancy-jenkins-helm/commit/52e3e\">52e3e</a> [abc] **Testing Change 2* </p>

There may be likely a regex way to do a replace on this, but you need to provide a bit more info how similar the lines are? — chromebookdev, Sep 23 '22 at 11:17
I will use Regex. https://docs.python.org/3/howto/regex.html — Xbel, Sep 23 '22 at 11:18

score 1 · Answer 1 · answered Sep 23 '22 at 11:25

Regex is your friend in cases like this. You can use match groups (denoted by unescaped parentheneses) to extract specific info and then you can fit that info into differently formatted f-string. Should any other parts of the string be changing you can use the same tools to fit this solution to your needs.

Step by step analysis for this regex - https://regex101.com/r/AVileR/1

import re

text = r"* [52e3e](https://gitlab.com/jenkins-code/nancy-jenkins-helm/commit/52e3e) [abc] **Testing Change 1*"

pattern = r"\* \[(\w+)\]\(([\w:\/\.\-\%\&\+]+)\) \[(\w+)\] \*\*([\w\s]+)\*"

match = re.match(pattern=pattern, string=text)

out = fr"<p>* <a href=\"{match[2]}\">{match[1]}</a> [{match[3]}] **{match[4]}* </p>"
expected = r"<p>* <a href=\"https://gitlab.com/jenkins-code/nancy-jenkins-helm/commit/52e3e\">52e3e</a> [abc] **Testing Change 1* </p>"
print(out)

print(out == expected)

score 0 · Answer 2 · answered Sep 23 '22 at 11:31

You can create a pattern that returns the 4 groups that have to be placed separately in the end result, then just use f strings to format the end result:

import re


pattern = re.compile(r"(.*?)\[(.*?)]\((.*?)\)(.*)")

with open("new.txt") as file:
    for line in file:
        match = pattern.match(line)
        if match is not None:
            start, text, url, end = match.groups()
            changed = f"<p>{start}<a href=\"{url}\">{text}</a>{end} </p>"
            print(changed)

score 0 · Answer 3 · answered Sep 23 '22 at 11:33

You should start by building a regex statement. I use https://regex101.com/ to do some of my testing

Check out this resource for matching groups: https://regexone.com/lesson/capturing_groups

Have a read through this on the python usage to replace matched groups https://docs.python.org/3/library/re.html#re.sub

You'll replace groups of sections: eg: ** [52e3e](* gets replaced with

<a href="*

I'm guessing the alphanumerics in the square brackets are different, so you can use the \w and \d (word character and digit) options to save a-zA-Z0-9.

And finally there's some gotchas to be aware of here when replacing groups: Why does re.sub replace the entire pattern, not just a capturing group within it?

how can i achieve below in python

3 Answers3