I'm very new to programming and would really appreciate any help! I am trying to write a small Python script.
I have an .html file of a legal codification organized in sections (§§), for example http://www.gesetze-im-internet.de/stgb/BJNR001270871.html. I want to write a Python regex script that automatically tags specific §§. The relevant HTML of the document is:
<div class="jnnorm" id="BJNR398310001BJNE000100305" title="Einzelnorm">
<div class="jnheader"> <a name="BJNR398310001BJNE000100305"/>
<a href="index.html#BJNR398310001BJNE000100305">Nichtamtliches Inhaltsverzeichnis</a>
<h3><span class="jnenbez">§ 1</span> <span class="jnentitel"></span></h3> </div>
Here <div class="jnnorm" should become <div class="jnnorm MYTAGHERE". The span with class="jnenbez" contains the number of the §, here § 1.
I am trying (and failing) to write a script that does the following:
1) Let's say I have a list of § numbers (a list, despite the name): my_dict = [112, 204]
2) Find <span class="jnenbez">§ 112 and <span class="jnenbez">§ 204 in the .htm file
3) Go left from jnenbez">§ 112 to the nearest preceding "jnnorm" string and replace it with "jnnorm MYTAGHERE".
Here is what I got so far, but I hit a roadblock quite soon.
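import re

# Read the whole HTML file into one string (open() replaces the old file() builtin)
with open("filename.htm", "r") as f:
    text = f.read()

my_dict = [1, 123, 200]  # § numbers to tag (a list, despite the name)
# don't know how to find the §
re.sub("jnnorm", "jnnorm MYTAGHERE", text)
# re.sub does not seem to work? (note: its return value is not stored anywhere here)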
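
A minimal sketch of steps 1) to 3), assuming every section looks like the snippet above: a class="jnnorm" attribute followed, before the next "jnnorm", by a <span class="jnenbez">§ N</span> with a purely numeric N. The names tag_sections, NORM_RE, sample and tagged are made up for illustration, and the sample HTML is invented, not taken from the real file. For real-world HTML a proper parser is usually more robust than a regex, so treat this only as a starting point.

```python
import re

# Matches one class="jnnorm" attribute together with the § number that
# follows it. The negative lookahead keeps the match from running past
# the start of the next "jnnorm" block.
NORM_RE = re.compile(
    r'class="jnnorm"'
    r'(?P<between>(?:(?!class="jnnorm").)*?)'
    r'(?P<bez><span\s+class="jnenbez">§\s*(?P<num>\d+)\s*</span>)',
    re.DOTALL,  # let "." match newlines, since the HTML spans several lines
)


def tag_sections(html, numbers, tag="MYTAGHERE"):
    """Return a copy of html with the tag added to the chosen § blocks."""
    wanted = {str(n) for n in numbers}

    def add_tag(match):
        if match.group("num") not in wanted:
            return match.group(0)  # leave other sections untouched
        return ('class="jnnorm {}"'.format(tag)
                + match.group("between") + match.group("bez"))

    # re.sub returns a NEW string; the input string is never changed in place.
    return NORM_RE.sub(add_tag, html)


# Made-up sample input modelled on the snippet in the question.
sample = (
    '<div class="jnnorm" id="a"><div class="jnheader">'
    '<h3><span class="jnenbez">§ 1</span></h3></div></div>\n'
    '<div class="jnnorm" id="b"><div class="jnheader">'
    '<h3><span class="jnenbez">§ 112</span></h3></div></div>\n'
    '<div class="jnnorm" id="c"><div class="jnheader">'
    '<h3><span class="jnenbez">§ 1120</span></h3></div></div>\n'
)
tagged = tag_sections(sample, [112, 204])
print(tagged)
```

To use it on the real file, the string read from the file would be passed in and the returned string written back out, for example with open("tagged.htm", "w"). Section numbers with a letter suffix such as § 112a are deliberately not matched by this pattern; the \d+ part would need to be widened if those should be tagged as well.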