I'm trying to automate a find-and-replace for a series of broken image links in .rst files. I have a CSV file where column A is the "old" link (as it currently appears in the .rst files) and column B is the new replacement link for that row.
I can't run the files through pandoc first to convert them to HTML because it "breaks" the .rst formatting. I did this once for a set of HTML files using BeautifulSoup and regex, but that parser won't work for my .rst files.
A coworker suggested trying grep, but I can't figure out how to feed the CSV file in to drive the "match" and switch.
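In Python terms, I think what I'm after is something like the sketch below: build one regex out of all the old links in the CSV and substitute each match via a dict lookup. This is untested and just my guess at the shape of it (image_csv is simply the path to my mapping file):

import csv
import re

# Load the CSV into an old-link -> new-link dict (column A = old, column B = new)
with open(image_csv, newline='') as f:
    reader = csv.reader(f)
    next(reader, None)  # skip the header row
    link_map = {row[0]: row[1] for row in reader}

# One pattern matching any old link; re.escape keeps dots/slashes literal,
# and sorting longest-first stops a shorter link from shadowing a longer one
pattern = re.compile("|".join(re.escape(old) for old in sorted(link_map, key=len, reverse=True)))

def fix_links(rst_text):
    # Replace every matched old link with its mapped replacement
    return pattern.sub(lambda m: link_map[m.group(0)], rst_text)

But I'm not sure whether that's a sensible way to treat .rst content.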
For the HTML files, the script looped through each file, searched for img tags, and replaced the links using the CSV as a dict:
import csv
from bs4 import BeautifulSoup

graph_main_nodes = []
graph_child_nodes = []
with open(image_csv, newline='') as f:  # image_csv = path to the mapping CSV
    reader = csv.reader(f)
    next(reader, None)  # Skip the header row
    for row in reader:
        graph_main_nodes.append(row[0])   # Column A held the correct link
        graph_child_nodes.append(row[1:])  # Remaining columns held the old links

# Dict with keys in the correct location, vals (lists) in the old locations
graph = dict(zip(graph_main_nodes, graph_child_nodes))
# Invert it so each old link maps directly to its correct replacement
graph = dict((v, k) for k in graph for v in graph[k])
replaced_links = 0
orphan_links = []

for fixfile in html:  # html is the list of HTML file paths
    try:
        with open(fixfile, 'r', encoding='utf-8') as f:
            soup = BeautifulSoup(f, 'html.parser')
        tags = soup.find_all('img')
        for tag in tags:
            print(tag['src'])
            if tag['src'] in graph:
                tag['src'] = graph[tag['src']]  # Swap in the correct link
                replaced_links += 1
                print("Match found!")
            else:
                orphan_links.append(tag['src'])
                print("Ignore")
    except OSError as e:
        print(f"Could not process {fixfile}: {e}")
I would love some suggestions on how to approach this. I'd like to repurpose my BeautifulSoup code if possible, but I'm not sure that's realistic.
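For what it's worth, here's the rough plain-text direction I've been sketching for the .rst side: load the CSV into an old-to-new dict and do literal string replacement on each file, skipping any parser entirely. The "docs" directory is just a placeholder for wherever my .rst files live, and I haven't actually run this:

import csv
from pathlib import Path

# Build an old-link -> new-link mapping from the CSV (column A = old, column B = new)
link_map = {}
with open(image_csv, newline='') as f:
    reader = csv.reader(f)
    next(reader, None)  # skip the header row
    for row in reader:
        link_map[row[0]] = row[1]

# Treat each .rst file as plain text and do literal replacements
for rst_path in Path("docs").rglob("*.rst"):
    text = rst_path.read_text(encoding="utf-8")
    original = text
    for old_link, new_link in link_map.items():
        text = text.replace(old_link, new_link)
    if text != original:
        rst_path.write_text(text, encoding="utf-8")
        print(f"Updated {rst_path}")

Is plain string replacement like that reasonable for .rst image directives, or am I going to regret not parsing the files properly?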