
From time to time I run a link checker over my site, and external links that return 404 are saved to a logfile.

Now I want to delete those links from the Markdown files automatically. I run multilingual websites, so I start by reading the logfile into an array.

IFS=$'\n'
link=( $(awk '{print $7}' "$ext") )

for i in "${link[@]}"; do
    grep -r "$i" content/* | sed -e 's/([^()]*)//g'
done

This command deletes the link and title inside the (), but [Example Text] remains. I am looking for a way to remove the [] as well, so that in the end I only get Example Text.

Now:

[Example Text](http://example.com "Example Title")

Desired result:

Example Text
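A minimal reproduction of the sed step on the sample line shows why the brackets survive: the pattern only targets parenthesised groups. The function name current_sed is purely illustrative and not part of the original script.

```shell
#!/usr/bin/env bash
# Reproduce the asker's substitution on one line of input.
# ([^()]*) matches a "(...)" group with no nested parentheses,
# i.e. the (URL "Title") part of a link, and deletes it;
# the [link text] part is never touched.
current_sed() {
    printf '%s\n' "$1" | sed -e 's/([^()]*)//g'
}

current_sed '[Example Text](http://example.com "Example Title")'
# prints: [Example Text]
```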
tripleee
Silvio

2 Answers


The immediate fix is to extend your sed regex.

sed 's/\[\([^][]*\)\]([^()]*)/\1/g'
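Applied to the question's sample line, the extended substitution keeps only the link text. strip_links is a hypothetical wrapper name, and this sketch assumes GNU sed:

```shell
#!/usr/bin/env bash
# \[\([^][]*\)\]  matches "[...]" and captures the text inside as group 1
# ([^()]*)        matches the following "(URL "Title")" part
# \1              replaces the whole link with the captured text only
strip_links() {
    sed 's/\[\([^][]*\)\]([^()]*)/\1/g'
}

printf '%s\n' 'See [Example Text](http://example.com "Example Title") here.' | strip_links
# prints: See Example Text here.
```

Note that this version strips every Markdown link on the line, not just the broken ones.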

But a much better fix is probably to replace all the links from the Awk script in every file under content in a single go.

# Replace [text](dead-link) with just text in every file under content/
find content -type f -exec \
    sed -i 's%\[\([^][]*\)\](\('"$(
        awk 'NR>1 { printf "\\|" }
            { printf "%s", $7 }' "$ext")"'\))%\1%g' {} +

The Awk script produces a long regex like

http://one.example.net/nosuchpage\|http://two.example.org/404\|https://three.example.com/broken-link

from all the links in the input, and the sed script then replaces any links which match this regex in the parentheses after the square brackets. (Maybe you'll want to extend this to also permit a quoted string after the link before the closing round parenthesis, like in your example; I feel I am already guessing too many things about what you are actually hoping to accomplish.)
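As a sanity check of the regex-building step, here is the Awk part on its own, wrapped in a hypothetical build_regex function. It assumes, as in the question, that the URL is the 7th whitespace-separated field of each log line; the sample log lines are invented. Writing "\\|" in the Awk source is what prints the two characters \| (GNU sed's BRE alternation). Dots in the URLs still act as regex wildcards, which is usually harmless here.

```shell
#!/usr/bin/env bash
# Join field 7 of every log line with "\|" to form one sed alternation.
# Assumption: the URL is the 7th whitespace-separated field of the log.
build_regex() {
    awk 'NR>1 { printf "\\|" }
         { printf "%s", $7 }' "$1"
}

log=$(mktemp)
# Invented log lines; only the 7th field matters.
cat > "$log" <<'EOF'
2024-01-01 12:00:00 [checker] 404 Not-Found /en/page.md http://one.example.net/nosuchpage
2024-01-01 12:00:01 [checker] 404 Not-Found /de/page.md https://three.example.com/broken-link
EOF

build_regex "$log"; echo
# prints: http://one.example.net/nosuchpage\|https://three.example.com/broken-link
rm -f "$log"
```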

If you are on a *BSD platform (including macOS), you'll need to add an empty string argument after the -i option, like sed -i '' 's%...
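An end-to-end sketch of the find/sed -i approach in a throwaway directory, assuming GNU sed (plain -i and the \| alternation). The directory layout, the log line and the run_demo name are all made up for illustration; on BSD/macOS the -i would need the extra '' argument mentioned above.

```shell
#!/usr/bin/env bash
# Runs in a subshell so the cd does not leak into the caller.
run_demo() (
    workdir=$(mktemp -d)
    cd "$workdir" || exit 1
    mkdir -p content/en content/de
    ext=linkcheck.log

    # Invented log line: the dead URL is field 7, as in the question.
    printf '%s\n' '2024-01-01 12:00:00 [checker] 404 Not-Found /en/page.md http://one.example.net/nosuchpage' > "$ext"

    # Two sample pages in different languages; one link is still alive.
    printf '%s\n' 'Dead: [Old page](http://one.example.net/nosuchpage).' \
                  'Alive: [Home](https://example.org/).' > content/en/page.md
    printf '%s\n' 'Tot: [Alte Seite](http://one.example.net/nosuchpage)' > content/de/page.md

    # Replace [text](dead-url) with text in every file under content/.
    find content -type f -exec \
        sed -i 's%\[\([^][]*\)\](\('"$(
            awk 'NR>1 { printf "\\|" }
                { printf "%s", $7 }' "$ext")"'\))%\1%g' {} +

    cat content/en/page.md content/de/page.md
    cd / && rm -rf "$workdir"
)

run_demo
# prints:
# Dead: Old page.
# Alive: [Home](https://example.org/).
# Tot: Alte Seite
```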

tripleee
  • This will remove other non-linked markdown as well, for example [username](mention:[uid]) – Oliver Dixon May 22 '22 at 14:43
  • @OliverDixon Not sure what you mean? Your example is not valid Markdown. (The script had a couple of pesky typos which I fixed now, though. I guess I must have written it on mobile originally.) – tripleee May 23 '22 at 04:55

Assumptions

  • On each iteration, the i in for i in "${link[@]}" evaluates to a link such as "http://example.com"
  • Every section of your Markdown files that we care about has the form you described: [Example Text](http://example.com "Example Title")

The code

IFS=$'\n'
link=( $(awk '{print $7}' "$ext") )

for i in "${link[@]}"; do
    grep -ro "\[.*\].*${i}" content/* | grep -o '\[.*\]' | tr -d '[]'
done

Explanation

  • grep -ro "\[.*\].*${i}" content/*:
    • Recursive search over all files in a directory: grep -r ... content/*
    • Print only the text that matches our regex: grep -o
    • Match a [, then anything (.*), then a ], then anything again (.*), and finally the value of our loop variable ${i} (the current link): "\[.*\].*${i}"
  • From that output all we want is "Example Text", which lives between the brackets, so everything outside the brackets has to go: grep -o '\[.*\]'
  • Finally, we want to remove those pesky brackets: tr -d '[]'
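A self-contained run of this pipeline on one sample file, using a hypothetical demo_extract function and made-up file contents. Keep in mind that it only prints the link text; the Markdown files themselves are not modified. Also, because .* is greedy, a line containing several bracketed links can yield text spanning more than one link.

```shell
#!/usr/bin/env bash
# Runs in a subshell so the cd does not leak into the caller.
demo_extract() (
    workdir=$(mktemp -d)
    mkdir -p "$workdir/content"
    printf '%s\n' 'Intro [Example Text](http://example.com "Example Title") end' \
        > "$workdir/content/page.md"
    cd "$workdir" || exit 1

    i='http://example.com'   # stands in for one iteration of the loop
    # 1) [ ... ] followed by the link, 2) keep only the [ ... ] part,
    # 3) delete the square brackets themselves.
    grep -ro "\[.*\].*${i}" content/* | grep -o '\[.*\]' | tr -d '[]'

    cd / && rm -rf "$workdir"
)

demo_extract
# prints: Example Text
```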
Lenna