Some Markdown processors require a blank line before a bulleted list.

So the list

This is a bulleted list: 
- line 1
- line 2
- line 3

will be incorrectly rendered as:

This is a bulleted list: - line 1 - line 2 - line 3

How can I use awk to check that lines starting with "- " are preceded by either a blank line or another line starting with "- "?

(I am using dash-space as the start token to avoid confusion with the document front matter, which uses three dashes as a separator.)

Please note: unlike the first question referenced in the comments, this is a search for an anti-pattern - something not occurring in the file.

Using pcregrep this is straightforward:

pcregrep -Mc '^[A-Z][a-z].*\n- ' "$filename"

but it's not clear to me how to do it using awk.

What I'm doing is:

awkcommand='
/- / {
  if(lastLine != "") {
    print FILENAME
    exit
  }
}

{ lastLine = $0 }
'
awk "$awkcommand" data

which catches a bullet that follows a non-blank line. But when I try to add a second condition (prior line is not blank AND prior line does not start with a bullet), it fails. For example, this:

  if(lastLine != "" && lastline !~/^- /) {

does not work: it gives a false positive on this file:

This is a test

- abc
- def
Scott C Wilson
  • It can be simplified for one-liners, but the accepted answer here explains it well: https://stackoverflow.com/questions/14350856/can-awk-patterns-match-multiple-lines – Sundeep Jun 06 '20 at 13:55
  • This is the opposite of the question people are saying it duplicates. We're looking for an anti-pattern, not a pattern. – Scott C Wilson Jun 06 '20 at 19:06
  • You can use logical operators to get the desired result. I didn't vote to close the question, but the probable reason you were downvoted and not answered is that you haven't added your own efforts to the question. – Sundeep Jun 07 '20 at 02:27
  • Your awk works, just tested, including `if(lastLine != "" && lastline !~/^- /) {` Do you get different output than the filename printed? – thanasisp Jun 07 '20 at 12:38
  • See sample data where it gives a false positive. – Scott C Wilson Jun 07 '20 at 12:41
  • Seems it is just a typo. Replace `lastline` with `lastLine` – thanasisp Jun 07 '20 at 12:49

2 Answers

gawk '/^-/{ if (bl==pbl) bl=1; else { pbl=bl; bl=0; }}
    { if (bl==1) print ""; 
      print $0 }'  inputfile

Given the input file:

This is a bulleted list:
- line 1
- line 2
- line 3

The output will be:

This is a bulleted list:

- line 1
- line 2
- line 3
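
A variant that tracks the previous line explicitly, instead of toggling flags, may be easier to reason about for longer lists and for lists that are followed by more text. This is only a sketch: the temporary sample file and the variable name `prev` are made up for the illustration, and it anchors on dash-space (`^- `) so that a `---` front-matter separator is not treated as a bullet.

```shell
# Hypothetical sample file, created only for this illustration.
tmp=$(mktemp)
printf '%s\n' 'This is a bulleted list:' '- line 1' '- line 2' \
    '- line 3' '- line 4' 'Trailing text' > "$tmp"

# Print an empty line before a bullet only when the previous line
# is neither empty nor itself a bullet, then print the line unchanged.
fixed=$(awk '
/^- / && prev != "" && prev !~ /^- / { print "" }
{ print; prev = $0 }
' "$tmp")

printf '%s\n' "$fixed"
rm -f "$tmp"
```

On this sample the only change should be a single empty line inserted between the introductory sentence and `- line 1`.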

EDIT:

If you just want to print the filename:

awk '/^-/{ if (bl==pbl) bl=1; else { pbl=bl; bl=0; }}
    { if (bl==1) print FILENAME; }'    inputfiles*

Nothing here is specific to gawk, so it should also work with other awk implementations.
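
When several files are checked in one run, it can help to reset the state at the start of each file and to report each file name at most once. The following is a minimal sketch under those assumptions; the file names `bad.md` and `good.md`, the `prev` variable, and the `seen` array are hypothetical and exist only for this example.

```shell
# Hypothetical sample files, created only for this illustration.
dir=$(mktemp -d)
printf '%s\n' 'Intro:' '- a' '- b' > "$dir/bad.md"
printf '%s\n' 'Intro:' '' '- a' '- b' > "$dir/good.md"

# Report a file once if any bullet directly follows a line that is
# neither empty nor another bullet.
flagged=$(awk '
FNR == 1 { prev = "" }   # do not carry the last line of one file into the next
/^- / && prev != "" && prev !~ /^- / && !(FILENAME in seen) {
    seen[FILENAME] = 1
    print FILENAME
}
{ prev = $0 }
' "$dir/bad.md" "$dir/good.md")

printf '%s\n' "$flagged"
rm -rf "$dir"
```

Only the path of `bad.md` should be printed, since `good.md` has an empty line before its list.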

Luuk

Your script is OK; there was only a typo in your additional `if` condition (`lastline` instead of `lastLine`). With that fixed, it works.

awk '/- / {
  if(lastLine != "" && lastLine !~ /^- /) {
    print FILENAME
    exit
  }
}
{ lastLine = $0 }' file
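
To see why the typo produces a false positive: awk variable names are case-sensitive, and a variable that was never assigned evaluates to the empty string, so `lastline !~ /^- /` is true on every line. The sketch below runs both spellings side by side on the sample file from the question; the message strings are made up for the illustration.

```shell
# Feed the sample file from the question to both versions of the test.
result=$(printf '%s\n' 'This is a test' '' '- abc' '- def' | awk '
# Typo: lastline is never assigned, so the !~ test always succeeds.
/^- / && lastLine != "" && lastline !~ /^- / { print "typo version flags line " NR }
# Corrected spelling: the previous bullet line is recognised.
/^- / && lastLine != "" && lastLine !~ /^- / { print "fixed version flags line " NR }
{ lastLine = $0 }
')

printf '%s\n' "$result"
# prints: typo version flags line 4
```

The corrected condition prints nothing for this file, because line 4 is preceded by another bullet.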
thanasisp
  • For anyone who might want to use this, my final check was: `if(lastLine != "" && lastLine !~/^- / && lastLine !~/^[ ]+- / && lastLine !~/^#/ && lastLine != "```") {` – Scott C Wilson Jun 07 '20 at 13:41