How do I remove a particular pattern with a number sequence using sed

Question

I'm very new to the sed command in bash, so I'm still learning.

I currently have a few thousand markdown files I need to clean up, and I'm trying to create a command that deletes part of the following:

# null 864: Headline
body text

I need everything that comes before the headline deleted, which here is '# null 864: '. It's always '# null ', then some digits, then ': '. I'm using gnu-sed because I'm on a Mac.

The best I've come up with so far is

gsed -i '/#\snull\s([1-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]):\s/d' *.md

The above does not seem to work.

However, if I do

gsed -i '/#\snull/d' *.md

it does what I want, but it also does some unintended stuff in the body text.

How do I make sure that only the headline and the body text remain?

score 1 · Answer 1 · answered Jun 09 '21 at 19:37

If you want to print the part before the headline and don't want to print any other lines, try the following:

sed -E -n 's/^(#\s+null\s+[0-9]+:\s+)Headline/\1/p' Input_file

If you want to print the part before the headline and, when no match is found, print the complete line unchanged, try the following:

sed -E 's/^(#\s+null\s+[0-9]+:\s+)Headline/\1/' Input_file

Explanation: The -E option of sed enables ERE (extended regular expressions), and the s command performs the substitution. The pattern matches # followed by space(s), null followed by space(s), digits, a colon and space(s), and keeps all of that in the 1st capturing group; the substitution then replaces the whole match with that 1st capturing group.

NOTE: The above commands print the result to the terminal. To save the changes in place, add the -i option once you are satisfied with the output.
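
If the goal is the one stated in the question (drop the '# null NNN: ' prefix and keep the headline), the same s command can be used with an empty replacement. The question's attempt used the d command, which deletes whole matching lines, and without -E the ( and | are treated as literal characters, so nothing matched. The following is only a minimal sketch: it assumes the prefix always starts at the beginning of the line with single spaces as shown in the question, and sample.md is a hypothetical file created just for the demonstration.

```shell
# Hypothetical sample file, created only for this demonstration.
tmpdir=$(mktemp -d)
printf '%s\n' '# null 864: Headline' 'body text with # null inside' > "$tmpdir/sample.md"

# -E enables ERE. The ^ anchors the match to the start of the line, so only
# the heading prefix is removed; "# null" elsewhere in a line is untouched.
result=$(sed -E 's/^# null [0-9]+: //' "$tmpdir/sample.md")
printf '%s\n' "$result"

rm -r "$tmpdir"
```

Once the output looks right, the same expression could be run with gsed -i over *.md, as in the question.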

score 1 · Answer 2 · answered Jun 09 '21 at 19:40

If I'm understanding correctly, you have files like this:

This should get deleted
This should too.
# null 864: Headline
body text
this should get kept

You want to keep the headline, and everything after, right? You can do this in awk:

awk '/# null [0-9]+:/,eof {print}' foo.md
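
A note on how that range works: eof is not a built-in here, just an unset awk variable, so the end condition is always false and the range stays open until the last line. A small sketch of the same idea follows, with 0 written explicitly as the end condition and a sub() call added to strip the prefix, which is an extra step not in the command above. The sample text is inlined and the variable names are made up for the example.

```shell
# Sample input borrowed from the example above, inlined for the demo.
input=$(printf '%s\n' \
    'This should get deleted' \
    'This should too.' \
    '# null 864: Headline' \
    'body text' \
    'this should get kept')

# Range pattern: starts at the first "# null NNN:" line; the end condition 0
# is never true, so the range runs to the end of the input.
# sub() removes the prefix on lines inside the range that carry it.
kept=$(printf '%s\n' "$input" |
    awk '/^# null [0-9]+:/,0 { sub(/^# null [0-9]+: /, ""); print }')
printf '%s\n' "$kept"
```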

The fourth bird · Answer 3 · 2021-06-09T20:15:27.367

You might use awk, and replace the # null 864: part with an empty string using sub.

See this page to either create a new file, or to overwrite the same file.

The }1 prints the whole line as 1 evaluates to true.

awk '{sub(/^# null [0-9]+:[[:blank:]]+/,"")}1' file

The pattern matches

^# null               matches literally from the start of the string
[0-9]+:[[:blank:]]+   matches 1+ digits, then : and 1+ spaces or tabs

Output

Headline
body text
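
For overwriting many files, one common approach is to write awk's output to a temporary file and then copy it back over the original. The loop below is a rough sketch of that, not the only way to do it: the directory and the a.md and b.md files are hypothetical and created only for the demonstration, and [ \t]+ is used in place of [[:blank:]]+ on the assumption that some older awk builds lack POSIX character classes.

```shell
# Hypothetical working directory with two sample files.
workdir=$(mktemp -d)
printf '%s\n' '# null 864: Headline' 'body text' > "$workdir/a.md"
printf '%s\n' '# null 7: Other' 'more text' > "$workdir/b.md"

for f in "$workdir"/*.md; do
    tmp=$(mktemp) || exit 1
    # Write the cleaned text to a temp file first; redirecting straight
    # back to "$f" would truncate the file before awk reads it.
    if awk '{ sub(/^# null [0-9]+:[ \t]+/, "") } 1' "$f" > "$tmp"; then
        # Copy the content back so the original file keeps its permissions.
        cat "$tmp" > "$f"
    fi
    rm -f "$tmp"
done

first=$(cat "$workdir/a.md")
second=$(cat "$workdir/b.md")
printf '%s\n' "$first" "$second"
rm -r "$workdir"
```

Running it on copies of the real files first is a sensible precaution.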

score 0 · Answer 4 · answered Jun 09 '21 at 20:06

On a Mac, ed should be installed by default, so you could use that.

The content of script.ed

g/^# null [[:digit:]]\{1,\}: Headline$/s/^.\{1,\}: //
,p
Q

for file in *.md; do ed -s "$file" < ./script.ed; done

If the output is OK, remove the ,p and change the Q to w so that it edits the file in place:

g/^# null [[:digit:]]\{1,\}: Headline$/s/^.\{1,\}: //
w

Run the loop again.

score 0 · Answer 5 · answered Jun 09 '21 at 22:17

I'd use a range in sed same as Andy Lester's awk solution.
Borrowing his infile,

$: cat tst.md
This should get deleted
This should too.
# null 864: Headline
body text
this should get kept

$: sed -E -i.bak '/^# null [0-9]+:/,${p;d};d;' tst.md
$: cat tst.md
# null 864: Headline
body text
this should get kept
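
The range can also be combined with a substitution so that the prefix itself is dropped as well, which is what the question asks for. This is a sketch under the same assumptions about the input as above; it uses ,$!d as a shorter way to delete everything outside the range, reads from a pipe rather than editing in place, and the variable names are invented for the example.

```shell
# Same sample text as tst.md above, inlined for the demo.
sample=$(printf '%s\n' \
    'This should get deleted' \
    'This should too.' \
    '# null 864: Headline' \
    'body text' \
    'this should get kept')

# First expression: delete every line NOT in the range that runs from the
# "# null NNN:" line to the last line ($).
# Second expression: strip the prefix from the lines that survive.
trimmed=$(printf '%s\n' "$sample" |
    sed -E -e '/^# null [0-9]+:/,$!d' -e 's/^# null [0-9]+: //')
printf '%s\n' "$trimmed"
```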
