I'm very new to the sed command in bash, so I'm trying to learn.

I'm currently faced with a few thousand Markdown files I need to clean up, and I'm trying to create a command that deletes part of the following:

# null 864: Headline
body text

I need everything that comes before the headline deleted, which here is '# null 864: '. It's always '# null ', then some digits, then ': '. I'm using GNU sed (gsed) because I'm on a Mac.

The best I've come up with so far is:

gsed -i '/#\snull\s([1-9]|[1-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9][0-9][0-9]):\s/d' *.md

Why does the above not work?

However, if I do

gsed -i '/#\snull/d' *.md

it does what I want, but it also does some unintended stuff in the body text.

How do I make sure only the headline and the body text remain?

The fourth bird

5 Answers

Since you want to print the value that comes before the headline and don't want to print any other lines, try the following.

sed -E -n 's/^(#\s+null\s+[0-9]+:\s+)Headline/\1/p' Input_file

If you want to print the value before Headline, and print the complete line unchanged when no match is found, try the following:

sed -E 's/^(#\s+null\s+[0-9]+:\s+)Headline/\1/' Input_file

Explanation: This uses sed's -E option to enable ERE (extended regular expressions) and the s command to perform a substitution. It matches # followed by one or more whitespace characters, null, one or more whitespace characters, one or more digits, a colon and one or more whitespace characters, and captures all of that in the 1st capturing group; it then matches Headline. The whole match is replaced with the 1st capturing group, so only the prefix before Headline is kept.

NOTE: The above commands print their results to the terminal. If you want to save the changes in place, add the -i option once you are satisfied with the output.
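The commands above keep the part before Headline. If you instead want to drop the "# null 864: " prefix and keep the headline text, as described in the question, the same kind of ERE can be used with an empty replacement. A minimal sketch, written for GNU sed (gsed on a Mac) and using a hypothetical file sample.md; -i.bak keeps a backup of each file until you have checked the result:

```shell
#!/bin/sh
# Hypothetical sample file, just for illustration.
printf '%s\n' '# null 864: Headline' 'body text' > sample.md

# Remove "# null <digits>: " only at the start of a line; every other line
# is left untouched. POSIX [[:space:]] is used instead of the GNU-only \s.
# -i.bak edits the file in place and keeps the original as sample.md.bak.
sed -E -i.bak 's/^#[[:space:]]+null[[:space:]]+[0-9]+:[[:space:]]+//' sample.md

cat sample.md
```

To run it on all files, replace sample.md with *.md, then delete the .bak files once you are happy with the result.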

RavinderSingh13

If I'm understanding correctly, you have files like this:

This should get deleted
This should too.
# null 864: Headline
body text
this should get kept

You want to keep the headline, and everything after, right? You can do this in awk:

awk '/# null [0-9]+:/,eof {print}' foo.md
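In that range, eof is not an awk keyword; it is just an unset variable, which is false, so the end condition never matches and the range runs to the end of the file. The same behaviour can be written with an explicit flag, which may be easier to read. A minimal sketch, using a hypothetical foo.md that matches the example above:

```shell
#!/bin/sh
# Hypothetical input matching the example above.
printf '%s\n' 'This should get deleted' 'This should too.' \
  '# null 864: Headline' 'body text' 'this should get kept' > foo.md

# Range form from the answer: "eof" is an unset (false) variable,
# so the range starts at the headline and never closes.
awk '/# null [0-9]+:/,eof {print}' foo.md > range.out

# Equivalent flag form: set found on the first headline, then the bare
# pattern "found" prints every line from there on.
awk '/# null [0-9]+:/ {found = 1} found' foo.md > flag.out

cat flag.out
```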
Andy Lester

You might use awk and replace the "# null 864: " part with an empty string using sub.

See this page for how to either create a new file or overwrite the same file.

The 1 after the closing } prints the whole line, because 1 evaluates to true.

awk '{sub(/^# null [0-9]+:[[:blank:]]+/,"")}1' file

The pattern matches:

  • ^# null  Match "# null " literally at the start of the string
  • [0-9]+:[[:blank:]]+  Match 1+ digits, then : and 1+ spaces or tabs

Output

Headline
body text
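The linked page is not reproduced here, but one common way to overwrite the files with a plain (non-GNU) awk is to write each result to a temporary file and move it back over the original. A minimal sketch, assuming a hypothetical demo/ directory with sample files; on your real files the loop would run over *.md:

```shell
#!/bin/sh
# Hypothetical sample files in a scratch directory (use *.md on your real files).
mkdir -p demo
printf '%s\n' '# null 864: Headline' 'body text' > demo/a.md
printf '%s\n' '# null 12: Another headline' 'more text' > demo/b.md

for file in demo/*.md; do
  # Write the cleaned text to a temporary file next to the original,
  # and replace the original only if awk succeeded.
  awk '{sub(/^# null [0-9]+:[[:blank:]]+/, "")}1' "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
done

cat demo/a.md demo/b.md
```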
The fourth bird

On a Mac, ed should be installed by default, so you can use it here.

The contents of script.ed:

g/^# null [[:digit:]]\{1,\}: Headline$/s/^.\{1,\}: //
,p
Q

for file in *.md; do ed -s "$file" < ./script.ed; done

If the output looks OK, remove the ,p line and change the Q to w so that ed writes the changes back to the file in place:

g/^# null [[:digit:]]\{1,\}: Headline$/s/^.\{1,\}: //
w

Then run the loop again.

Jetchisel

I'd use a range in sed, the same as Andy Lester's awk solution.
Borrowing his input file:

$: cat tst.md
This should get deleted
This should too.
# null 864: Headline
body text
this should get kept

$: sed -E -n -i '/^# null [0-9]+:/,${p;d};d;' tst.md
$: cat tst.md
# null 864: Headline
body text
this should get kept
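The same range can also be written more compactly: either print it with -n and p, or delete every line outside it with !d. A minimal sketch, assuming GNU sed and the same tst.md sample as above; the results are written to separate files so they can be compared before you switch to -i:

```shell
#!/bin/sh
# Same sample input as above.
printf '%s\n' 'This should get deleted' 'This should too.' \
  '# null 864: Headline' 'body text' 'this should get kept' > tst.md

# Print only the lines from the first headline to the last line ($).
sed -E -n '/^# null [0-9]+:/,$p' tst.md > print.out

# Same result by deleting every line that is NOT (!) in that range.
sed -E '/^# null [0-9]+:/,$!d' tst.md > delete.out

cat delete.out
```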
Paul Hodges