I am trying to use a command-line program to split a larger text file into chunks, where:
- the split happens on a defined regex pattern
- the filenames are defined by a capturing group in that regex pattern
The text file is of the format:
# Title
# 2020-01-01
Multi-line content
goes here
# 2020-01-02
Other multi-line content
goes here
Output should be these two files with the following filenames and contents:
2020-01-01.md ↓
# 2020-01-01
Multi-line content
goes here
2020-01-02.md ↓
# 2020-01-02
Other multi-line content
goes here
I can't seem to get all the criteria right.
The regex pattern to split on (separator) is simple enough, something along the lines of ^# (2020-.*)$
Either I can't set up a multi-line regex pattern that goes over \n newlines and stops at the next occurrence of the separator pattern.
Or I can split with csplit on the regex pattern, but I can't name the files with what is captured in (2020-.*).
The same goes for awk's split() or match(): I can't get it to work entirely.
I'm looking for a general solution, with the parameters being the regex patterns that define the chunk beginnings (e.g. # 2020-01-01) and endings (e.g. the next date heading # 2020-01-02 or EOF).
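For reference, here is a minimal awk sketch of the behaviour described above. It is one possible approach, not a definitive one. It assumes the input is saved as notes.txt (a hypothetical name), that each chunk starts at a line matching the pattern passed in as pat, and that the filename is everything after the leading "# " on that line. POSIX awk has no capturing groups, so substr() stands in for the (2020-.*) group here.

```shell
# Recreate the sample input from the question (notes.txt is a hypothetical name).
cat > notes.txt <<'EOF'
# Title
# 2020-01-01
Multi-line content
goes here
# 2020-01-02
Other multi-line content
goes here
EOF

# pat marks the start of a chunk; a chunk ends at the next match or at EOF.
awk -v pat='^# 2020-' '
    $0 ~ pat {
        if (out != "") close(out)   # release the previous chunk file
        out = substr($0, 3) ".md"   # drop the leading "# " to build the filename
    }
    out != "" { print > out }       # lines before the first match are skipped
' notes.txt
```

Changing pat changes where chunks begin. Note that a heading that occurs twice would overwrite the earlier file of the same name, and any trailing whitespace on the heading line would end up in the filename.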