I would like to parse a content file from a static site generator using Python 3. Such files can have front matter in JSON, YAML or TOML at the beginning of the file, followed by the content. It's easy to extract the front matter if it is YAML or TOML, because those start and end with a specific delimiter (--- or +++). Is there a way to parse the JSON object at the beginning of the file into a Python object and get the content that makes up the rest of the file as a string?
Here is an example of such a file, based on the front matter example of the Hugo static site generator:
{
    "categories": [
        "Development",
        "VIM"
    ],
    "date": "2012-04-06",
    "description": "spf13-vim is a cross platform distribution of vim plugins and resources for Vim.",
    "slug": "spf13-vim-3-0-release-and-new-website",
    "tags": [
        ".vimrc",
        "plugins",
        "spf13-vim",
        "vim"
    ],
    "title": "spf13-vim 3.0 release and new website"
}
# Et sed pronos letum minatur
## Hos promissa est induit ductae non tamen
Lorem markdownum est, peragentem nomine fugaeque terruit ista quantum constat
vicinia. Per lingua concita. *Receptus Sibylla* frustra, genitor praesensque
texta vitiatis traxere cum natura feram ducunt terram.
Based on the answer to "Python Regex to match YAML Front Matter", I came up with this:
matches = re.search(r'^\s*(\{.*\})\s*$(.*)', content, re.DOTALL|re.MULTILINE)
That basically works, but it has two problems: there could be another closing curly bracket at the beginning of a line in the text below the JSON part, and it doesn't cope with nested JSON objects.
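One direction that might avoid both problems is to let the json module find the end of the object itself instead of guessing it with a regex: json.JSONDecoder.raw_decode parses a single JSON value starting at a given index and returns the index where it stopped. Below is a minimal sketch, assuming the file starts with the JSON object (optionally preceded by whitespace); the function name split_json_frontmatter is made up for illustration.

```python
import json


def split_json_frontmatter(text):
    """Split text into (front matter object, remaining content).

    Assumption: the text starts with one JSON value, optionally
    preceded by whitespace. Raises json.JSONDecodeError otherwise.
    """
    decoder = json.JSONDecoder()
    # raw_decode does not skip leading whitespace, so find the first
    # non-whitespace character ourselves.
    start = len(text) - len(text.lstrip())
    # raw_decode returns the parsed value and the index just past it,
    # so nested objects and later "}" characters are not a problem.
    frontmatter, end = decoder.raw_decode(text, start)
    # Drop the line break(s) separating the front matter from the body.
    body = text[end:].lstrip("\r\n")
    return frontmatter, body
```

With the file contents read into a string named content, this would be called as frontmatter, body = split_json_frontmatter(content). Because the parser stops exactly where the first JSON value ends, a closing curly bracket later in the Markdown body should not affect the result.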