Original title: Keep newline character in string during gsub
There is a post where I try to convert JSON to markdown unordered lists. It is almost done, but there is one pattern I cannot handle. If a title contains a space, hyphen, space sequence, the hyphen is treated as a list item hyphen and gets indented too. When I try to avoid this by referring to the newline character in the pattern, nothing works as I expect.
Input JSON: https://gist.github.com/hermanp/381eaf9f2bf5f2b9cdf22f5295e73eb5
Preferred output (two space indentation) markdown:
- Info
- Python
- The Ultimate Python Beginner's Handbook
- Python Like You Mean It
- Automate the Boring Stuff with Python
- Data science Python notebooks
- Frontend
- CodePen
- JavaScript - Wikipedia
- CSS-Tricks
- Butterick’s Practical Typography
- Front-end Developer Handbook 2019
- Using Ethics In Web Design
- Client-Side Web Development
- Stack Overflow
- HUP
- Hope in Source
To generate the markdown, I use the following two scripts:
generate_md()
library(jsonlite)
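generate_md <- function (jsonfile) {
  # Parse the bookmark export; nested "children" arrays become data frames.
  bmarks_json_lite <- fromJSON(txt = jsonfile)
  # Start from the second top-level folder of the export.
  level1 <- bmarks_json_lite$children$children[[2]]
  markdown_result <- recursive_func(level = level1)
  return(markdown_result)
}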
recursive_func()
recursive_func <- function (level) {
  md_result <- character()
  for (i in seq_len(nrow(level))) {
    if (level[i, "type"] == "text/x-moz-place") {
      # A bookmark: emit a single list item.
      md_title <- paste0("- ", level[i, "title"], "\n")
    } else if (level[i, "type"] == "text/x-moz-place-container") {
      # A folder: emit its title, then its children one level deeper.
      md_title <- paste0("- ", level[i, "title"], "\n")
      md_recurs <- recursive_func(level = level[i, "children"][[1]])
      # >>>>> This is the problematic part. <<<<<
      md_recurs <- gsub("-(?= )", " -", md_recurs, perl = TRUE)
      md_title <- paste0(md_title, md_recurs)
    }
    md_result <- paste0(md_result, md_title)
  }
  return(md_result)
}
With these functions I get the output below (note the unnecessary spaces in the JavaScript - Wikipedia entry). I want extra indentation only before the leading list hyphen of that entry, but the hyphen inside the title is pushed to the right as well, because the pattern matches every hyphen that is followed by a space. I hope this example covers the different scenarios with hyphens and indentation. It is only a fraction of my bookmarks, since I wanted to give a minimal example.
cat(generate_md(paste0("https://gist.githubusercontent.com/hermanp/",
"381eaf9f2bf5f2b9cdf22f5295e73eb5/raw/",
"76b74b2c3b5e34c2410e99a3f1b6ef06977b2ec7/",
"bookmarks-example-hyphen.json")))
# Output
- Info
- Python
- The Ultimate Python Beginner's Handbook
- Python Like You Mean It
- Automate the Boring Stuff with Python
- Data science Python notebooks
- Frontend
- CodePen
- JavaScript - Wikipedia
- CSS-Tricks
- Butterick’s Practical Typography
- Front-end Developer Handbook 2019
- Using Ethics In Web Design
- Client-Side Web Development
- Stack Overflow
- HUP
- Hope in Source
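The effect can be reproduced without the bookmark file. The following is a minimal sketch, not part of the scripts above: the two-item string is a hypothetical stand-in for what recursive_func() returns for one folder, and the replacement uses two spaces on the assumption that each level should be indented by two spaces.

```r
# Hypothetical child list, shaped like the return value of recursive_func()
md_recurs <- "- CodePen\n- JavaScript - Wikipedia\n"

# The lookahead pattern matches every hyphen followed by a space,
# so the hyphen inside the title is shifted as well.
indented <- gsub("-(?= )", "  -", md_recurs, perl = TRUE)

cat(indented)
#   - CodePen
#   - JavaScript   - Wikipedia
```

The hyphen in a title such as CSS-Tricks is not affected, because it is not followed by a space.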
I modified the gsub call in recursive_func as shown below, without getting the desired output:
md_recurs <- gsub("-(?= )", " -", md_recurs, perl = T) # Original
md_recurs <- gsub("(\n)?-(?= )", " -", md_recurs, perl = T) # No newlines
md_recurs <- gsub("(-)(?= )(?<=\n)?", " -", md_recurs, perl = T) # Same as Original
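A likely reason the second variant loses its newlines: the optional group consumes the "\n" as part of the match, and the replacement text does not put it back. The sketch below illustrates this on a hypothetical two-item string (again assuming a two-space indent). Writing the captured newline back with \\1 restores the line breaks, but the hyphen inside the title is still matched.

```r
# Hypothetical input, shaped like the return value of recursive_func()
x <- "- CodePen\n- JavaScript - Wikipedia\n"

# The newline is part of the match and is replaced along with the hyphen.
dropped <- gsub("(\n)?-(?= )", "  -", x, perl = TRUE)

# \\1 re-inserts the captured newline (empty when the group did not match).
kept <- gsub("(\n)?-(?= )", "\\1  -", x, perl = TRUE)

cat(dropped)
cat(kept)
```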
Searching Google for regex newline before char gsub site:stackoverflow.com, I found no answer or hint for this question. I also experimented on regex101.com, but could not find the right approach.
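One direction that might be worth trying is to anchor the pattern to the start of each line instead of referring to the newline itself. This is only a sketch under assumptions: the input string is hypothetical, the indent is assumed to be two spaces, and every list item is assumed to start its line with optional spaces followed by "- ". The inline (?m) flag is PCRE's multiline mode, in which ^ matches at the start of every line.

```r
# Hypothetical input: two top-level items and one already nested item
md_recurs <- "- CodePen\n- JavaScript - Wikipedia\n  - Nested - Item\n"

# (?m)^ anchors the match to a line start, ( *) keeps the existing
# indentation, and "- " is the list marker. Hyphens inside a title
# are never at a line start, so they are left alone.
indented <- gsub("(?m)^( *)- ", "  \\1- ", md_recurs, perl = TRUE)

cat(indented)
#   - CodePen
#   - JavaScript - Wikipedia
#     - Nested - Item
```

A title that itself contains a line break followed by "- " would still be misread as a list item, so this only helps if titles are single-line.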