I'm writing a Markdown parser with ES6:

Input:

# Title

* * *

Paragraph

Another paragraph

Sample code:

// create a find/replace based on a regex and replacement 
// to later be used with the conversion functions
function massReplace(text, replacements) {
  let result = text
  for (let [regex, replacement] of replacements) {
    result = result.replace(regex, replacement)
  }
  return result
}

// match text with # and replace them for html headings
function convertHeadings(text, orig) {
  if (orig.match(/^#{1,6}\s/)) {
    // replacements must be a single array of [regex, replacement] pairs
    return massReplace(text, [
      [/^### (.*)/gm, '<h3>$1</h3>'],
      [/^## (.*)/gm,  '<h2>$1</h2>'],
      [/^# (.*)/gm,   '<h1>$1</h1>']
    ])
  }
}

// match text without # and surround them with p tags
function convertParagraphs(text, orig) {
  if (!orig.match(/^#{1,6} (.*)/)) {
    return `<p>${text}</p>`
  }
}

// take the source, split on new lines, make a copy (to 
// have a "clean" version to be used in the if statements),
// and finally apply the conversion functions to them with
// the help of a loop and excluding those that output undefined
function convertToHTML(markdownSource) {
  let data = markdownSource.split('\n\n')
    , orig = data.slice()
    , conversions = [ convertHeadings, convertParagraphs]

  for (let i = 0, l = orig.length; i < l; ++i) {
    for (let conversion of conversions) {
      let result = conversion(data[i], orig[i])
      if (result !== undefined) {
        data[i] = result
      }
    }
  }

  return data.join('\n\n')
}

What I want now is to wrap any text that is preceded by * * * (Paragraph in the example above) in p tags with the class no-indent. The problem is that I don't know how to select a block of text based on the one preceding it (* * * in this case).

To give an idea, this is the desired output:

<h1>Title</h1>

<p>* * *</p>

<p class="no-indent">Paragraph</p>

<p>Another paragraph</p>
alexchenco
  • Can you describe your question in more detail? How are the functions `massReplace` and `convertParagraphs` called? It would also be better to write your input as HTML. – Omar Elawady Apr 10 '15 at 15:00
  • @Omar Elawady How about now? I don't understand why you suggest to write the input in HTML though, since the real input should be in Markdown. Anyhow, I included the HTML part as output. – alexchenco Apr 10 '15 at 15:10
  • (@omar seems a bit lost... :) ) Upvoted your Q. It's a really good formatted one! Thumbs up – Roko C. Buljan Apr 10 '15 at 15:15
  • _“The problem is, I don't know how to get a text based on its preceding one”_ – don’t try to look _forward_, look _back_ instead. When you encounter the `Paragraph` line in your input data, consult the variable that you stored the value of the _previous_ line into. – CBroe Apr 10 '15 at 15:23
  • You could perhaps look at existing parsers and how they are coded. Otherwise, would a regex that matches across multiple lines do the trick? http://stackoverflow.com/questions/1979884/how-to-use-javascript-regex-over-multiple-lines – user5321531 Apr 10 '15 at 22:13

1 Answer

You're asking a question about tokenising and parsing, and possibly about lookahead parsing in particular:

Wikipedia pages:
https://en.wikipedia.org/wiki/Lexical_analysis
https://en.wikipedia.org/wiki/Parsing

StackOverflow tokenizing:
https://stackoverflow.com/questions/tagged/token
https://stackoverflow.com/questions/tagged/tokenize

StackOverflow parsing questions:
https://stackoverflow.com/questions/tagged/parsing
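
As a rough illustration of the idea, here is a minimal sketch that first turns each blank-line-separated block into a token and then renders each token while looking back at the previous one, as suggested in the comments. The names `tokenize`, `render` and `convertWithLookBack` are hypothetical and not part of the code in the question. The sketch assumes that headings are single-line blocks and that a rule is written exactly as `* * *`.

```javascript
// Hypothetical helper: classify one blank-line-separated block of Markdown.
function tokenize(block) {
  const heading = block.match(/^(#{1,6})\s+(.*)$/)
  if (heading) {
    return { type: 'heading', level: heading[1].length, text: heading[2] }
  }
  if (block.trim() === '* * *') {
    return { type: 'rule', text: block.trim() }
  }
  return { type: 'paragraph', text: block }
}

// Hypothetical helper: render a token to HTML.
// `previous` is the token that came before it (undefined for the first token).
function render(token, previous) {
  switch (token.type) {
    case 'heading':
      return `<h${token.level}>${token.text}</h${token.level}>`
    case 'rule':
      // Rendered as a plain paragraph to match the desired output in the question.
      return `<p>${token.text}</p>`
    default: {
      // Look back: a paragraph that directly follows a rule gets the class.
      const afterRule = previous !== undefined && previous.type === 'rule'
      return afterRule
        ? `<p class="no-indent">${token.text}</p>`
        : `<p>${token.text}</p>`
    }
  }
}

// Split into blocks, tokenize them all, then render each with its predecessor.
function convertWithLookBack(markdownSource) {
  const tokens = markdownSource.split('\n\n').map(tokenize)
  return tokens
    .map((token, i) => render(token, tokens[i - 1]))
    .join('\n\n')
}

console.log(convertWithLookBack('# Title\n\n* * *\n\nParagraph\n\nAnother paragraph'))
```

Because every block is classified before anything is rendered, the decision about `no-indent` only needs the type of the preceding token, not another regex over the raw text.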

user5321531