Regex markdown string for key-value pairs

Question

My Rails app is retrieving data from a third-party service that doesn't allow me to attach arbitrary data to records, but does have a description area that supports Markdown. I am attempting to pass data for each record over to my Rails app within the description content, via Markdown comments:

[//]: # (POST:28|USERS:102,78,90)

... additional Markdown content.

I found the [//]: # (...) syntax in this answer for embedding comments in Markdown, and my idea was to then pass pipe-separated key-value pairs in as the comment content.

Using my example above, I would like to be able to parse the description content string to interpret the key-value pairs. In this case, POST=28 and USERS=102,78,90. If it helps, this comment will always appear at the first line of the Markdown content.

I imagine Regex is the way to go here? I would really appreciate any help!

I think your idea is good, but I suggest *not* using your own artisanal serialization format and instead using something standard like JSON after the `#`. It would simplify both the matching and the parsing significantly. — Jordan Running, Jul 26 '16 at 20:38

score 1 · Accepted Answer · answered Jul 26 '16 at 19:34

1

You can use \G:

(?:^\[//\]:[^(]+\(\K # match your token [//]: at the beginning
|                    
\G(?!\A)\|           # or right after the previous match
)
(\w+):([\w,]+)       # capture word chars (=key)
                     # followed by :
                     # followed by word chars and comma (=val)

See a demo on regex101.com.

answered Jul 26 '16 at 19:34

Jan

42,290
8
54
79

Can this type of Regexp be performed in a Ruby environment? I'm not sure how to enclose a regexp using `~`s – Jody Heavener Jul 26 '16 at 19:46
@JodyHeavener: Yup, just [**escape the slashes**](http://rubular.com/r/PxjfOnl1Hk) – Jan Jul 26 '16 at 19:52

score 0 · Answer 2 · answered Jul 26 '16 at 19:30

0

You'll need 2 steps to properly parse this:

First find the comment: ^\[\/\/\]: # \(([^)]*)\)

This captures the comment's content.

Then parse the content: (\w+):([^|]+) (with the global flag)

This captures the key and value separately.

answered Jul 26 '16 at 19:30

Whothehellisthat

2,072
1
14
14

There is really no reason to do this in two separate steps. – Jordan Running Jul 26 '16 at 20:18
Oh, I see how you're doing it from your answer. I'm a JS developer, which doesn't have that feature. Pretty cool! – Whothehellisthat Jul 26 '16 at 20:20

score 0 · Answer 3 · answered Jul 26 '16 at 20:55

As I mentioned in my comment above, you could simplify things a lot by using a standard data serialization format like JSON after the #. For example:

require "json"

MATCH_DATA_COMMENT = %r{(?<=^\[//\]: # ).*}

markdown = <<END
[//]: # {"POST":28,"USERS":[102,78,90]}

... additional Markdown content.
END

p JSON.parse(markdown[MATCH_DATA_COMMENT])
# => { "POST" => 28, "USERS" => [102, 78, 90] }

The regular expression %r{(?<=^\[//\]: # ).*} uses negative lookbehind to match anything that follows "[//]: #".

Regex markdown string for key-value pairs

3 Answers3