I have the following script, and it outputs an array like the one below.

[
    [0] "# bash LAN\n\n```\nnmap -sn 192.168.3.*\n```\n",
    [1] "# node.js npm\n\n`sudo apt update && sudo apt install nodejs npm -y`\n",
    [2] "# something title\n\nsomething content\n",
    [3] "# bash log\n\n```\ntail -f",
    [4] "# or",
    [5] "# tail -f -n 50\n```\n\n\n"
]
Encoding.default_external = 'UTF-8'

require 'pry'
require "awesome_print"

# p \
ap \
arrayobj = <<-'EOS'.scan(/^#(?!#).*(?:\R(?!#(?!#)).*)*/) # .scan(/^#.*$\n(.*)/m)


# bash LAN

\```
nmap -sn 192.168.3.*
\```

# node.js npm

`sudo apt update && sudo apt install nodejs npm -y`

# something title

something content

# bash log

\```
tail -f
# or
# tail -f -n 50
\```


EOS

I'd like to split the text inside the EOS heredoc at each single hash (#) heading, but a single hash that appears inside a Markdown code block should be ignored.

So in this case, what I want is the following output. How can I get it?

[
    [0] "# bash LAN\n\n```\nnmap -sn 192.168.3.*\n```\n",
    [1] "# node.js npm\n\n`sudo apt update && sudo apt install nodejs npm -y`\n",
    [2] "# something title\n\nsomething content\n",
    [3] "# bash log\n\n```\ntail -f\n# or\n# tail -f -n 50\n```\n\n\n",
]
k23j4

2 Answers

You may use

.scan(/^#(?!#)(?:(?!```)[^#]|```.*?```)*/m).flatten

See the Ruby demo and a Rubular demo.

Details

  • ^ - start of a line
  • #(?!#) - a # not followed by another #
  • (?:(?!```)[^#]|```.*?```)* - 0 or more repetitions of
    • (?!```)[^#] - any char other than # that does not start a ``` char sequence
    • | - or
    • ``` - three backticks
    • .*? - any 0+ chars, as few as possible
    • ``` - three backticks

The m modifier makes . match any char including line break chars.
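
As a rough sketch of how the pattern behaves, the snippet below runs it on a shortened version of the question's text. The sample text and the variable names are assumptions made for illustration. The three backticks are written as `{3} in the pattern and built with string repetition in the sample, which is equivalent. The .flatten call is left out because the pattern has no capture groups, so scan already returns a flat array of strings.

```ruby
# Build the fence marker without typing three literal backticks.
fence = '`' * 3

# Shortened sample (an assumption): two sections, the second with
# comment lines starting with # inside a fenced code block.
text = <<~MD
  # bash LAN

  #{fence}
  nmap -sn 192.168.3.*
  #{fence}

  # bash log

  #{fence}
  tail -f
  # or
  # tail -f -n 50
  #{fence}
MD

# Same pattern as above, with the fence written as `{3}.
pattern = /^#(?!#)(?:(?!`{3})[^#]|`{3}.*?`{3})*/m

sections = text.scan(pattern)

puts sections.length                             # 2
puts sections[1].include?("# or\n# tail -f -n 50") # true
```

One thing to keep in mind: [^#] stops at any # outside a code block, including ## subheadings and a # in the middle of a line, so this sketch suits text shaped like the sample rather than arbitrary Markdown.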

Wiktor Stribiżew

What I would do in this case may not be completely standard, but here it is:

  1. Split the string on triple backticks:

    .split('```')
    
  2. Take the pieces in pairs. First piece is normal markdown, second is code snippet.

    .each_slice(2)
    
  3. Add markers before # in the first piece, not in the second. Notice that the code piece is nil in the last slice.

    .map { |txt, code = nil| [txt.gsub('#', "\x00#"), code].compact }
    
  4. Join back

    .flatten.join('```')
    
  5. Split by marker

    .split("\x00")
    

The NUL byte ("\x00") is not expected to appear in your text. If it does, use another marker, such as a long random string without any # that you generate when starting the process.
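
Chained together, the five steps might look like the sketch below. The sample text and the variable names are assumptions for illustration, and there are three small departures from the steps as written: split is given a limit of -1 so that a trailing empty piece is kept when the text ends with a fence, only a single # at the start of a line is marked (instead of every #), and blank pieces are dropped at the end.

```ruby
# Build the fence marker without typing three literal backticks.
fence = '`' * 3
mark  = "\x00" # marker assumed not to occur in the text

# Shortened sample (an assumption) modelled on the question's text.
text = <<~MD
  # bash LAN

  #{fence}
  nmap -sn 192.168.3.*
  #{fence}

  # bash log

  #{fence}
  tail -f
  # or
  # tail -f -n 50
  #{fence}
MD

marked = text
  .split(fence, -1)   # 1. alternate markdown / code pieces
  .each_slice(2)      # 2. pair each markdown piece with its code piece
  .map { |md, code| [md.gsub(/^#(?!#)/, mark + '#'), code].compact } # 3. mark headings only
  .flatten
  .join(fence)        # 4. restore the fences

# 5. Split at the markers; the piece before the first heading is blank here.
sections = marked.split(mark).reject { |s| s.strip.empty? }

puts sections.length # 2
```

With this sample, the # lines inside the second code block stay in the same section as their heading, because only the markdown pieces are ever passed to gsub.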

rewritten