Parse and transform text for articles reproduction

Question

I have an input string like this one:

If you {decided|planned|wish} {to go|gonna} to {camping|have outdoor rest|fishing|hunting}, you {may like|need|just need|may use} sleeping bag [PRODUCT NAME]. {It|This sleeping bag} {is intended|is ideal} for [SEASON] and {designed|sewed|made} by [TYPE] {type|form-factor}.

Now I need to do two things:

1. Put values into the square brackets (e.g. [PRODUCT NAME] becomes Hard Wear Mountain).
2. Pick a random option from each pair of curly brackets and substitute it (e.g. {decided|planned|wish} becomes planned).

So the output string would look like this:

If you wish go to fishing, you may like sleeping bag Hard Wear Mountain. This sleeping bag is ideal for winter season and designed by cocoon form-factor.

I know how to solve problem #1, but I have no idea about problem #2. Also, there can be nested curly brackets, for example: {some word|{some word2|{some word3|some word5}}|some word4}.

So I need a regular expression for Ruby, or maybe another approach to solve this problem.

http://stackoverflow.com/questions/6331065/matching-balanced-parenthesis-in-ruby-using-recursive-regular-expressions-like-p — , Apr 23 '15 at 04:42

Cary Swoveland · Accepted Answer · 2015-05-18T20:53:19.603

Suppose this is our text:

text = 'If you {decided|planned|wish} {to go|gonna} to {camping|have outdoor rest|fishing|hunting}, you {may like|need|just need|may use} sleeping bag [PRODUCT NAME]. {It|This sleeping bag} {is intended|is ideal} for [SEASON] and {designed|sewed|made} by [TYPE] {type|form-factor}. {It is|{really|{not so|all that}}|certainly} a great bag.'

Notice I've added some nested braces in the last sentence.

First, specify the replacement values for the bracketed placeholders in a hash:

h = { '[PRODUCT NAME]'=>'Hard Wear Mountain',
      '[SEASON]'=>'fall',
      '[TYPE]'=>'underpaid workers' }

and apply them as follows:

r = /
    \[  # match a left bracket
    .+? # match >= 1 characters non-greedily (stop at 1st right bracket)
    \]  # match right bracket
    /x

str = text.gsub(r,h)

returning:

"If you {decided|planned|wish} {to go|gonna} to {camping|have outdoor rest|fishing|hunting}, you {may like|need|just need|may use} sleeping bag Hard Wear Mountain. {It|This sleeping bag} {is intended|is ideal} for fall and {designed|sewed|made} by underpaid workers {type|form-factor}. {It is|{really|{not so|all that}}|certainly} a great bag."

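Every substring s of the form [...] is replaced by h[s] if h has the key s. If h has no key s, h[s] is nil and the match is replaced by an empty string, so every placeholder in the text should have an entry in h.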
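
To make the handling of unknown placeholders concrete, here is a small sketch. The `values` hash, the `template` string and the `[COLOR]` placeholder are made up for illustration. With a hash as the second argument of `gsub`, a match that is not a key is replaced by the hash's default value (normally `nil`, which yields an empty string). The block form with `fetch` is one way to leave such placeholders unchanged instead.

```ruby
# Hypothetical data for illustration; [COLOR] deliberately has no entry.
values      = { '[SEASON]' => 'fall' }
template    = 'Ideal for [SEASON], available in [COLOR].'
placeholder = /\[.+?\]/

# Hash form: a match that is not a key is replaced by the hash default (nil -> "").
with_hash = template.gsub(placeholder, values)

# Block form: fall back to the matched text, so unknown placeholders survive.
with_fetch = template.gsub(placeholder) { |s| values.fetch(s, s) }

puts with_hash   # Ideal for fall, available in .
puts with_fetch  # Ideal for fall, available in [COLOR].
```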

Now do the replacements, beginning with the inner {...|...|...} and then working outward until no more replacements are made:

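old = str

loop do
  # Replace every brace-block that contains no nested block with a random option.
  new = old.gsub(/\{[^{]+?(?:\|[^{}]+?)+\}/) do |s|
    a = s[1..-2].split('|')  # strip the braces, split into options
    a[rand(a.size)]          # pick one option at random
  end
  break if new == old        # nothing left to replace
  old = new
end
old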

returning:

"If you decided gonna to fishing, you need sleeping bag Hard Wear Mountain. This sleeping bag is intended for fall and sewed by underpaid workers form-factor. It is a great bag."

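For reuse, the two steps can be wrapped in a single method. This is only a sketch: the method name `spin`, the constant `INNERMOST_BLOCK`, the `rng:` keyword and the short sample template are assumptions introduced here, not part of the answer. Passing a seeded `Random` makes a run repeatable, which may help when testing.

```ruby
# Sketch only: `spin`, `INNERMOST_BLOCK` and `rng:` are illustrative names.
INNERMOST_BLOCK = /\{[^{]+?(?:\|[^{}]+?)+\}/  # same pattern as in the loop above

def spin(template, values, rng: Random.new)
  # Step 1: fill in the [PLACEHOLDER] values (unknown keys become "").
  current = template.gsub(/\[.+?\]/, values)
  # Step 2: resolve innermost brace-blocks until a pass changes nothing.
  loop do
    resolved = current.gsub(INNERMOST_BLOCK) do |block|
      options = block[1..-2].split('|')
      options[rng.rand(options.size)]
    end
    return resolved if resolved == current
    current = resolved
  end
end

values   = { '[SEASON]' => 'fall' }
template = '{It|This bag} is {ideal|{made|sewed}} for [SEASON].'

puts spin(template, values)                        # random on each call
puts spin(template, values, rng: Random.new(42))   # repeatable for a given seed
```

Each call starts again from the unresolved template, so repeated calls can produce different sentences.
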
The idea here is to perform a sequence of replacements, each time replacing strings of the form '{...|...|...}' in which the ...'s contain no left brace and therefore no nested block. To show the steps, the following walks through the sequential random replacements (which may, of course, differ from the result above).

1st round of replacements

str # as above
old = str  
new = old.gsub(/\{[^{]+?(?:\|[^{}]+?)+\}/) do |s|
        a = s[1..-2].split('|')
        a[rand(a.size)]
      end
new==old #=> false

Now new equals:

"If you planned gonna to hunting, you just need sleeping bag Hard Wear Mountain. It is ideal for fall and made by underpaid workers type. {It is|{really|all that}|certainly} a great bag."

Notice that all the non-nested brace-blocks have been resolved, and the nested block:

{It is|{really|{not so|all that}}|certainly}

has been reduced in nesting levels by one:

{It is|{really|all that}|certainly}

as {not so|all that} has been replaced by all that. The random replacement in this block was done as follows:

 s0 = '{not so|all that}'
 s1 = s0[1..-2]
   #=> "not so|all that" 
 a  = s1.split('|')
   #=> ["not so", "all that"] 
 a[rand(a.size)]
   #=> a[rand(2)] => a[1] => "all that"
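
As an aside, Ruby's `Array#sample` makes the same kind of uniform random pick as `a[rand(a.size)]`. A minimal sketch follows; the seed value 1 is arbitrary, and passing a seeded `Random` through the `random:` keyword is only useful when a repeatable choice is wanted.

```ruby
a = '{not so|all that}'[1..-2].split('|')

pick       = a.sample                          # uniform pick, like a[rand(a.size)]
repeatable = a.sample(random: Random.new(1))   # same seed, same choice

puts pick
puts repeatable == a.sample(random: Random.new(1))  # true
```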

2nd round of replacements

old=new 
new = old.gsub(/\{[^{]+?(?:\|[^{}]+?)+\}/) do |s|
        a = s[1..-2].split('|')
        a[rand(a.size)]
      end
new==old #=> false

new now equals:

"If you planned gonna to hunting, you just need sleeping bag Hard Wear Mountain. It is ideal for fall and made by underpaid workers type. {It is|all that|certainly} a great bag."

3rd round of replacements

old=new 
new = old.gsub(/\{[^{]+?(?:\|[^{}]+?)+\}/) do |s|
        a = s[1..-2].split('|')
        a[rand(a.size)]
      end
new==old #=> false

new now equals:

"If you planned gonna to hunting, you just need sleeping bag Hard Wear Mountain. It is ideal for fall and made by underpaid workers type. certainly a great bag."

We are now finished, but won't know until we try again and find that new == old #=> true.

4th round of replacements

old=new 
new = old.gsub(/\{[^{]+?(?:\|[^{}]+?)+\}/) do |s|
        a = s[1..-2].split('|')
        a[rand(a.size)]
      end
new==old #=> true
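
The rounds above can also be recorded programmatically. In this sketch `resolve_rounds` is a hypothetical helper, and the caller's block decides which option wins; passing `&:last` instead of a random pick makes the trace deterministic, so it can be compared with the walkthrough.

```ruby
# Sketch: `resolve_rounds` is a hypothetical helper. It returns the string
# after each round, starting with the unmodified input.
def resolve_rounds(str)
  rounds = [str]
  loop do
    nxt = rounds.last.gsub(/\{[^{]+?(?:\|[^{}]+?)+\}/) do |s|
      yield s[1..-2].split('|')   # the caller's block picks one option
    end
    return rounds if nxt == rounds.last   # a round with no change ends the loop
    rounds << nxt
  end
end

trace = resolve_rounds('{It is|{really|{not so|all that}}|certainly} a great bag.', &:last)
trace.each_with_index { |s, i| puts "#{i}: #{s}" }
# 0: {It is|{really|{not so|all that}}|certainly} a great bag.
# 1: {It is|{really|all that}|certainly} a great bag.
# 2: {It is|all that|certainly} a great bag.
# 3: certainly a great bag.
```

For a random trace, pass `{ |options| options.sample }` as the block.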

Thank you very much for your reply. One question about the replacement: can it do the same with nested brackets? Also, I tested your code just now, but every time I run the loop it returns the same output string. Why is the output not random? Regards. — Mark Pegasov, Apr 23 '15 at 06:02
In looping over the random replacements I made the assignment `str = s`. I suspect my reuse of the variable `str` may explain why you keep getting the same output string. I believe that would happen if you did not reinitialize `str` each time, as it would start with a string that contained no brace-blocks to replace. You'll see I've changed the code so that I no longer modify `str` after calculating it initially. Let me know if that fixes the problem. I'll consider nested brackets in the morning. Can you provide an example of how you would intend to use them? — Cary Swoveland, Apr 23 '15 at 07:10

score 0 · Answer 2 · answered Apr 23 '15 at 05:37

The following regex will capture the text, also for nested cases:

(?<=[|{])([\w\s]+?)(?=[}|])

You can then count the matches and choose a random index less than the number of matches.
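
A sketch of how this regex behaves with `String#scan` (the variable names are made up for illustration). Two limits appear to follow from the pattern itself: the matches come back as one flat list, so options from different nesting levels are mixed together, and because `[\w\s]` excludes punctuation, an option such as form-factor is not captured.

```ruby
option = /(?<=[|{])([\w\s]+?)(?=[}|])/

# scan returns one single-element array per match, hence the flatten.
simple = '{decided|planned|wish}'.scan(option).flatten
nested = '{It is|{really|{not so|all that}}|certainly}'.scan(option).flatten
hyphen = '{type|form-factor}'.scan(option).flatten

p simple  # ["decided", "planned", "wish"]
p nested  # ["It is", "really", "not so", "all that", "certainly"]
p hyphen  # ["type"]

puts simple.sample  # random pick from a single, non-nested block
```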

blueygh2
