
Assume that I have a string that contains:

Some content blah blah blah
Some more random content
ParentID: Here goes the important content

I am trying to write a regex in Ruby to parse the value of "ParentID:" out of this string. This is what I have now:

def parseForParent(textForParsing)

  string1 = textForParsing.match(/ParentID:([^\/.]*)\n$/)

end

This problem seems to be resolved now; see the answers below. I am now trying to modify the regex so that it captures only the text that actually belongs to 'ParentID'. One way to do this is to strip all text beyond a delimiter; alternatively, I could incorporate the delimiter into my regex.
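A minimal sketch of the first option (capture the rest of the text, then strip everything past a delimiter). It is not the asker's code: `parent_by_truncation` and `PARENT_DELIMITER` are hypothetical names, and the delimiter is assumed to be either a line break or an HTML `<br>` tag, as in the output quoted in the comments.

```ruby
# Hypothetical delimiter: a line break or an HTML <br> tag in any case.
PARENT_DELIMITER = /\n|<br\s*\/?>/i

def parent_by_truncation(text)
  # The /m flag lets . cross newlines, so this captures the whole tail.
  rest = text[/ParentID:(.*)/m, 1]
  return nil if rest.nil?

  # String#partition splits at the first delimiter match only;
  # if there is no delimiter, .first is the whole string.
  rest.partition(PARENT_DELIMITER).first.strip
end

sample = "Some content\nParentID: Transactional NAS<br><br>Acceptance Criteria:"
p parent_by_truncation(sample)  # => "Transactional NAS"
```

The second option, stopping the capture inside the regex itself, avoids the extra string operation but makes the pattern harder to read.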

Rohan Dalvi

3 Answers


You must be doing something odd, because it works for me, though I've made some changes here to make it more idiomatic Ruby:

def parse_for_parent(text)
  match = text.match(/ParentID:([^\/.]*?)\n$/)

  match and match[1]
end

text = <<END
Some content blah blah blah
Some more random content
ParentID: Here goes the important content
END

parse_for_parent(text)
# => " Here goes the important content"

As a note, Ruby method names are conventionally written in snake_case (lowercase with underscores), class names in CamelCase, and constants in all caps.
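The `undefined method 'match' for nil:NilClass` error reported in the comments means the method received `nil`, not that the regex failed. A minimal sketch of a more defensive variant; the name `safe_parse_for_parent` is hypothetical, and it swaps the trailing `\n$` for a plain `(.*)` capture that ends at the line break.

```ruby
# Hypothetical defensive variant of parse_for_parent.
def safe_parse_for_parent(text)
  # Calling #match on a nil argument is what raises the NoMethodError.
  return nil if text.nil?

  # Without the /m flag, . does not match "\n", so the capture ends at the line end.
  match = text.match(/ParentID:(.*)/)
  match ? match[1].strip : nil
end

text = <<END
Some content blah blah blah
Some more random content
ParentID: Here goes the important content
END

p safe_parse_for_parent(text)  # => "Here goes the important content"
p safe_parse_for_parent(nil)   # => nil
```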

tadman
  • I am still getting the same undefined method `match' for nil:NilClass error. Is it because the function is returning nil? It is still not extracting the substring. – Rohan Dalvi Sep 25 '13 at 17:55
  • @RohanDalvi: Insert `puts textForParsing.inspect` before the match line. You will see that the value you are passing to the method is `nil`. Are you by any chance reading the string from a file? The read might be incomplete. (Hint: http://stackoverflow.com/questions/5545068/what-are-all-the-common-ways-to-read-a-file-in-ruby) – kristinalim Sep 25 '13 at 18:05
  • @kristinalim You are right, that was the problem. Now the issue is with the regex. For example, I am getting this output: ParentID: Transactional NAS

    Acceptance Criteria:

    I can verify that after changing

    Ideally, the output should have ended after "NAS", since there is a `<br>` tag right after it, but it prints everything. So maybe I should use a regex that stops at any `<br>` tag.
    – Rohan Dalvi Sep 25 '13 at 18:18
  • Posted an answer. Kindly update your question to reflect the new details. – kristinalim Sep 25 '13 at 18:41
  • Your regular expression needs to be adjusted to avoid those problems. You didn't have any HTML in your example. – tadman Sep 25 '13 at 18:41
  • It's the `\n` at the end of the regex. That forces it to pick up everything up to the last `\n`, because the `*` quantifier is greedy and `[^\/.]` also matches newlines. – pguardiario Sep 26 '13 at 02:26
  • @pguardiario That's probably it. The original expression worked in my test here, but I've adjusted it. – tadman Sep 26 '13 at 14:40
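The point about the trailing `\n` can be checked with a small experiment. Note that `\n$` only matches a newline that is itself followed by an end of line, i.e. a blank line or the end of the string, so even the lazy variant can overshoot. The two sample strings are assumptions modeled on the output quoted above, with the `<br>` tags already turned into newlines.

```ruby
# With blank lines, shaped like the output quoted in the comments.
spaced = "ParentID: Transactional NAS\n\nAcceptance Criteria:\n\nI can verify\n"
# The same content without blank lines.
packed = "ParentID: Transactional NAS\nAcceptance Criteria:\nI can verify\n"

GREEDY = /ParentID:([^\/.]*)\n$/   # the question's pattern
LAZY   = /ParentID:([^\/.]*?)\n$/  # the lazy variant from the first answer
LINE   = /ParentID:(.*)/           # no trailing \n$ at all

p spaced[GREEDY, 1]  # => " Transactional NAS\n\nAcceptance Criteria:\n\nI can verify"
p spaced[LAZY, 1]    # => " Transactional NAS"
p packed[LAZY, 1]    # => " Transactional NAS\nAcceptance Criteria:\nI can verify"
p packed[LINE, 1]    # => " Transactional NAS"
```

Dropping `\n$` and relying on `.` not matching a newline is the variant that stops at the first line break in both cases.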

How about using `str[regexp, capture]`?

text = <<END
Some content blah blah blah
Some more random content
ParentID: Here goes the important content
END

text[/ParentID:(?<match>.*)/,"match"]
# => " Here goes the important content"
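`String#[]` also accepts an integer capture index instead of a group name, and it returns `nil` when the pattern is absent. A small sketch of that form; the added `[ \t]*` is an assumption that the leading space is unwanted in the result.

```ruby
text = <<END
Some content blah blah blah
Some more random content
ParentID: Here goes the important content
END

# Integer capture index instead of a named group. [ \t]* skips the space after
# the colon without being able to run onto the next line.
value = text[/ParentID:[ \t]*(.*)/, 1]
p value  # => "Here goes the important content"

# String#[] returns nil rather than raising when the pattern is absent.
p "no id here"[/ParentID:[ \t]*(.*)/, 1]  # => nil
```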
Arup Rakshit

If all you are dealing with are BR HTML tags, you can simplify parsing by replacing the BR tags in your input with plain-text newlines before feeding it into your parseForParent method:

converted_text = text.gsub(/<br\s*\/?>/i, "\n")

That should be flexible enough to handle <BR>, <br/>, and <br />.
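Combining the conversion with a line-bounded capture gives a short pipeline. This is a sketch only: `parent_from_html` is a hypothetical name, and the input string is modeled on the output quoted in the question's comments.

```ruby
# Hypothetical helper: normalize <br> variants to "\n", then read one line.
def parent_from_html(text)
  converted_text = text.gsub(/<br\s*\/?>/i, "\n")
  # . does not match "\n" without /m, so the capture stops at the first line break.
  value = converted_text[/ParentID:(.*)/, 1]
  value && value.strip
end

html = "ParentID: Transactional NAS<BR><br/>Acceptance Criteria:<br />I can verify"
p parent_from_html(html)  # => "Transactional NAS"
```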

Update:

As @tadman emphasized, it is generally safer to use a full-blown HTML parser (e.g. Nokogiri) to handle parsing. This thread might also be of interest to you.

kristinalim
  • This would be better done with a proper HTML parser like Nokogiri. Regular expressions can only pretend to understand HTML tags. – tadman Sep 25 '13 at 18:42
  • @kristinalim I can't use "\n" instead of "<br>", because I am feeding this as a CSV to a system that doesn't render "\n" as a line break, only "<br>".
    – Rohan Dalvi Sep 26 '13 at 00:24
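If the `<br>` tags have to stay in the text passed on to that system, one option is to leave the string untouched and stop the capture at the first `<br>` inside the regex. A minimal sketch with the hypothetical name `parent_before_br`; it also treats a line break or the end of the string as a stop.

```ruby
# Hypothetical helper: the input string is never modified,
# so its <br> tags can still be passed on to the CSV consumer.
def parent_before_br(text)
  # Lazy capture, stopped by a lookahead for a <br> variant, a newline,
  # or the end of the string. (?i:...) makes only the tag match case-insensitive.
  value = text[/ParentID:(.*?)(?=(?i:<br\s*\/?>)|\n|\z)/, 1]
  value && value.strip
end

row = "ParentID: Transactional NAS<br><br>Acceptance Criteria:<br>I can verify"
p parent_before_br(row)  # => "Transactional NAS"
p row.include?("<br>")   # => true
```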