I have strings like the ones below:

case1:
str = "type=\"text/xsl\" href=\"http://skdjf.sdjhshf/CDA0000=.xsl\""
case2:
str = "href=\"http://skdjf.sdjhshf/CDA0000=.xsl\" type=\"text/xsl\""

I need to extract the values like this:

 type -> text/xsl
 href -> http://skdjf.sdjhshf/CDA0000=.xsl

Here are my regular expressions, each of which fails in one of the two cases.

 str.match(/type="(.*)"/)[1]
 # this works in the second case
 =>"text/xsl"

 str.match(/href="(.*)"/)[1]
 # this works in the first case
 =>"http://skdjf.sdjhshf/CDA0000=.xsl"

In the failing cases the capture group swallows the rest of the string.

Any idea?

Soundar Rathinasamy
  • It looks like you are parsing XML. Generally it is a good idea to use a library designed for that purpose. Is there a particular reason you can't or won't do that? – John Watts Oct 25 '12 at 10:36
  • Yes, I am using Nokogiri, but it only gives me a string for stylesheet nodes. That is why I am looking for a regular expression. – Soundar Rathinasamy Oct 25 '12 at 10:40
  • Nokogiri does everything, not only css. – oldergod Oct 25 '12 at 10:41
  • @oldergod Could you please take a look at this question so that you can understand the problem: http://stackoverflow.com/questions/13066231/how-to-retrieve-the-nokogiri-processing-instruction-attributes – Soundar Rathinasamy Oct 25 '12 at 10:44

1 Answer

I agree with John Watts' comment. Use something like Nokogiri to parse XML; it is a breeze. If you still want to stick with regex parsing, you could do something like:

str.split(' ').map{ |part| part.match( /(.+)="(.+)"/ )[1..2] }

and you will get results like these:

> str = "type=\"text/xsl\" href=\"http://skdjf.sdjhshf/CDA0000=.xsl\""
 => "type=\"text/xsl\" href=\"http://skdjf.sdjhshf/CDA0000=.xsl\"" 

> str2 = "href=\"http://skdjf.sdjhshf/CDA0000=.xsl\" type=\"text/xsl\""
 => "href=\"http://skdjf.sdjhshf/CDA0000=.xsl\" type=\"text/xsl\"" 

> str.split(' ').map{ |part| part.match( /(.+)="(.+)"/ )[1..2] }
 => [["type", "text/xsl"], ["href", "http://skdjf.sdjhshf/CDA0000=.xsl"]] 

> str2.split(' ').map{ |part| part.match( /(.+)="(.+)"/ )[1..2] }
 => [["href", "http://skdjf.sdjhshf/CDA0000=.xsl"], ["type", "text/xsl"]] 

You can then put these pairs into a hash or wherever you want to have them.
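As a rough sketch of the hash idea, and assuming attribute values never contain a double quote, `String#scan` can collect all the pairs in one pass. The helper name `parse_attributes` is made up for illustration. The value is matched with `[^"]*` instead of `.*`, because `.*` is greedy and keeps going to the last quote in the string, which is why the original expressions only work when the attribute comes last.

```ruby
# Hypothetical helper (not part of the original answer): collect every
# name="value" pair of an attribute string into a Hash.
def parse_attributes(str)
  # (\w+)      the attribute name
  # "([^"]*)"  the value: everything up to the next double quote, so the
  #            match cannot run on into the following attribute
  str.scan(/(\w+)="([^"]*)"/).to_h
end

str  = "type=\"text/xsl\" href=\"http://skdjf.sdjhshf/CDA0000=.xsl\""
str2 = "href=\"http://skdjf.sdjhshf/CDA0000=.xsl\" type=\"text/xsl\""

attrs = parse_attributes(str)
attrs["type"]  # => "text/xsl"
attrs["href"]  # => "http://skdjf.sdjhshf/CDA0000=.xsl"

# The attribute order does not matter, so str2 gives the same pairs.
attrs2 = parse_attributes(str2)
```

Unlike the `split(' ')` version, this does not depend on the values being free of spaces, but it is still only a pattern match and not a real XML parser.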

With Nokogiri you can get hold of a node and then do something like node['href'] in your case, which is probably much easier.

froderik
  • I solved this by http://stackoverflow.com/questions/3542264/can-nokogiri-search-for-xml-stylesheet-tags#answer-12223360. Thanks for the quick reply. – Soundar Rathinasamy Oct 25 '12 at 11:50