0

I have a string such as:

<?xml version="xyzt" standalone="112.0" sxcx="xcxc"?>

I want to extract the string to array where each element is an attribute of string such as [version="xyzt", standalone="112.0", sxcx="xcxc"].

I tried using string.scan(/\s\w+="\.*"/) do |block| puts block end but I don't get a result.. Please tell me why and how I can do it.

Andrew Marshall
  • 95,083
  • 20
  • 220
  • 214
  • Well, that regex doesn't match anything in that string. Therefore, no output. – Sergio Tulentsev Jul 27 '16 at 12:39
  • I am preeeetty sure you don't want to match literal dot zero or more times. Use http://regex101.com, it's great. – Sergio Tulentsev Jul 27 '16 at 12:40
  • 3
    Please consider using an actual XML parser (e.g. [Nokogiri](http://www.nokogiri.org/)) instead of [parsing XML with regex](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/). – Andrew Marshall Jul 27 '16 at 12:42
  • 2
    Additionally, your example output array is not valid Ruby. – Andrew Marshall Jul 27 '16 at 12:43
  • Thank @SergioTulentsev – Đinh Văn Khang Jul 27 '16 at 12:44
  • Thank you @AndrewMarshall but i don't want use module – Đinh Văn Khang Jul 27 '16 at 12:48
  • Further to @Andrew's point about the output array being invalid, the first element of the array you've shown (`version="xyzt"`) implies that `version` is a local variable produced from the given string. Since Ruby v1.8 it has not been possible to create local variables. The question only makes sense if the first element of the array is the string `'version="xyzt"'`. – Cary Swoveland Jul 27 '16 at 16:55

2 Answers2

0
string[/(?<=\<\?xml ).*(?=\?>)/]
#⇒ 'version="xyzt" standalone="112.0" sxcx="xcxc"'

If you need to surround it with square brackets:

?[ << string[/(?<=\<\?xml ).*(?=\?>)/] << ?]
#⇒ '[version="xyzt" standalone="112.0" sxcx="xcxc"]'

To get a hash of attributes:

string[/(?<=\<\?xml ).*(?=\?>)/].split(/\s+/)
                                .map { |e| e.split('=') }
                                .to_h
#⇒ {
#  "standalone" => "\"112.0\"",
#        "sxcx" => "\"xcxc\"",
#     "version" => "\"xyzt\""
# }
Aleksei Matiushkin
  • 119,336
  • 10
  • 100
  • 160
0
str = '<?xml version="xyzt" standalone="112.0" sxcx="xcxc"?>'

I assume you want to produce the array:

['version="xyzt"', 'standalone="112.0"', 'sxcx="xcxc"']

You can do that as follows:

arr = str.scan(/[a-z]+\=\S+/)
  #=> ["version=\"xyzt\"", "standalone=\"112.0\"", "sxcx=\"xcxc\"?>"] 

puts arr
# version="xyzt"
# standalone="112.0"
# sxcx="xcxc"?>
Cary Swoveland
  • 106,649
  • 6
  • 63
  • 100