It's probably possible to do this in one regex pattern, but I am a believer in keeping patterns simple. Regex can be insidious and hide lots of little errors. Keep it simple to avoid that, then tweak afterwards.
text = <<EOT
RAILS_ENV=production
listen_address = 127.0.0.1 # localhost only by default
PATH="/usr/local/bin"
EOT
text.scan(/^([^=]+)=(.+)/)
# => [["RAILS_ENV", "production"], ["listen_address ", " 127.0.0.1 # localhost only by default"], ["PATH", "\"/usr/local/bin\""]]
Trimming off the trailing comment is easy in a subsequent map:
text.scan(/^([^=]+)=(.+)/).map{ |n,v| [ n, v.sub(/#.+/, '') ] }
# => [["RAILS_ENV", "production"], ["listen_address ", " 127.0.0.1 "], ["PATH", "\"/usr/local/bin\""]]
If you want to normalize all your names and values so they have no extraneous spaces, you can do that in the map also:
text.scan(/^([^=]+)=(.+)/).map{ |n,v| [ n.strip, v.sub(/#.+/, '').strip ] }
# => [["RAILS_ENV", "production"], ["listen_address", "127.0.0.1"], ["PATH", "\"/usr/local/bin\""]]
What the regex "/^([^=]+)=(.+)/" is doing is:
- "^" is "At the beginning of a line", which is either the start of the string or the position immediately after a "\n". This is not the same as the start of the string only, which would be \A. The difference is important, so if you don't understand the two it is a good idea to learn when and why you'd want to use one over the other. That's one of those places a regex can be insidious.
- "([^=]+)" is "Capture one or more characters that are not an equal-sign".
- "=" is obviously the equal-sign we were looking for in the previous step.
- "(.+)" is going to capture everything after the equal-sign, up to the end of the line.
I purposely kept the above pattern simple. For production use I'd tighten up the pattern a little using a "non-greedy" quantifier, along with a trailing "$" anchor:
text.scan(/^([^=]+?)=(.+)$/).map{ |n,v| [ n.strip, v.sub(/#.+/, '').strip ] }
# => [["RAILS_ENV", "production"], ["listen_address", "127.0.0.1"], ["PATH", "\"/usr/local/bin\""]]
"+?" means find the first matching '='. It's already implied by the use of "[^=]", but "+?" makes my intent more obvious. I can get away without the "?", but it's more of a self-documentation thing for later maintenance. In your use-case it should be benign, but it is worth keeping in your Regex Bag o' Tricks.
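To see why the "?" is redundant with "[^=]" but not in general, here is a small sketch. The sample string is made up for illustration and is not from the question; the last two patterns swap in "." for "[^=]" purely to show where the non-greedy form starts to matter.

```ruby
# Made-up sample whose value itself contains an equal-sign.
line = "QUERY=a=b"

# With [^=] the name can never run past the first '=',
# so the greedy and non-greedy forms capture the same thing.
greedy_class = line.scan(/^([^=]+)=(.+)$/)
lazy_class   = line.scan(/^([^=]+?)=(.+)$/)

# With "." the quantifier decides which '=' splits the line.
greedy_dot = line.scan(/^(.+)=(.+)$/)
lazy_dot   = line.scan(/^(.+?)=(.+)$/)

p greedy_class # => [["QUERY", "a=b"]]
p lazy_class   # => [["QUERY", "a=b"]]
p greedy_dot   # => [["QUERY=a", "b"]]
p lazy_dot     # => [["QUERY", "a=b"]]
```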
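"$" means the end of the line, i.e., the place immediately preceding the EOL, AKA end-of-line or newline, or the end of the string if there is no trailing newline. The end of the string alone would be \z. It's implied also, because "." does not match a newline, but inserting it in the pattern makes it more obvious that's what I'm searching for.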
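A quick way to convince yourself of the difference between the line anchors ("^" and "$") and the string anchors (\A and \z) is to try them on a two-line string. This is only an illustrative sketch with a made-up sample, not part of the solution:

```ruby
# Made-up two-line sample with no trailing newline.
text = "first=1\nsecond=2"

# ^ matches at the start of every line, \A only at the start of the string.
by_line_start   = text.scan(/^([^=]+)=(.+)/)
by_string_start = text.scan(/\A([^=]+)=(.+)/)

# $ matches at the end of any line, \z only at the very end of the string.
value_at_line_end   = text[/=(.+)$/, 1]
value_at_string_end = text[/=(.+)\z/, 1]

p by_line_start       # => [["first", "1"], ["second", "2"]]
p by_string_start     # => [["first", "1"]]
p value_at_line_end   # => "1"
p value_at_string_end # => "2"
```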
EDIT to track the OP's added test:
text = <<EOT
RAILS_ENV=production
listen_address = 127.0.0.1 # localhost only by default
PATH="/usr/local/bin"
HOSTNAME=`cat /etc/hostname`
EOT
text.scan( /^ ( [^=]+? ) = ( .+ ) $/x ).map{ |n,v| [ n.strip, v.sub(/#.+/, '').strip ] }
# => [["RAILS_ENV", "production"], ["listen_address", "127.0.0.1"], ["PATH", "\"/usr/local/bin\""], ["HOSTNAME", "`cat /etc/hostname`"]]
If I were writing this for myself I'd generate a hash for convenience:
Hash[ text.scan( /^ ( [^=]+? ) = ( .+ ) $/x ).map{ |n,v| [ n.strip, v.sub(/#.+/, '').strip ] } ]
# => {"RAILS_ENV"=>"production", "listen_address"=>"127.0.0.1", "PATH"=>"\"/usr/local/bin\"", "HOSTNAME"=>"`cat /etc/hostname`"}
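If you need this in more than one place, the same steps could be wrapped in a small method. This is just a sketch: the name parse_settings is my own invention, and it assumes a Ruby new enough to have Array#to_h (2.1 or later). It also keeps the naive comment stripping from above, so a value that legitimately contains a "#" would be cut short.

```ruby
# parse_settings is a hypothetical helper name, not something from the question.
def parse_settings(text)
  # The /x flag lets the pattern contain spaces for readability.
  pairs = text.scan(/^ ( [^=]+? ) = ( .+ ) $/x)
  # Trim whitespace and drop anything from the first '#' onward in the value.
  cleaned = pairs.map { |name, value| [name.strip, value.sub(/#.+/, '').strip] }
  # Turn the array of [name, value] pairs into a Hash.
  cleaned.to_h
end

# Single-quoted heredoc so nothing in the sample is interpolated.
sample = <<'EOT'
RAILS_ENV=production
listen_address = 127.0.0.1 # localhost only by default
PATH="/usr/local/bin"
HOSTNAME=`cat /etc/hostname`
EOT

settings = parse_settings(sample)
puts settings["listen_address"] # prints 127.0.0.1
```

Looking values up by name this way is usually handier than walking the array of pairs.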