10

I want to split a command-line-like string into individual string parameters. What would the regular expression for this look like? The problem is that the parameters can be quoted. For example:

"param 1" param2 "param 3"

should result in:

param 1, param2, param 3

Horcrux7

15 Answers

20

You should not use regular expressions for this. Write a parser instead, or use one provided by your language.

I don't see why I get downvoted for this. This is how it could be done in Python:

>>> import shlex
>>> shlex.split('"param 1" param2 "param 3"')
['param 1', 'param2', 'param 3']
>>> shlex.split('"param 1" param2 "param 3')
Traceback (most recent call last):
    [...]
ValueError: No closing quotation
>>> shlex.split('"param 1" param2 "param 3\\""')
['param 1', 'param2', 'param 3"']

Now tell me that racking your brain over how a regex could solve this problem is ever worth the hassle.

  • 1
    I agree. This would be a better solution, especially if you need to put quotes inside the string: "param""1" param2... – rslite Oct 13 '08 at 11:46
  • +1 - like parsing XML, this is not a good problem for regexes. – slim Oct 13 '08 at 11:48
  • 8
    Absolute nonsense. This is a simple problem for regexes, and it has nothing in common with parsing XML. –  Oct 13 '08 at 12:19
  • `shlex` must be the answer; I agree with hop! So true! – shahjapan Aug 31 '11 at 08:40
  • 1
    In my case it's worth it, because I need this in a psql script, and the alternative would be a hassle with plpgsql. – philipp May 02 '16 at 10:04
7

I tend to use regexlib for this kind of problem. If you go to: http://regexlib.com/ and search for "command line" you'll find three results which look like they are trying to solve this or similar problems - should be a good start.

This may work: http://regexlib.com/Search.aspx?k=command+line&c=-1&m=-1&ps=20

Sam Meldrum
7
("[^"]+"|[^\s"]+)

This is what I use in C++:

#include <iostream>
#include <iterator>
#include <string>
#include <regex>

void foo()
{
    std::string strArg = " \"par   1\"  par2 par3 \"par 4\""; 

    std::regex word_regex( "(\"[^\"]+\"|[^\\s\"]+)" );
    auto words_begin = 
        std::sregex_iterator(strArg.begin(), strArg.end(), word_regex);
    auto words_end = std::sregex_iterator();
    for (std::sregex_iterator i = words_begin; i != words_end; ++i)
    {
        std::smatch match = *i;
        std::string match_str = match.str();
        std::cout << match_str << '\n';
    }
}

Output:

"par   1"
par2
par3
"par 4"
6

Without regard to implementation language, your regex might look something like this:

("[^"]*"|[^"]+)(\s+|$)

The first part "[^"]*" looks for a quoted string that doesn't contain embedded quotes, and the second part [^"]+ looks for a sequence of non-quote characters. The \s+ matches a separating sequence of spaces, and $ matches the end of the string.
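As a rough illustration, here is how the pattern behaves when applied repeatedly with Python's `re` module. Python and the helper name `tokens` are assumptions made for this sketch; the answer itself is language-agnostic. Group 1 holds each parameter, and quoted parameters keep their quotes:

```python
import re

# The pattern from the answer above. Group 1 is the parameter; group 2 is
# the separating whitespace (or the empty match at the end of the string).
PATTERN = re.compile(r'("[^"]*"|[^"]+)(\s+|$)')


def tokens(line):
    """Return group 1 of every successive match (hypothetical helper)."""
    return [m.group(1) for m in PATTERN.finditer(line)]


print(tokens('"param 1" param2 "param 3"'))
# ['"param 1"', 'param2', '"param 3"']
```

Because `[^"]+` also accepts spaces, consecutive unquoted parameters such as `a b c` come back as a single token, so some post-processing (or a stricter second alternative) would be needed for general input.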

Greg Hewgill
  • 1
    Failure case for your regex: <" " param2 "" bozo "ninny" "param 3"> Notice 1) quotes left in the answer 2) includes trailing whitespace after bozo. Probably has other bugs too. –  Oct 13 '08 at 11:52
  • Sorry these comments trim out whitespace - there should be lots of spaces after "bozo". –  Oct 13 '08 at 11:54
4

Regex: /[\/-]?((\w+)(?:[=:]("[^"]+"|[^\s"]+))?)(?:\s+|$)/g

Sample: /P1="Long value" /P2=3 /P3=short PwithoutSwitch1=any PwithoutSwitch2

This regex parses a parameter list built according to the following rules:

  • Parameters are separated by one or more spaces.
  • A parameter can start with a switch symbol (/ or -).
  • A parameter consists of a name and a value, separated by = or :.
  • The name can be any sequence of alphanumerics and underscores.
  • The value can be absent.
  • If the value exists, it can be any sequence of characters, but if it contains a space, the value must be quoted.

This regex has three groups:

  • the first group contains the whole parameter without the switch symbol,
  • the second group contains the name only,
  • the third group contains the value only (if it exists).

For the sample above:

  1. Whole match: /P1="Long value"
    • Group#1: P1="Long value",
    • Group#2: P1,
    • Group#3: "Long value".
  2. Whole match: /P2=3
    • Group#1: P2=3,
    • Group#2: P2,
    • Group#3: 3.
  3. Whole match: /P3=short
    • Group#1: P3=short,
    • Group#2: P3,
    • Group#3: short.
  4. Whole match: PwithoutSwitch1=any
    • Group#1: PwithoutSwitch1=any,
    • Group#2: PwithoutSwitch1,
    • Group#3: any.
  5. Whole match: PwithoutSwitch2
    • Group#1: PwithoutSwitch2,
    • Group#2: PwithoutSwitch2,
    • Group#3: absent.
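The group layout listed above can be checked with a short Python sketch. Python's `re` module and the helper name `parse_params` are assumptions for illustration; the regex itself is copied unchanged from the answer:

```python
import re

# The regex from the answer, written as a Python raw string.
PARAM = re.compile(r'[\/-]?((\w+)(?:[=:]("[^"]+"|[^\s"]+))?)(?:\s+|$)')


def parse_params(line):
    """Return (name, value) pairs; value is None when absent (hypothetical helper)."""
    return [(m.group(2), m.group(3)) for m in PARAM.finditer(line)]


sample = '/P1="Long value" /P2=3 /P3=short PwithoutSwitch1=any PwithoutSwitch2'
for name, value in parse_params(sample):
    print(name, value)
```

A quoted value keeps its surrounding quotes in group 3, so a caller that wants the bare value would still need to strip them.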
23W
2

Most languages have other functions (either built-in or provided by a standard library) which will parse command lines far more easily than building your own regex, plus you know they'll do it accurately out of the box. If you edit your post to identify the language that you're using, I'm sure someone here will be able to point you at the one used in that language.

Regexes are very powerful tools and useful for a wide range of things, but there are also many problems for which they are not the best solution. This is one of them.

Dave Sherohman
2

This will split an exe from its params, stripping the double quotes from the exe; it assumes clean data:

^(?:"([^"]+(?="))|([^\s]+))["]{0,1} +(.+)$

Each match has three groups, of which either group 1 or group 2 is populated (never both):

  1. The exe, if it was wrapped in double quotes
  2. The exe, if it was not wrapped in double quotes
  3. The clump of parameters

Examples:

"C:\WINDOWS\system32\cmd.exe" /c echo this

Match 1: C:\WINDOWS\system32\cmd.exe

Match 2: $null

Match 3: /c echo this

C:\WINDOWS\system32\cmd.exe /c echo this

Match 1: $null

Match 2: C:\WINDOWS\system32\cmd.exe

Match 3: /c echo this

"C:\Program Files\foo\bar.exe" /run

Match 1: C:\Program Files\foo\bar.exe

Match 2: $null

Match 3: /run

Thoughts:

I'm pretty sure that you would need a loop to capture an arbitrary number of parameters.

This regex could easily be looped on its third group until the match fails, at which point there are no more params to split off.
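The three examples above can be reproduced with a small Python sketch. Python's `re` module and the helper name `split_exe` are assumptions made here for illustration; only the first split (exe versus the rest) is shown, not the loop:

```python
import re

# The regex from the answer, written as a Python raw string.
EXE_AND_PARAMS = re.compile(r'^(?:"([^"]+(?="))|([^\s]+))["]{0,1} +(.+)$')


def split_exe(line):
    """Return (exe, params), or None if the line does not match (hypothetical helper)."""
    m = EXE_AND_PARAMS.match(line)
    if m is None:
        return None
    # Group 1 is set for a quoted exe, group 2 for an unquoted one.
    return (m.group(1) or m.group(2), m.group(3))


exe, params = split_exe(r'"C:\Program Files\foo\bar.exe" /run')
print(exe)     # C:\Program Files\foo\bar.exe
print(params)  # /run
```

Note that a line consisting of the exe alone, with no parameters after it, does not match at all, so the helper returns `None` in that case.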

VertigoRay
1

There's a Python answer, so we should have a Ruby answer as well :)

require 'shellwords'
Shellwords.shellsplit '"param 1" param2 "param 3"'
#=> ["param 1", "param2", "param 3"]
# or, equivalently:
'"param 1" param2 "param 3"'.shellsplit
kares
1

If it's just the quotes you are worried about, then write a simple loop that copies the input character by character into a string, ignoring the quotes.

Alternatively if you are using some string manipulation library, you can use it to remove all quotes and then concatenate them.
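A minimal sketch of such a loop in Python follows. It assumes, beyond what the answer states, that whitespace outside quotes separates parameters and that an unterminated quote is an error; the function name `split_quoted` is hypothetical:

```python
def split_quoted(line):
    """Split on whitespace, treating double-quoted runs as part of one parameter."""
    args = []
    current = []        # characters of the parameter being built
    in_quotes = False   # True while between an opening and a closing quote
    has_token = False   # True once the current parameter has started

    for ch in line:
        if ch == '"':
            # Quotes toggle the state and are not copied to the output.
            in_quotes = not in_quotes
            has_token = True  # so that "" yields an empty parameter
        elif ch.isspace() and not in_quotes:
            # Unquoted whitespace ends the current parameter, if any.
            if has_token:
                args.append(''.join(current))
                current = []
                has_token = False
        else:
            current.append(ch)
            has_token = True

    if in_quotes:
        raise ValueError("No closing quotation")
    if has_token:
        args.append(''.join(current))
    return args


print(split_quoted('"param 1" param2 "param 3"'))
# ['param 1', 'param2', 'param 3']
```

This sketch has no escape character, so a literal double quote cannot appear inside a parameter.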

Sridhar Iyer
1

This answer is not regex-specific, but it shows one way to parse command-line arguments in Python:


import sys

def parse_cmd_args():
  _sys_args = list(sys.argv)  # copy, so that sys.argv itself is not modified
  _parts = {}
  _key = "script"
  _parts[_key] = [_sys_args.pop(0)]

  for _part in _sys_args:
    # Parse numeric values (integers and floats), allowing a leading minus sign
    _digits = _part[1:] if _part.startswith("-") else _part
    if _digits.replace(".", "", 1).isdecimal():
      _part = float(_part) if "." in _part else int(_part)
      _parts[_key].append(_part)
    elif "=" in _part:
      # key=value1,value2 (split on the first "=" only)
      _name, _value = _part.split("=", 1)
      _parts[_name.strip("-")] = _value.strip().split(",")
    elif _part.startswith("-"):
      _key = _part.strip("-")
      _parts[_key] = []
    else:
      _parts[_key].extend(_part.split(","))

  return _parts

miRastic
0

Something like:

"(?:(?<=")([^"]+)"\s*)|\s*([^"\s]+)

or a simpler one:

"([^"]+)"|\s*([^"\s]+)

(just for the sake of finding a regexp ;) )

Apply it repeatedly; each match gives you one parameter, in group 1 if it was surrounded by double quotes and in group 2 otherwise.
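As a sketch of applying the simpler pattern repeatedly, here is a Python version. Python's `re` module and the helper name `split_params` are assumptions for illustration. A quoted parameter lands in group 1 and an unquoted one in group 2, so the helper takes whichever is set:

```python
import re

# The simpler of the two patterns above.
SIMPLE = re.compile(r'"([^"]+)"|\s*([^"\s]+)')


def split_params(line):
    """Collect one parameter per match (hypothetical helper)."""
    return [m.group(1) or m.group(2) for m in SIMPLE.finditer(line)]


print(split_params('"param 1" param2 "param 3"'))
# ['param 1', 'param2', 'param 3']
```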

VonC
0

Here's a solution in Perl:

#!/usr/bin/perl
use feature 'say';  # say() is not enabled by default

sub parse_arguments {
 my $text = shift;
 my $i = 0;
 my @args;
 while ($text ne '') {
  $text =~ s{^\s*(['"]?)}{};  # look for (and remove) leading quote
  my $delimiter = ($1 || ' ');  # use space if not quoted
  if ($text =~ s{^(([^$delimiter\\]|\\.|\\$)+)($delimiter|$)}{}) {
   $args[$i++] = $1;  # acquired an argument; save it
  }
 }
 return @args;
}

my $line = <<'EOS';
"param 1" param\ 2 "pa\"ram' '3" 'pa\'ram" "4'
EOS

say "ARG: $_" for parse_arguments($line);

Output:

ARG: param 1
ARG: param\ 2
ARG: pa\"ram' '3
ARG: pa\'ram" "4

Note the following:

  • Arguments can be quoted with either " or ' (with the "other" quote type treated as a regular character for that argument).
  • Spaces and quotes in arguments can be escaped with \; the backslash itself is kept in the extracted argument.

The solution can be adapted to other languages. The basic approach is to (1) determine the delimiter character for the next string, (2) extract the next argument up to an unescaped occurrence of that delimiter or to the end-of-string, then (3) repeat until empty.
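As one possible adaptation, the three steps can be sketched in Python without a regex. This is an illustration under stated assumptions rather than a port of the Perl code: the function name `parse_arguments` is reused for convenience, any whitespace (not just a space) ends an unquoted argument, and the escaping backslash is dropped from the result.

```python
def parse_arguments(text):
    """Split text into arguments, honoring ' and " quoting and backslash escapes."""
    args = []
    text = text.lstrip()
    while text:
        # (1) Determine the delimiter for the next argument.
        if text[0] in ('"', "'"):
            delimiter = text[0]
            text = text[1:]
        else:
            delimiter = None  # unquoted: any whitespace ends the argument

        # (2) Consume up to an unescaped delimiter or the end of the string.
        chars = []
        i = 0
        while i < len(text):
            ch = text[i]
            if ch == '\\' and i + 1 < len(text):
                chars.append(text[i + 1])  # keep the escaped character only
                i += 2
                continue
            if ch == delimiter or (delimiter is None and ch.isspace()):
                i += 1  # skip the delimiter itself
                break
            chars.append(ch)
            i += 1
        args.append(''.join(chars))

        # (3) Repeat on whatever is left.
        text = text[i:].lstrip()
    return args


print(parse_arguments(r"""a\ b 'c d' "e 'f' g" """))
# ['a b', 'c d', "e 'f' g"]
```

A missing closing quote is tolerated here: the argument simply runs to the end of the string, which matches step (2) as described.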

chrispitude
0

If you are looking to separate the command from its parameters, I use the following (with ^ and $ matching at line breaks, i.e. multiline mode):

(?<cmd>^"[^"]*"|\S*) *(?<prm>.*)?

In case you want to use it in your C# code, here it is properly escaped:

try {
    Regex RegexObj = new Regex("(?<cmd>^\\\"[^\\\"]*\\\"|\\S*) *(?<prm>.*)?");

} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

It will parse the following and know what is the command versus the parameters:

"c:\program files\myapp\app.exe" p1 p2 "p3 with space"
app.exe p1 p2 "p3 with space"
app.exe
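For comparison, a hedged Python sketch of the same pattern follows. Python writes named groups as `(?P<name>...)` rather than `(?<name>...)`, and the helper name `split_command` is an assumption made here for illustration:

```python
import re

# Same pattern with Python's named-group syntax; re.MULTILINE lets ^ match
# at the start of every line, as the answer requires.
CMD = re.compile(r'(?P<cmd>^"[^"]*"|\S*) *(?P<prm>.*)?', re.MULTILINE)


def split_command(line):
    """Return (command, parameters) for one line (hypothetical helper)."""
    m = CMD.match(line)  # every part is optional, so a match always exists
    return m.group("cmd"), (m.group("prm") or "")


for line in (
    r'"c:\program files\myapp\app.exe" p1 p2 "p3 with space"',
    'app.exe p1 p2 "p3 with space"',
    'app.exe',
):
    print(split_command(line))
```

A quoted command keeps its surrounding quotes in the `cmd` group, and the parameters come back as one unsplit string.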
ymerej
-1
\s*("[^"]+"|[^\s"]+)

that's it

boqapt
-3

(Reading your question again just before posting, I notice you say a command-line-LIKE string, so this information may not be useful to you; but as I have written it, I will post it anyway. Please disregard it if I have misunderstood your question.)

If you clarify your question I will try to help, but from the general comments you have made I would say: don't do that :-). You are asking for a regexp to split a series of parameters into an array. Instead of doing this yourself, I would strongly suggest you consider using getopt; there are versions of this library for most programming languages. Getopt will do what you are asking, and it scales to much more sophisticated argument processing should you require that in the future.

If you let me know what language you are using I will try and post a sample for you.

Here is a sample of the home pages:

http://www.codeplex.com/getopt (.NET)

http://www.urbanophile.com/arenn/hacking/download.html (java)

A sample (from the Java page above):

 Getopt g = new Getopt("testprog", argv, "ab:c::d");
 //
 int c;
 String arg;
 while ((c = g.getopt()) != -1)
   {
     switch(c)
       {
          case 'a':
          case 'd':
            System.out.print("You picked " + (char)c + "\n");
            break;
            //
          case 'b':
          case 'c':
            arg = g.getOptarg();
            System.out.print("You picked " + (char)c + 
                             " with an argument of " +
                             ((arg != null) ? arg : "null") + "\n");
            break;
            //
          case '?':
            break; // getopt() already printed an error
            //
          default:
            System.out.print("getopt() returned " + c + "\n");
       }
   }
Scott James