
I need to mimic the operation of a terminal. So for example, if I have

  • "a quoted string" anotherParam I want to get ["a quoted string", "anotherParam"]
  • test\ folder somethingElse should become ["test folder", "somethingElse"]

How can I do it? I have been trying regex but can't seem to get it right. The main problem is differentiating between plain spaces, which indicate the next parameter, and escaped spaces, which should stay within the same parameter.

By the way, I can't use third-party libraries, only the classes that ship with Java.

The current code uses

[^\s]+

So it considers anything that's not whitespace a token. Is it possible to include escaped spaces? Or perhaps there's a better way around this?

Jiew Meng
  • Why `test\ folder somethingElse` becomes `["test folder", somethingElse]`? – Maroun Jan 30 '14 at 09:24
  • Because the space is escaped. If you type `mkdir test\ folder` for example, it creates a folder named "test folder" – Jiew Meng Jan 30 '14 at 09:26
  • Try `/(?:"((\\"|[^"])*)"|((\S|\\ )+))/`. I also suggest to use a parser instead though. – Njol Jan 30 '14 at 09:30
  • @Njol, it seems I still get 2 matches for `test\ folder`? – Jiew Meng Jan 30 '14 at 09:32
  • I am thinking of preprocessing, by replacing `\ ` with something unlikely to occur in a real file path like `:#.!!:`, just something random then after I do the matching stuff, I replace instances of this with space. There appears to be no real invalid character in a linux filename? http://stackoverflow.com/questions/1311037/are-there-any-invalid-linux-filenames – Jiew Meng Jan 30 '14 at 09:34
  • @JiewMeng make sure to escape all backslashes if you use it as a Java string literal, e.g. use `\\\\ ` instead of `\\ `. – Njol Jan 30 '14 at 09:34
  • May I ask why you don't accept the answer you've got? –  Apr 04 '14 at 08:40

2 Answers

One approach I have adopted for now, unless I find something better, is to preprocess the input by replacing escaped spaces with something really unlikely to come up, like :$p@c3:. After all the regex matching and tokenizing, I replace that string with a space. It has worked well so far.
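
A minimal sketch of that placeholder idea, using only classes from the JDK. The class name `PlaceholderTokenizer`, the `tokenize` method and the token pattern are assumptions made for illustration, not the code actually used; the placeholder is the one mentioned above and is assumed never to occur in real input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderTokenizer {
    // Placeholder from the text above; assumed never to appear in real input.
    private static final String PLACEHOLDER = ":$p@c3:";

    // Either a double-quoted run (group 1 holds the content) or a run of non-whitespace.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|\\S+");

    static List<String> tokenize(String input) {
        // Step 1: hide escaped spaces so the pattern does not split on them.
        // String.replace treats both arguments literally (no regex involved).
        String hidden = input.replace("\\ ", PLACEHOLDER);

        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(hidden);
        while (m.find()) {
            // Step 2: prefer the quoted content, otherwise take the whole match.
            String token = m.group(1) != null ? m.group(1) : m.group();
            // Step 3: turn the placeholder back into a plain space.
            tokens.add(token.replace(PLACEHOLDER, " "));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("\"a quoted string\" anotherParam"));
        // prints [a quoted string, anotherParam]
        System.out.println(tokenize("test\\ folder somethingElse"));
        // prints [test folder, somethingElse]
    }
}
```

Note that this sketch also rewrites `\ ` inside a quoted string, and it breaks if the input ever contains the placeholder itself.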

Jiew Meng
  • Hey, would you mind accepting my answer or letting me know whether it suits your needs? Thanks ;) – ccjmne Aug 06 '14 at 03:28

I think that you could use this regex to find your parameters:

"(((?!").)+)"|(\S|(?<=\\)\s)+

Which gives, once transcribed as a Java String:

"\"(((?!\").)+)\"|(\\S|(?<=\\\\)\\s)+"

This is how it works:

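It matches either "(((?!").)+)" or (\S|(?<=\\)\s)+:

  1. "(((?!").)+)" matches a double-quoted string: one or more characters other than ", enclosed in a pair of ". Group 1 captures the content without the quotes.
  2. (\S|(?<=\\)\s)+ matches a run of one or more characters, each of which is either:
    2.1. a non-whitespace character (\S), or
    2.2. a whitespace character (\s) that is escaped (immediately preceded by \).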

Running this regex against:

"a quoted string" anotherParam a\ third\ param

matches three times: once for "a quoted string", once for anotherParam and once for a\ third\ param.


Sample tested code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        final Pattern p = Pattern.compile("\"(((?!\").)+)\"|((\\S|(?<=\\\\)\\s)+)");
        final String input = "\"a quoted string\" anotherParam a\\ third\\ param";

        final Matcher m = p.matcher(input);
        while(m.find()) {
            if(m.group(1) == null) {
                System.out.println(m.group().replace("\\ ", " "));
            } else {
                System.out.println(m.group(1)); // trimmed from the surrounding quotes
            }
        }
    }
}

output:

a quoted string
anotherParam
a third param
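
For comparison, a comment on the question suggests using a parser rather than a regex. The following is only a rough sketch of that idea, not part of the tested code above: the class name `ShellLikeTokenizer` and its rules are assumptions (a backslash only escapes a space outside quotes, quotes are removed, and an unterminated quote runs to the end of the input).

```java
import java.util.ArrayList;
import java.util.List;

final class ShellLikeTokenizer {
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false; // currently inside "..."
        boolean inToken = false;  // a token has been started (lets "" yield an empty token)

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\' && !inQuotes
                    && i + 1 < input.length() && input.charAt(i + 1) == ' ') {
                // Escaped space: keep the space, drop the backslash.
                current.append(' ');
                inToken = true;
                i++; // skip the space that was just consumed
            } else if (c == '"') {
                // Toggle quoted mode; the quote itself is not part of the token.
                inQuotes = !inQuotes;
                inToken = true;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                // Unescaped, unquoted whitespace ends the current token.
                if (inToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    inToken = false;
                }
            } else {
                current.append(c);
                inToken = true;
            }
        }
        if (inToken) {
            tokens.add(current.toString()); // flush the last token
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("\"a quoted string\" anotherParam a\\ third\\ param"));
        // prints [a quoted string, anotherParam, a third param]
    }
}
```

A loop like this is longer than the regex, but each rule sits in its own branch, which may make it easier to add further escapes or single quotes later.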
ccjmne