2

I have this code:

String path = main.getInput(); // let's say getInput() returns "Hello \Wo rld\"
String[] args = path.split("\\s+");

for (int i = 0; i < args.length; i++) {
     System.out.println(args[i]);
}

Is there a way to split the string into an array of words, but leave anything between two backslashes unsplit, so that "Wo rld" will be one word and not two?

H3XXX
  • My strategy would be to first split by backslashes, then split the pieces in even indices (0, 2, 4...) by space, and then collect the results in a single array. – Ghostkeeper Mar 23 '14 at 13:44
  • will there be multiple spaces in between the backslashes, like `\Wo r l d\` ? – donfuxx Mar 23 '14 at 13:46
  • @donfuxx Yes. Like you can see, there is a space between 'Wo' and 'rld'. – H3XXX Mar 23 '14 at 13:47
  • I meant: Can there be more than one space, like `W` `or` `ld` – donfuxx Mar 23 '14 at 13:51
  • Oh, sorry. Yes, there may be more than one space. – H3XXX Mar 23 '14 at 13:53
  • Possible duplicate of [Regex for splitting a string using space when not surrounded by single or double quotes](http://stackoverflow.com/questions/366202/regex-for-splitting-a-string-using-space-when-not-surrounded-by-single-or-double) – Denis Tulskiy Mar 23 '14 at 13:58
  • Does it have to be `split` or even regex? Writing your own parser wouldn't be so hard and it would iterate over your string only once. – Pshemo Mar 23 '14 at 14:07
  • Split isn't going to work for this... it's too limited –  Mar 23 '14 at 14:10
  • Where is the split if the text is this `Hello \Wo rld\ wide\` –  Mar 23 '14 at 14:28
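
The two-pass strategy Ghostkeeper describes in the comments above can be sketched roughly as follows. This is only an illustration, not code from the thread: the class name `TwoPassSplit` and the method `split` are made up for the example. It assumes backslashes come in pairs, and note that it drops the backslashes themselves from the output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, named for this example only.
class TwoPassSplit {
    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        // First pass: cut at every backslash. The limit -1 keeps trailing
        // empty pieces, so odd indices are always "between backslashes".
        String[] pieces = input.split("\\\\", -1);
        for (int i = 0; i < pieces.length; i++) {
            if (i % 2 == 1) {
                // Odd index: text between two backslashes, kept whole.
                tokens.add(pieces[i]);
            } else {
                // Even index: text outside backslashes, split on whitespace.
                for (String word : pieces[i].trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        tokens.add(word);
                    }
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The Java literal below is the text: Hello \Wo rld\
        System.out.println(TwoPassSplit.split("Hello \\Wo rld\\"));
    }
}
```

With an unbalanced input such as the one in the last comment, everything after the final unmatched backslash is treated as "inside" and kept as one token, which may or may not be what is wanted.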

3 Answers

4

You could try splitting only on spaces that are followed by an even number of backslashes. Raw regex:

\s+(?=(?:[^\\]*\\[^\\]*\\)*[^\\]*$)

Java escaped regex:

\\s+(?=(?:[^\\\\]*\\\\[^\\\\]*\\\\)*[^\\\\]*$)

ideone demo
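
For reference, a minimal sketch of how the escaped regex might be plugged into `String.split`. The class and method names (`BackslashSplitDemo`, `splitOutsideBackslashes`) are placeholders for this example, and the input is assumed to contain an even number of backslashes.

```java
import java.util.Arrays;

// Hypothetical wrapper around the regex above, named for this example only.
class BackslashSplitDemo {
    // Whitespace is a split point only if an even number of backslashes
    // remains between it and the end of the string.
    static final String SPLIT_REGEX =
            "\\s+(?=(?:[^\\\\]*\\\\[^\\\\]*\\\\)*[^\\\\]*$)";

    static String[] splitOutsideBackslashes(String input) {
        return input.split(SPLIT_REGEX);
    }

    public static void main(String[] args) {
        // The Java literal below is the text: Hello \Wo rld\ again
        String input = "Hello \\Wo rld\\ again";
        System.out.println(Arrays.toString(splitOutsideBackslashes(input)));
        // prints [Hello, \Wo rld\, again]
    }
}
```

Unlike splitting on the backslashes themselves, this keeps the backslashes in the resulting tokens.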

Jerry
  • Your regex breaks with this input `Hello \\Wo rld\\ our world \\test now` java escaped `\\ ` used. – Sabuj Hassan Mar 23 '14 at 14:11
  • That works, except the problem is that I am taking my string directly from a JTextField, which means my string is always going to have one slash so \\Wo rld\\ for me is actually \Wo rld\ – H3XXX Mar 23 '14 at 14:12
  • @AdamMiszczak I'm not sure I see the problem :s – Jerry Mar 23 '14 at 14:19
  • @Jerry Say I have a string variable, I can't have it "\Wo rld\", I need to escape the backslashes, so it becomes "\\Wo rld\\". But, because I am setting my string variable equal to textField.getText(), the string is actually "\Wo rld\", apologies if this sounds unclear. Because of this, the method above doesn't work for me, the output is still three different words. – H3XXX Mar 23 '14 at 14:23
  • @AdamMiszczak But what gets printed when you do `System.out.println(textField.getText())`? You see "\Wo rld\" right? If so, the regex should be working fine. – Jerry Mar 23 '14 at 14:29
  • @Jerry I do, but the string is still being split into three. I'll try some different things, but with my current set up I am still getting three words. – H3XXX Mar 23 '14 at 14:32
  • @AdamMiszczak Okay, I'm not sure what's going on :( It's the first time I hear about something like that... – Jerry Mar 23 '14 at 14:34
  • @Jerry I have tried some things, and it turns out that using certain strings breaks the code. For example, "Hello \Wo rld\" works, but something like "Hello\Wo rld\Hello" or a file path like "C:\Users\Adam Miszczak\Desktop" doesn't work. "Hello\Wo rld\Hello" isn't split at all and "C:\Users\Adam Miszczak\Desktop" is split into three words. I'll do some research to see how to fix it. Thanks for helping so far. – H3XXX Mar 23 '14 at 14:46
  • @AdamMiszczak So you're not first splitting on space? Should `Hello\Wo rld\Hello` become `Hello` `Wo rld` and `Hello` and the path become `C:` `Users` `Adam Miszczak` and `Desktop`? So you're splitting on backslash too? – Jerry Mar 23 '14 at 14:59
  • @Jerry No, I am. But the code above only works for a string of that specific style. If I want a slightly different variation of it, such as the examples I gave you above, then the code does not work. – H3XXX Mar 23 '14 at 15:25
  • (continued because I forgot something): So, "Hello \Wo rld\Hello" should be "Hello \World\Hello" and the file path should be " C:\Users\Adam Miszczak\Desktop" – H3XXX Mar 23 '14 at 15:31
  • @AdamMiszczak Okay, just to make things clear, if you have an input of `"Hello \Wo rld\Hello C:\Users\Adam Miszczak\Desktop"`, it should be broken down into 3 parts: 1. `Hello`, 2. `\Wo rld\Hello` and 3. `C:\Users\Adam Miszczak\Desktop`? – Jerry Mar 23 '14 at 15:43
  • @AdamMiszczak Okay, that makes it quite complicated then. The path can be anything right? If so, I think it might be impossible. Consider the string: `\Hello World C:\users` One might say that `Hello World C:` is not a directory because `C:` is a drive. But then, if we had: `\Hello\World \Hello\World` Are there two paths or one path? Directories can end in spaces too =/ Or will all your paths begin with `A:` (where A is any letter) – Jerry Mar 23 '14 at 15:58
  • It's like having `"Hello C:\Users\Adam Miszczak\Desktop \Wo rld\Hello"` actually. – Jerry Mar 23 '14 at 16:01
  • Ah well ok. I'll just try some different things and see what I can do. Thanks for helping though. I've accepted your answer. – H3XXX Mar 23 '14 at 16:02
  • @AdamMiszczak Thanks. I don't know your edge cases (the one I put in the above comment doesn't work with the code that follows), but if you use match instead of split, you could perhaps use something like [this](http://ideone.com/AFSJ3E)? I guess you'll get some of the job done until you can find something better. – Jerry Mar 23 '14 at 16:05
1

Try this one:

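import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "John Hello \\Wo rld\\ our world";
// Group 1: a backslash-delimited segment; group 2: a run of non-whitespace characters
Pattern pattern = Pattern.compile("(\\\\.*?\\\\)|(\\S+)");
Matcher m = pattern.matcher(s);
while (m.find()) {
    if (m.group(1) != null) {
        System.out.println(m.group(1));
    } else {
        System.out.println(m.group(2));
    }
}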

Output:

John
Hello
\Wo rld\
our
world
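
Since the question asks for an array, the same matching idea could be wrapped in a helper that collects the tokens instead of printing them. This is a sketch only: `MatchTokens` and `tokenize` are names invented for the example, and the pattern is the one above without the capturing groups. It assumes each backslash-delimited segment is preceded by whitespace; in an input like `Hello\Wo rld\Hello` the `\S+` alternative matches `Hello\Wo` first, so the result differs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for this example only.
class MatchTokens {
    // Either a backslash-delimited segment or a run of non-whitespace.
    private static final Pattern TOKEN = Pattern.compile("\\\\.*?\\\\|\\S+");

    static String[] tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // group() is the whole match, whichever alternative matched.
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String token : tokenize("John Hello \\Wo rld\\ our world")) {
            System.out.println(token);
        }
    }
}
```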
Sabuj Hassan
0

If it doesn't have to be regex then you can use this simple parser and get your result in one iteration.

import java.util.ArrayList;
import java.util.List;

public static List<String> spaceSplit(String str) {
    List<String> tokens = new ArrayList<>();

    StringBuilder sb = new StringBuilder();
    boolean insideEscaped = false; // true while we are between two backslashes

    for (char ch : str.toCharArray()) {

        // each backslash toggles the "escaped" state
        if (ch == '\\') {
            insideEscaped = !insideEscaped;
        }

        // split only on spaces that are not in the "escaped" area
        if (ch == ' ' && !insideEscaped) {
            if (sb.length() > 0) {
                tokens.add(sb.toString());
                sb.setLength(0); // reset the buffer for the next token
            }
        } else {
            // keep every other character, including the backslashes
            // and any spaces between them
            sb.append(ch);
        }
    }
    // flush the last token, if any
    if (sb.length() > 0) {
        tokens.add(sb.toString());
    }

    return tokens;
}

Usage:

for (String s : spaceSplit("hello \\wo rld\\"))
    System.out.println(s);

Output:

hello
\wo rld\
Pshemo