2

So suppose I have this line:

print "Hello world!" out.txt

And I want to split it into:

print
"Hello world!"
out.txt

What would be the regular expression to match these?

Note that there must be a space between each of them. For example, if I had this:

print"Hello world!"out.txt

I would get:

print"Hello
world!"out.txt

The language I'm using is Haxe.

puggsoy
  • 1
    What are "these"? The first two examples are contradictory. In the first one you replace each space with a line feed, except the one in the string, while in the second one you split wherever there is a space, whether or not it is inside a string. Am I missing something? – Francis Toth Sep 07 '15 at 03:05
  • 2
    Regexes usually work differently across languages. – melpomene Sep 07 '15 at 03:15
  • Where is the grammar of the tokens? Haxe seems to be a language that can be compiled into other languages. What is your target language? It's foolhardy to write a regex without knowing what else you want to support with your command line-like format. – nhahtdh Sep 07 '15 at 03:50
  • @Francis: The first block is the line I want to parse, the second block is the bits of the string that the regex should match, each on a different line. So the regex should grab `print`, `"Hello world!"` and `out.txt` from the first block. Sorry for the confusion. – puggsoy Sep 07 '15 at 04:20
  • @melpomene: Whoops, looks like you're right; neither of the two current answers compiles. I thought the syntax was usually the same between languages, since I've used non-Haxe examples before. – puggsoy Sep 07 '15 at 04:24
  • @nhahtdh: My target is Neko; to my knowledge Haxe compiles directly to bytecode for that. In any case this is purely supposed to separate lines into chunks (like arguments) that I will later on use separately. – puggsoy Sep 07 '15 at 04:26
  • @puggsoy: "Should grab something from some simple input" and a grammar are two different things. For instance, how can `"` be specified between quotes? Do you allow the last string to be an unclosed quoted string, e.g. `"not closed`? You might want to take a look at a similar (not the same) question for an idea of what I'm talking about: http://stackoverflow.com/questions/14727065/ – nhahtdh Sep 07 '15 at 04:39

3 Answers

2

You can use regular expressions in Haxe through the `EReg` API class:

Demo: http://try.haxe.org/#76Ea0

class Test {
    static function main() {
        var command = 'print "Hello world!" out.txt';
        var regexp:EReg = ~/\s(?![\w!.]+")/g;
        var result = regexp.replace(command, "\n");
        js.Browser.alert(result);
    }
}

About Haxe regular expressions:
http://haxe.org/manual/std-regex.html

About regular expressions replacement:
http://haxe.org/manual/std-regex-replace.html

EReg class API documentation:
http://api.haxe.org/EReg.html
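
The same pattern can also be passed to `EReg.split()` to get the pieces as an array instead of a newline-joined string. This is only a minimal sketch: it assumes a target whose regex engine supports lookahead, and `SplitDemo` and `splitCommand` are hypothetical names chosen for illustration.

```haxe
class SplitDemo {
    // Hypothetical helper: splits on any whitespace character that is NOT
    // followed by a run of [\w!.] characters and a double quote.
    public static function splitCommand(command:String):Array<String> {
        // The `g` flag makes split() act on every separator, not just the first.
        var separator:EReg = ~/\s(?![\w!.]+")/g;
        return separator.split(command);
    }

    static function main() {
        // Yields three pieces: print, "Hello world!" and out.txt
        trace(splitCommand('print "Hello world!" out.txt'));
    }
}
```

Because the lookahead only inspects the single word that follows each space, this approach keeps a quoted string together only when the last space inside the quotes is the only one.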

Mark Knol
  • Despite this not being exactly what I asked for (matching the words), I can use EReg.split() to split it into each word, which is pretty much what I want. Thanks! – puggsoy Sep 12 '15 at 03:47
  • Hmm, unfortunately this only works if the string within the quotes has a single space. For example if I replace `"Hello world!"` with `"Hello to you world!"` it gets split into `"Hello`, `to`, and `you world!"`. – puggsoy Sep 21 '15 at 02:55
2

Expanding on Mark Knol's answer, this should work as expected for all your test strings so far:

static function main() {
    var command = 'print "Hello to you world!" out.txt';

    var regexp:EReg = ~/("[^"]+"|[^\s]+)/g;

    var result = [];
    var pos = 0;

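    // Search from `pos`; each successful match is one token.
    while (regexp.matchSub(command, pos)) {
        result.push(regexp.matched(0));
        // Resume the search just past the end of this match.
        var match = regexp.matchedPos();
        pos = match.pos + match.len;
    }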

    trace(result);
}

Demo: http://try.haxe.org/#5c0B1

EDIT: As pointed out in the comments, if your use case is splitting the different parts of a command line, it would be better to have a parser handle it rather than a regex.
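
For a format this small, a hand-written scanner is one possible form such a parser could take. The sketch below is not taken from any library; `Tokenizer` and `tokenize` are hypothetical names, and it makes several assumptions the question leaves open: a quoted string only starts at the beginning of a token, it ends at the next `"` (no escape sequences), only spaces and tabs separate tokens, and an unclosed quote runs to the end of the line.

```haxe
class Tokenizer {
    // Hypothetical helper: splits a line into tokens separated by spaces or
    // tabs, keeping a double-quoted string (quotes included) as one token.
    public static function tokenize(line:String):Array<String> {
        var tokens:Array<String> = [];
        var current = new StringBuf();
        var hasToken = false; // true once the current token holds a character
        var inQuotes = false; // true while inside a quoted string

        for (i in 0...line.length) {
            var ch = line.charAt(i);
            if (inQuotes) {
                current.add(ch);
                if (ch == '"') {
                    // Closing quote: the quoted token is complete.
                    tokens.push(current.toString());
                    current = new StringBuf();
                    hasToken = false;
                    inQuotes = false;
                }
            } else if (ch == " " || ch == "\t") {
                // Whitespace outside quotes ends the current token, if any.
                if (hasToken) {
                    tokens.push(current.toString());
                    current = new StringBuf();
                    hasToken = false;
                }
            } else {
                // Assumption: a quote opens a string only at the start of a token.
                if (ch == '"' && !hasToken) inQuotes = true;
                current.add(ch);
                hasToken = true;
            }
        }
        // Flush the last token (assumption: an unclosed quote runs to end of line).
        if (hasToken) tokens.push(current.toString());
        return tokens;
    }

    static function main() {
        trace(tokenize('print "Hello to you world!" out.txt'));
    }
}
```

With these rules, a quote in the middle of a word is treated as an ordinary character, so `print"Hello world!"out.txt` is split at its single space, as the question describes. Escaped quotes or other syntax would need extra branches in the loop.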

These libs might help:

azrafe7
  • This seems to work best, thanks! However you're probably right, I've been thinking manually parsing it is probably better. Those libraries don't seem to do exactly what I want; I'm not parsing command-line arguments exactly, rather a custom scripting syntax. I can implement my own one for that though, I just initially thought that a regex might be more efficient. All the same, thank you! – puggsoy Sep 27 '15 at 06:22
0

regex demo

\s(?![\w!.]+"\s)

This example works for these two cases; maybe someone has a better solution.

Kerwin