
I have strings like this:

ab
rx'
wq''
pok'''
oyu,
mi,,,,

Basically, I want to split the string into two parts. The first part should have the alphabetical characters intact, the second part should have the non-alphabetical characters. The alphabetical part is guaranteed to be 2-3 lowercase characters between a and z; the non-alphabetical part can be any length, and is guaranteed to contain only the characters , or ', but never both in the same string (e.g. eex,', will never occur).

So the result should be:

[ab][]
[rx][']
[wq]['']
[pok][''']
[oyu][,]
[mi][,,,,]

How can I do this? I'm guessing a regular expression but I'm not particularly adept at coming up with them.

XåpplI'-I0llwlg'I -
  • You could try to find the indexOf the first character that is a , or a ' and then split the string in two parts having that index. – Nuxy Aug 10 '12 at 06:38

6 Answers


If you can 100% guarantee that:

  1. Letter-strings are 2 or 3 characters
  2. There are always one or more primes/commas
  3. There is never any empty space before, after or in-between the letters and the marks
    (aside from line-break)

You can use:

/^([a-zA-Z]{2,3})('+|,+)$/gm

var arr = /^([a-zA-Z]{2,3})('+|,+)$/gm.exec("pok'''");
// arr → ["pok'''", "pok", "'''"]  (two arrays are never ===, so compare the items)

var arr = /^([a-zA-Z]{2,3})('+|,+)$/gm.exec("baf,,,");
// arr → ["baf,,,", "baf", ",,,"]

Of course, save yourself some sanity, and save that RegEx as a var.

And as a warning, if you haven't dealt with RegEx like this: if a match isn't found (say you try to match foo','' with mixed marks, or you have 0-1 or 4+ letters, or 0 marks), then instead of getting an array back, you'll get null.

So you can do this:

var reg = /^([a-zA-Z]{2,3})('+|,+)$/gm,
    string = "foobar'',,''",

    result_array = reg.exec(string) || [string];

In this case, the result of the exec is null; by putting the || (or) there, we can return an array that has the original string in it, as index-0.

Why?

Because the result of a successful exec will have 3 slots: [*string*, *letters*, *marks*]. You might be tempted to just read the letters as result_array[1]. But if the match failed and result_array === null, then JavaScript will scream at you (a TypeError) for trying null[1].

So returning the array at the end of a failed exec will allow you to get result_array[1] === undefined (i.e. there was no match to the pattern, so there are no letters in index-1), rather than a JS error.
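One more thing worth knowing if you store the regex in a variable as suggested above: with the `g` flag, `exec` starts searching at the regex's `lastIndex`, and every successful match moves `lastIndex` forward. A minimal sketch of that pitfall, assuming Node or a browser console; `withG` and `noG` are just illustrative names:

```javascript
// The pattern from this answer, stored once with the g and m flags.
var withG = /^([a-zA-Z]{2,3})('+|,+)$/gm;

var first = withG.exec("pok'''");  // matches; withG.lastIndex is now 6
var second = withG.exec("rx'");    // search starts at index 6, past the end of
                                   // "rx'", so this returns null and resets
                                   // lastIndex to 0
var third = withG.exec("rx'");     // matches again, because lastIndex was reset

// Without g (and m, since each line is matched on its own), exec ignores
// lastIndex and always starts at index 0, so the stored regex is safe to reuse.
var noG = /^([a-zA-Z]{2,3})('+|,+)$/;
var a = noG.exec("pok'''");
var b = noG.exec("rx'");

console.log(first[1], second, third[1]); // pok null rx
console.log(a[1], b[1], b[2]);           // pok rx '
```

If you do want to keep the `g`, the alternative is to set `withG.lastIndex = 0` before each `exec` call.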

Norguard
  • Primes/commas can be zero or more. – XåpplI'-I0llwlg'I - Aug 10 '12 at 07:04
  • Okay, so the answer to that is to change the `('+|,+)` to `('*|,*)`. It will then look for 0 or more marks, instead of one or more. – Norguard Aug 10 '12 at 07:07
  • The `g` (global) flag means find every match in the string instead of stopping at the first one; it's actually not needed for this one. It's useful if you're looking for, say, `"oo"` in a string where it could appear in multiple places, like finding `/ow/g` in "How now, brown cow." (4 matches). The `m` (multiline) flag makes `^` and `$` match at line breaks as well as at the start and end of the whole string. So if you process one single line at a time (after splitting the text at line breaks, `"\n"`), then `m` does nothing. If you read the whole text in as 1 string, or left line breaks in the string somehow, then without the `m` this regex doesn't work. – Norguard Aug 10 '12 at 07:14
  • Thanks for the explanation. I've chosen another answer as the accepted solution because it is simpler, but yours is definitely the most comprehensive and safest. – XåpplI'-I0llwlg'I - Aug 10 '12 at 07:53
  • That is cool by me - just remember that the onus is on you, one way or another to make sure that your data is either validated on the way in, or on the way out. ie: if you are going to use word-boundaries, keep in mind that `_` is considered a letter, as far as `\w` is concerned -- so check for that stuff if your data is not 100% perfect. Also validate the length of the letter-string, after the boundary-split. In the end, you do similar amounts of work -- it is a question of where you do the work and how much you can trust what is in your data (*hint* -- public site: none of it) – Norguard Aug 10 '12 at 08:14
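Putting Norguard's `('*|,*)` suggestion together with a null check, a minimal sketch that runs over the sample strings from the question. `splitLettersMarks` is a hypothetical helper name, and the `g` flag is dropped on the assumption that each string is matched separately:

```javascript
// Norguard's pattern with ('*|,*) so that zero marks are also accepted.
// No g flag: each string is matched on its own, so lastIndex never matters.
var LETTERS_THEN_MARKS = /^([a-zA-Z]{2,3})('*|,*)$/;

// Hypothetical helper: returns [letters, marks], or null when the string
// doesn't fit (too few/many letters, mixed ' and , marks, anything else).
function splitLettersMarks(s) {
  var m = LETTERS_THEN_MARKS.exec(s);
  return m ? [m[1], m[2]] : null;
}

var samples = ["ab", "rx'", "wq''", "pok'''", "oyu,", "mi,,,,", "eex,',"];
samples.forEach(function (s) {
  var parts = splitLettersMarks(s);
  // Print in the question's [letters][marks] format.
  console.log(parts ? "[" + parts[0] + "][" + parts[1] + "]" : s + " -> no match");
});
```

The last sample mixes both marks, so it is rejected rather than split.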

Regular expressions have a nice special token called a "word boundary" (\b). You can use it, well, to detect the boundary of a word, which is a sequence of word characters (letters, digits and underscore).

So all you have to do is

foo.split(/\b/)

For example,

"pok'''".split(/\b/) // ["pok", "'''"]
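One caveat: when a string has no marks at all (like `ab` from the question), there is no boundary between letters and marks, so `split(/\b/)` returns a one-element array. A small sketch that pads that case, assuming you always want two parts; `splitOnBoundary` is a made-up name:

```javascript
// Split at the word boundary between the letters and the marks.
// Boundaries at the very start or end of the string produce no extra pieces,
// so a string without marks comes back as a single element; pad it with "".
function splitOnBoundary(s) {
  var parts = s.split(/\b/);
  return parts.length === 1 ? [parts[0], ""] : parts;
}

console.log(JSON.stringify(splitOnBoundary("pok'''"))); // ["pok","'''"]
console.log(JSON.stringify(splitOnBoundary("ab")));     // ["ab",""]
```

Note that this does no validation: as pointed out elsewhere in this thread, `\b` treats `_` and digits as word characters too.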
user123444555621
  • Cool, didn't know about word boundaries. And just for anyone visiting this page, here is a good explanation of them: http://stackoverflow.com/a/4541595/963396 – XåpplI'-I0llwlg'I - Aug 10 '12 at 07:42

You could try something like this:

function splitString(string){
   // indexOf returns -1 (not 0) when the character is missing
   var match1 = string.indexOf(',');
   var match2 = string.indexOf("'");
   var stringArray;
   if(match1 !== -1){
      // letters before the first comma, every comma from there on
      stringArray = [string.slice(0, match1), string.slice(match1)];
   }
   else if(match2 !== -1){
      stringArray = [string.slice(0, match2), string.slice(match2)];
   }
   else{
      // no marks at all, e.g. "ab"
      stringArray = [string, ''];
   }
   return stringArray;
}

Nuxy
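var str = "mi,,,,";
// search returns -1 (which is truthy) when there is no non-word character, e.g. "ab"
var idx = str.search(/\W/);
var list;
if(idx === -1) {
    list = [str, ""];
} else {
    list = [str.slice(0, idx), str.slice(idx)];
}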

You'll have the parts in list[0] and list[1].

P.S. There might be some better ways than this.

nbaztec

yourStr.match(/(\w{2,3})([,']*)/)

Nicholas Albion
  • Your regEx is going to allow for strings that contain `','` or `'',` or similar. – Norguard Aug 10 '12 at 06:52
  • Right. But the goal, ultimately, should be to reject false positives rather than accept potentially broken data. Technically, your solution works fine, so long as every line of data entered is 100% perfect. However, `a_b'` will pass in your regex, for example. Will it ever happen? Hopefully not. But if it were a mission-critical system (or involved anything that any user touched), I'd prefer the defensive white-list, rather than the inclusive black-list. – Norguard Aug 10 '12 at 07:41
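To make the difference concrete, a short sketch comparing this unanchored `\w` pattern with an anchored whitelist (the `('*|,*)` variant of Norguard's pattern, restricted to lowercase letters as the question guarantees) on two illustrative inputs that should be rejected:

```javascript
var loose = /(\w{2,3})([,']*)/;        // this answer's pattern
var strict = /^([a-z]{2,3})('*|,*)$/;  // anchored whitelist

["ab','", "a_b'"].forEach(function (s) {
  // loose accepts both (mixed marks and "_" slip through);
  // strict rejects both.
  console.log(s, loose.test(s), strict.test(s));
});
```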
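// declare match instead of creating an implicit global; * (not +?) so "ab" with no marks also matches
var match = string.match(/^([a-z]{2,3})(,*$|'*$)/);
if (match) {
    match = match.slice(1);   // drop the full match, keep [letters, marks]
}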
phaistonian