2

I have a paragraph that's broken up into an array, split at the periods. I'd like to run a regex on index[i], replacing its contents with one instance of each letter that index[i]'s string value contains.

So index[i]: "This is a sentence" would return --> index[i]: "thisaenc"

I read this thread, but I'm not sure if that's what I'm looking for.

C_K
  • What regex flavor are you using? JavaScript doesn't support lookbehinds. – agent-j Jun 14 '11 at 15:21
  • @agent-j - jQuery, added the tag. – C_K Jun 14 '11 at 15:24
  • @Jason, what's this got to do with jQuery? – nickf Jun 14 '11 at 15:26
  • strictly speaking, a substitution is more than just a regexp. But you should specify your language, the quirks and varieties of regexp engines are legion. Though you'll need one with variable-length negative lookback, I'm almost sure, or find a workaround. – LHMathies Jun 14 '11 at 15:29
  • Does the order of the output characters matter? You can do my solution below, except with a slightly different negative lookahead (which jscript supports), but characters will be added as if you were reading right-to-left. – agent-j Jun 14 '11 at 15:29
  • Do you just want all the unique letters that occur in the sentence? – jcolebrand Jun 14 '11 at 15:32
  • An empty while loop works in Perl: `while ($str =~ s/(\w)(.*)(\1)(.*)/\1\2\4/i) {}` Someone just has to strip the spaces and lowercase the string in JavaScript, and rewrite that pattern in JavaScript, and it'll work. I have no idea how to do that :p Oh, non-greedy matching isn't necessary, I'll edit that out. It just made more sense in my head to be non-greedy. – NorthGuard Jun 14 '11 at 15:41
  • So apparently JavaScript returns the new string instead of a boolean, um... you can still do it if you store the string and make the while condition "the newly replaced string != the old stored string", then keep replacing. That's probably more involved than you want though. – NorthGuard Jun 14 '11 at 15:51
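
For what it's worth, the loop described in the last two comments could be sketched in JavaScript roughly as follows. This is only an illustration: the function name `dedupeLetters` is made up, and it assumes that everything other than the letters a-z should be dropped before deduplicating.

```javascript
// Hypothetical helper (name not from the thread): repeat a single
// replacement until the string stops changing.
function dedupeLetters(sentence) {
  // Assumption: lowercase the input and keep only the letters a-z.
  var current = sentence.toLowerCase().replace(/[^a-z]/g, '');
  var previous;
  do {
    previous = current;
    // Without the g flag, each pass removes one later repeat of a letter
    // and keeps the earlier occurrence in place.
    current = current.replace(/(\w)(.*?)\1/, '$1$2');
  } while (current !== previous);
  return current;
}

console.log(dedupeLetters('This is a sentence')); // "thisaenc"
```

Because only later repeats are ever removed, the first occurrence of each letter stays where it was, so the original order is kept.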

6 Answers

2

Not sure how to do this in regex, but here's a very simple function to do it without using regex:

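function charsInString(input) {
    var output='';
    for(var pos=0; pos<input.length; pos++) {
        // declare char locally so it doesn't leak as an implicit global
        var char=input.charAt(pos).toLowerCase();
        // append only characters not seen yet, skipping spaces
        if(output.indexOf(char) == -1 && char != ' ') {output+=char;}
    }
    return output;
}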

alert(charsInString('This is a sentence'));
Spudley
  • If you change the two lines like this: `char=input.charAt(pos).toLowerCase(); if(output.indexOf(char) == -1 && char != ' ') {output+=char;}` it would give the exact desired string. – morja Jun 14 '11 at 16:23
  • @morja -- yes, thanks; I'll edit the answer. (I posted it without testing it or checking it because just as I was finishing I saw @Jason had posted his own answer!) – Spudley Jun 14 '11 at 16:25
1

As I'm pretty sure what you need cannot be achieved using a single regular expression, I offer a more general solution:



// collapseSentences(ary) will collapse each sentence in ary 
// into a string containing its constituent chars
// @param  {Array}  the array of strings to collapse
// @return {Array}  the collapsed sentences
function collapseSentences(ary){
  var result=[];
  ary.forEach(function(line){
    var tmp={};
    line.toLowerCase().split('').forEach(function(c){
        if(c >= 'a' && c <= 'z') {
            tmp[c] = true; // record the letter; only the key matters
        }
    });
    result.push(Object.keys(tmp).join(''));
  });
  return result;
}

which should do what you want except that the order of characters in each sentence cannot be guaranteed to be preserved, though in most cases it is.

Given:

var index=['This is a sentence','This is a test','this is another test'],
    result=collapseSentences(index);

result contains:

["thisaenc","thisae", "thisanoer"]
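
If the ordering caveat above is a concern, one possible variant is to collect the letters in a `Set`, which iterates in insertion order. This is a sketch only: it assumes an ES2015-capable environment (for `Set` and `Array.from`), and the name `collapseSentencesOrdered` is made up for illustration.

```javascript
// Hypothetical variant (assumes ES2015): a Set keeps the letters unique
// and iterates them in the order they were first inserted.
function collapseSentencesOrdered(sentences) {
  return sentences.map(function (line) {
    // match() returns null when there are no letters, so fall back to [].
    var letters = line.toLowerCase().match(/[a-z]/g) || [];
    return Array.from(new Set(letters)).join('');
  });
}

var sample = ['This is a sentence', 'This is a test', 'this is another test'];
console.log(collapseSentencesOrdered(sample));
// ["thisaenc", "thisae", "thisanoer"]
```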
Rob Raisch
0
(\w)(?<!.*?\1)

This yields a match for each of the right characters, but as if you were reading right-to-left instead. This finds a word character, then looks ahead for the character just matched.

agent-j
  • If as previously stated JavaScript doesn't support lookbehind, how would this work in the OP's stated environment? – Rob Raisch Jun 14 '11 at 15:30
  • What regex flavor does that work under? Lookbehind typically needs to be fixed width, and in your case isn't looking behind anything or actually causing the string to be substituted. – Seth Robertson Jun 14 '11 at 15:31
  • I answered the question before he answered my request for environment. Good catch, though. :-) – agent-j Jun 14 '11 at 15:32
  • I suppose you could use a positive lookahead, then replace the matches with empty string. – agent-j Jun 14 '11 at 15:34
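
The lookahead idea from the last comment might look something like this in JavaScript, which does support lookahead. It is a sketch under assumptions: both function names are made up, and non-letters are stripped first. Deleting every letter that occurs again later keeps the last occurrence of each one, which is the "right-to-left" ordering mentioned earlier; reversing the string before and after restores first-occurrence order.

```javascript
// Hypothetical helper: delete every letter that appears again later,
// so the LAST occurrence of each letter survives.
function uniqueLettersKeepLast(sentence) {
  return sentence
    .toLowerCase()
    .replace(/[^a-z]/g, '')            // assumption: keep letters only
    .replace(/([a-z])(?=.*\1)/g, '');  // positive lookahead for a repeat
}

// Hypothetical helper: reverse, dedupe, reverse back, so the FIRST
// occurrence of each letter survives instead.
function uniqueLettersKeepFirst(sentence) {
  var reversed = sentence.split('').reverse().join('');
  return uniqueLettersKeepLast(reversed).split('').reverse().join('');
}

console.log(uniqueLettersKeepLast('This is a sentence'));  // "hiastnce"
console.log(uniqueLettersKeepFirst('This is a sentence')); // "thisaenc"
```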
0

Never mind, I managed:

justC = "";
if (color[i+1].match(/A/g)) {justC += " L_A";}
if (color[i+1].match(/B/g)) {justC += " L_B";}
if (color[i+1].match(/C/g)) {justC += " L_C";}
if (color[i+1].match(/D/g)) {justC += " L_D";}
if (color[i+1].match(/E/g)) {justC += " L_E";}
else {color[i+1] = "L_F";}

It's not exactly what my question may have led you to believe I wanted, but the printout for this is what I was after, for use in a class: <span class="L_A L_C L_E"></span>

C_K
  • glad you found a solution that worked for you. I've got absolutely no clue how this relates to the question though! – Spudley Jun 14 '11 at 15:52
  • Err... match(/A/g) will return the number of times 'A' appears in the target while match(/A/) will return 1 (true) if at least one 'A' occurs. Lose the /g to significantly improve performance. – Rob Raisch Jun 14 '11 at 16:27
0

How about:

var re = /(.)((.*?)\1)/g;
var str = 'This is a sentence';
var x = str.toLowerCase();
x = x.replace(/ /g, '');
// each pass drops a later repeat of a character; stop when none remain
while(x.match(re)) {
    x=x.replace(re, '$1$3');
}
// x is now "thisaenc"
Toto
-1

I don't think this can be done in one fell regex swoop. You are going to need to use a loop.

While my example was not written in your language of choice, it doesn't seem to use any regex features not present in javascript.

perl -e '$foo="This is a sentence"; while ($foo =~ s/((.).*?)\2/$1/ig)  { print "<$1><$2><$foo>\n"; } print "$foo\n";'

Producing:

This aenc
Seth Robertson