Regex, grab only one instance of each letter

Question

I have a paragraph that's broken up into an array, split at the periods. I'd like to perform a regex on index[i], replacing its contents with one instance of each letter that index[i]'s string value contains.

So index[i]: "This is a sentence" would become index[i]: "thisaenc".

I read this thread, but I'm not sure whether it's what I'm looking for.

What regex flavor are you using? JavaScript doesn't support lookbehinds. – agent-j, Jun 14 '11 at 15:21
Strictly speaking, a substitution is more than just a regexp. But you should specify your language; the quirks and varieties of regexp engines are legion. I'm almost sure you'll need one with variable-length negative lookbehind, or else a workaround. – LHMathies, Jun 14 '11 at 15:29
Does the order of the output characters matter? You can use my solution below with a slightly different negative lookahead (which JScript supports), but the characters will be added as if you were reading right-to-left. – agent-j, Jun 14 '11 at 15:29
Do you just want all the unique letters that occur in the sentence? — jcolebrand, Jun 14 '11 at 15:32
An empty while loop works in Perl: `while ($str =~ s/(\w)(.*)(\1)(.*)/\1\2\4/i) {}`. Someone just has to strip the spaces and lower-case the string in JavaScript, then rewrite that pattern in JavaScript, and it'll work. I have no idea how to do that :p Oh, non-greedy matching isn't necessary; I'll edit that out. It just made more sense in my head to be non-greedy. – NorthGuard, Jun 14 '11 at 15:41
So apparently JavaScript's replace returns the new string instead of a boolean. You can still do it: store the string, and keep replacing while the newly replaced string differs from the stored one. That's probably more involved than you want, though. – NorthGuard, Jun 14 '11 at 15:51
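A minimal JavaScript sketch of the loop NorthGuard describes, assuming only word characters (letters, digits, underscore) should be kept; the function name `uniqueLetters` is hypothetical. It lower-cases the string, strips everything else, then keeps calling `replace` until the result stops changing.

```javascript
// Hypothetical helper: keeps the first occurrence of each word character.
function uniqueLetters(sentence) {
  // Lower-case and drop spaces/punctuation (anything that is not \w).
  var str = sentence.toLowerCase().replace(/\W/g, '');
  var prev;
  do {
    prev = str;
    // (\w) is a character that occurs again later, (.*) is everything up to
    // that later copy, and \1 is the copy itself, which the replacement drops.
    str = str.replace(/(\w)(.*)\1/, '$1$2');
  } while (str !== prev); // replace() returns a new string, so compare it
  return str;
}

console.log(uniqueLetters('This is a sentence')); // "thisaenc"
```

Each pass removes only one repeated character, so the number of passes grows with the number of duplicates; for long strings a plain loop over the characters is cheaper.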

Spudley · Answer 1 · 2011-06-14T16:25:41.483

2

Not sure how to do this in regex, but here's a very simple function to do it without using regex:

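function charsInString(input) {
    var output = '';
    for (var pos = 0; pos < input.length; pos++) {
        // lower-case each character so 'T' and 't' count as the same letter
        var char = input.charAt(pos).toLowerCase();
        // keep only the first occurrence of each character, skipping spaces
        if (output.indexOf(char) == -1 && char != ' ') {output += char;}
    }
    return output;
}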

alert(charsInString('This is a sentence'));

edited Jun 14 '11 at 16:25

answered Jun 14 '11 at 15:51

Spudley


If you change the two lines like this: `char=input.charAt(pos).toLowerCase(); if(output.indexOf(char) == -1 && char != ' ') {output+=char;}` it would give the exact desired string. – morja Jun 14 '11 at 16:23
@morja -- yes, thanks; I'll edit the answer. (I posted it without testing it or checking it because just as I was finishing I saw @Jason had posted his own answer!) – Spudley Jun 14 '11 at 16:25

Rob Raisch · Accepted Answer · 2011-06-14T16:30:09.423

As I'm pretty sure what you need cannot be achieved using a single regular expression, I offer a more general solution:



// collapseSentences(ary) will collapse each sentence in ary 
// into a string containing its constituent chars
// @param  {Array}  the array of strings to collapse
// @return {Array}  the collapsed sentences
function collapseSentences(ary){
  var result=[];
  ary.forEach(function(line){
    var tmp={};
    line.toLowerCase().split('').forEach(function(c){
        if(c >= 'a' && c <= 'z') {
            tmp[c] = true; // mark the letter as seen (tmp[c]++ would store NaN)
        }
    });
    result.push(Object.keys(tmp).join(''));
  });
  return result;
}

which should do what you want except that the order of characters in each sentence cannot be guaranteed to be preserved, though in most cases it is.

Given:

var index=['This is a sentence','This is a test','this is another test'],
    result=collapseSentences(index);

result contains:

["thisaenc","thisae", "thisanoer"]
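On the ordering caveat: in current engines `Object.keys` lists non-integer string keys such as 'a' to 'z' in insertion order, and a `Set` iterates in insertion order by specification (ES2015). A minimal sketch of the same idea built on a `Set`, assuming (as above) that only the letters a to z should be kept; the name `collapseSentencesOrdered` is hypothetical.

```javascript
// Collapses each sentence to the distinct letters it contains, in order of
// first appearance. A Set iterates in insertion order, so order is kept.
function collapseSentencesOrdered(ary) {
  return ary.map(function (line) {
    var seen = new Set();
    line.toLowerCase().split('').forEach(function (c) {
      if (c >= 'a' && c <= 'z') {
        seen.add(c); // adding a letter that is already present is a no-op
      }
    });
    return Array.from(seen).join('');
  });
}

var index = ['This is a sentence', 'This is a test', 'this is another test'];
console.log(collapseSentencesOrdered(index)); // [ 'thisaenc', 'thisae', 'thisanoer' ]
```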

agent-j · Answer 3 · 2011-06-14T15:31:32.837

0

(\w)(?<!.*?\1)

This yields a match for each of the right characters, but as if you were reading right-to-left instead. This finds a word character, then looks ahead for the character just matched.
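For reference, JavaScript gained lookbehind assertions in ES2018, and they may be variable-length. As written, though, the `\1` inside `(?<!.*?\1)` can match the very character that was just captured, so the negative assertion appears to fail at every position. A sketch that instead looks for an earlier copy, assuming an ES2018+ engine such as a current browser or Node.js; the name `firstOccurrences` is hypothetical.

```javascript
// Matches a word character only if no earlier copy of it exists: the
// lookbehind \1.+ needs a copy of the captured character followed by at
// least one more character, which is only possible if that copy is earlier.
var FIRST_OCCURRENCE = /(\w)(?<!\1.+)/g;

function firstOccurrences(sentence) {
  var matches = sentence.toLowerCase().match(FIRST_OCCURRENCE);
  return matches ? matches.join('') : ''; // match() returns null on no match
}

console.log(firstOccurrences('This is a sentence')); // "thisaenc"
```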

edited Jun 14 '11 at 15:31

answered Jun 14 '11 at 15:24

agent-j


If as previously stated JavaScript doesn't support lookbehind, how would this work in the OP's stated environment? – Rob Raisch Jun 14 '11 at 15:30
What regex flavor does that work under? Lookbehind typically needs to be fixed width, and in your case isn't looking behind anything or actually causing the string to be substituted. – Seth Robertson Jun 14 '11 at 15:31
I answered the question before he answered my request for environment. Good catch, though. :-) – agent-j Jun 14 '11 at 15:32
I suppose you could use a positive lookahead, then replace the matches with an empty string. – agent-j Jun 14 '11 at 15:34
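A sketch of that lookahead idea, which needs no lookbehind support: `(\w)(?=.*\1)` matches every character that occurs again later, so replacing those matches with an empty string keeps the last copy of each character (the right-to-left effect agent-j mentions). Reversing the string before and after the replacement keeps the first copy instead. Both helper names are hypothetical.

```javascript
// Strips non-word characters, then removes every character that appears
// again later in the string, keeping the last copy of each.
function lastCopies(str) {
  return str.replace(/\W/g, '').replace(/(\w)(?=.*\1)/g, '');
}

// Reversing first turns "last copy" into "first copy" of the original.
function firstCopies(str) {
  var reversed = str.toLowerCase().split('').reverse().join('');
  return lastCopies(reversed).split('').reverse().join('');
}

console.log(lastCopies('this is a sentence'));  // "hiastnce"
console.log(firstCopies('This is a sentence')); // "thisaenc"
```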

score 0 · Answer 4 · answered Jun 14 '11 at 15:49

0

Never mind, I managed it:

justC = "";
if (color[i+1].match(/A/g)) {justC += " L_A";}
if (color[i+1].match(/B/g)) {justC += " L_B";}
if (color[i+1].match(/C/g)) {justC += " L_C";}
if (color[i+1].match(/D/g)) {justC += " L_D";}
if (color[i+1].match(/E/g)) {justC += " L_E";}
else {color[i+1] = "L_F";}

It's not exactly what my question may have led you to believe I wanted, but this output is what I was after, for use in a class attribute: <span class="L_A L_C L_E"></span>

answered Jun 14 '11 at 15:49

C_K



Glad you found a solution that worked for you. I've got absolutely no clue how this relates to the question, though! – Spudley Jun 14 '11 at 15:52
Err... match(/A/g) returns an array of every 'A' in the target, while match(/A/) stops at the first 'A' it finds (and /A/.test() simply returns true or false). Lose the /g to avoid collecting every match and improve performance. – Rob Raisch Jun 14 '11 at 16:27
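A sketch of the class-building code from this answer rewritten as a loop with `RegExp.prototype.test()`, which returns a boolean and does not collect matches. Two assumptions: the letters of interest are A to E, and `L_F` is meant as the fallback when none of them occur (in the original, the `else` binds only to the last `if`, so it fires whenever there is no `E`). The function name `letterClasses` is hypothetical.

```javascript
// Builds a space-separated class list such as "L_A L_C L_E" from the
// letters A-E that occur in text; falls back to "L_F" if none occur.
function letterClasses(text) {
  var classes = [];
  ['A', 'B', 'C', 'D', 'E'].forEach(function (letter) {
    if (new RegExp(letter).test(text)) {
      classes.push('L_' + letter);
    }
  });
  return classes.length ? classes.join(' ') : 'L_F';
}

console.log(letterClasses('ACE')); // "L_A L_C L_E"
console.log(letterClasses('xyz')); // "L_F"
```

The result can go straight into a class attribute, without the leading space that the original string concatenation produces.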

score 0 · Answer 5 · answered Jun 14 '11 at 16:03

0

How about:

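var re = /(.)((.*?)\1)/g;
var str = 'This is a sentence';
var x = str.toLowerCase();
x = x.replace(/ /g, '');   // drop the spaces
// each pass deletes the next repeat of a character; stop when none remain
while (x.match(re)) {
    x = x.replace(re, '$1$3');
}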

answered Jun 14 '11 at 16:03

Toto


score -1 · Answer 6 · answered Jun 14 '11 at 15:42

-1

I don't think this can be done in one fell regex swoop. You are going to need to use a loop.

While my example is not written in your language of choice, it doesn't seem to use any regex features missing from JavaScript.

perl -e '$foo="This is a sentence"; while ($foo =~ s/((.).*?)\2/$1/ig)  { print "<$1><$2><$foo>\n"; } print "$foo\n";'

The final line of output is:

This aenc
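The same loop carries over to JavaScript, which supports the backreference, the lazy quantifier, and the g/i flags used here. A minimal port without the debugging print; the function name `dedupeChars` is hypothetical. Because `String.prototype.replace` returns a new string rather than a match count, the loop checks for a remaining repeat with a separate, non-global regex and `test()`.

```javascript
// Repeatedly deletes the next copy of a character: ((.).*?) captures a
// character plus everything up to its next occurrence, and \2 is that
// occurrence. The i flag makes "T" and "t" count as the same character.
function dedupeChars(str) {
  var hasRepeat = /(.).*?\1/i;      // non-global, so test() keeps no state
  var dropRepeats = /((.).*?)\2/gi;
  while (hasRepeat.test(str)) {
    str = str.replace(dropRepeats, '$1');
  }
  return str;
}

console.log(dedupeChars('This is a sentence')); // "This aenc"
```

Like the Perl version, this keeps the first copy of each character with its original case and treats the space as a character too, which is why one space survives.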

answered Jun 14 '11 at 15:42

Seth Robertson


Why exactly was this downvoted? It works and uses a regular expression. – Seth Robertson Jun 14 '11 at 19:15