
I have the following code to replace a string of DNA with its complement, where A <=> T and C <=> G. I wrote it with very basic knowledge of regular expressions. How can I refactor it using regular expressions to capture each letter and replace it with its complement?

function DNA(strand) {
    return strand.replace(/A|T|C|G/g, x => {
        return (x=="A") ? "T" : (x=="T") ? "A" : (x=="C") ? "G" : "C";
    });
}
Nate
  • Looking at that code I'd say regexs are the least of your worries. – Jared Smith Oct 22 '15 at 16:49
  • Could you provide some examples of what you are trying to achieve? – Wiktor Stribiżew Oct 22 '15 at 16:49
  • @stibizhev takes in a string like "ATTC" and returns "TAAG" – Nate Oct 22 '15 at 16:51
  • @JaredSmith That's incredibly helpful. You can feel free to elaborate but I'm using a simple example to help myself understand regular expressions better. – Nate Oct 22 '15 at 16:52
  • I don't understand why you need a regex. It can be done with the conditional operator: if `A`, return `T`; if `C`, return `G`; and so on. –  Oct 22 '15 at 16:54
  • @Nate was in the process of elaborating, just took a min. Wasn't necessarily trying to be rude. Check my answer below. – Jared Smith Oct 22 '15 at 17:00
  • @Nate since you're trying to learn regexs, you might like [regex golf](https://regex.alf.nu/) – Aaron Oct 22 '15 at 17:36

4 Answers


This is rather inelegant, IMO, but it is a one (two?) step replacement algorithm that uses only JavaScript regex capabilities. If you're interested, I can explain what the heck it's doing.

function DNA(strand) {
    return strand
        .concat("||TACG")
        .replace(/A(?=.*?(T))|T(?=.*?(A))|C(?=.*?(G))|G(?=.*?(C))|\|\|....$/gi, "$1$2$3$4");
}

See this fiddle (now updated a bit for testability) to play around with it.

This might seem like a simple example for which to build a regex, but it's not really (if you want it all to be in the regex, that is). It would be far more efficient to use a simple mapping table (hashtable): split the characters, remap/translate them, and join them back together (as @Jared Smith did), since this regex may have to scan ahead through the rest of the string for every letter it replaces. If this is solely for personal interest and learning regex, then please feel free to ask for any required explanation.

Edit for jwco:

As I stated, this is rather inelegant (or at least inefficient) for a production-level solution, but perhaps rather elegant as an art piece(?). It uses only JavaScript regex (RegExp) capabilities, so no "regular expression conditions" or "look-behind" (JavaScript supported neither when this was written), and if JavaScript supported "free-spacing", you could actually use the regex as shown below.

This is a relatively common way of breaking down components of a regex to explain what each part is matching, looking for and capturing:

  A         #  Match an A, literally
  (?=       #  Look ahead, and
    .*?     #    Match any number of any character lazily (as necessary)
    (T)     #    Match and capture a T, literally (into group #1)
  )         #  End look-ahead
|           #-OR-
  T         #  Match a T, literally
  (?=       #  Look ahead, and
    .*?     #    Match any number of any character lazily (as necessary)
    (A)     #    Match and capture an A, literally (into group #2)
  )         #  End look-ahead
|           #-OR-
  C         #  Match a C, literally
  (?=       #  Look ahead, and
    .*?     #    Match any number of any character lazily (as necessary)
    (G)     #    Match and capture a G, literally (into group #3)
  )         #  End look-ahead
|           #-OR-
  G         #  Match a G, literally
  (?=       #  Look ahead, and
    .*?     #    Match any number of any character lazily (as necessary)
    (C)     #    Match and capture a C, literally (into group #4)
  )         #  End look-ahead
|           #-OR-
 \|\|....$  #  Match two literal pipes (|), followed by any four characters and then the end of the string

Anything matched by this expression (which should be every part of the entire string) will be replaced by the replacement expression $1$2$3$4. The "global" flag (the g in the /gi) will make it keep trying to match as long as there is more of the string to test.
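To see what the g flag contributes, here is a quick sketch with the same pattern and suffix but only the i flag (`complementOnce` is a hypothetical name, not part of the answer). Only the first match is replaced, so the engine never reaches the ||TACG suffix and it stays in the output:

```javascript
// Hypothetical variant of DNA() for illustration: same pattern and suffix,
// but only the i flag, so replace() stops after the first match.
function complementOnce(strand) {
    return strand
        .concat("||TACG")
        .replace(/A(?=.*?(T))|T(?=.*?(A))|C(?=.*?(G))|G(?=.*?(C))|\|\|....$/i, "$1$2$3$4");
}

console.log(complementOnce("ATTC")); // "TTTC||TACG": only the leading A was replaced
```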

The expression is made up of five possible options (one for each possible letter switch and then a "cleanup" match). The first four options are identical except for the particular letters matched. Each of these matches and consumes a particular desired letter, then "looks ahead" in the string to find its "translation" or "complement", captures it without consuming anything else, then completes as a successful alternative, thus satisfying the expression as a whole.
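The part that makes this work is that the look-ahead captures its letter without consuming it. A small check with just the first alternative, using the standard `RegExp.prototype.exec`, shows that only the A belongs to the match while the T lands in group 1, and that a global search resumes right after the A:

```javascript
// First alternative only: match an A, then look ahead for a T and capture it.
const m = /A(?=.*?(T))/.exec("ACGT");

console.log(m[0]);    // "A": only the A itself is consumed by the match
console.log(m[1]);    // "T": the T was captured inside the look-ahead
console.log(m.index); // 0

// With the g flag, the next search starts right after the consumed A, not after the T.
const re = /A(?=.*?(T))/g;
re.exec("ACGT");
console.log(re.lastIndex); // 1
```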

Since only one of the capture groups (1-4) can have matched for any successfully tested letter, only one of the backreferences ($1, etc., in $1$2$3$4) can contain a captured value. In the case of the fifth option (\|\|....$), there are no capture groups, so none of them contains a value with which to replace the match.
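JavaScript substitutes an empty string for any $n whose group did not participate in the match, which is why $1$2$3$4 yields just the one captured letter. A sketch that lists every match and what it expands to (`describeMatches` is a hypothetical helper, and `String.prototype.matchAll` assumes an ES2020-or-later runtime):

```javascript
// The pattern from the answer; matchAll requires the g flag.
const pattern = /A(?=.*?(T))|T(?=.*?(A))|C(?=.*?(G))|G(?=.*?(C))|\|\|....$/gi;

// Hypothetical helper: report each match and what $1$2$3$4 would expand to.
function describeMatches(strand) {
    const input = strand.concat("||TACG");
    return Array.from(input.matchAll(pattern), m => ({
        matched: m[0],
        index: m.index,
        // Non-participating groups are undefined; join() turns them into "".
        replacement: m.slice(1).join("")
    }));
}

console.log(describeMatches("ATTC"));

// The same rule in isolation: group 1 did not participate, so $1 expands to "".
console.log("x".replace(/(a)|(x)/, "[$1][$2]")); // "[][x]"
```

For "ATTC" this reports A -> T, T -> A, T -> A, C -> G, and finally the ||TACG suffix with an empty replacement.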

Before the string is fed into the regex engine, ||TACG is appended to it, kind of like a telomere... sorta. This provides a replacement source in case the source string does not contain the appropriate "complement" letter at a later position (or at all?!). The last option in the regex then removes this extraneous suffix by matching it and replacing it with nothing.
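Leaving the suffix off shows why it's there. In this sketch (`complementWithoutSuffix` is a hypothetical name), the cleanup alternative is dropped along with the suffix, and any letter whose complement doesn't appear later in the string is simply left unchanged:

```javascript
// Same four look-ahead alternatives, but no "||TACG" suffix and no cleanup branch.
function complementWithoutSuffix(strand) {
    return strand.replace(/A(?=.*?(T))|T(?=.*?(A))|C(?=.*?(G))|G(?=.*?(C))/gi, "$1$2$3$4");
}

console.log(complementWithoutSuffix("ATTC")); // "TTTC": only the A found its complement ahead of it
```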

This could be done for any set of replacements, but it gets less and less efficient as more replacements are added. As for maintainability, as a certain commenter's (I hope jovial) threat suggests, ummm... it would be a challenge. Enjoy!
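To illustrate "any set of replacements", here is one way to generate the same kind of regex from a mapping object. This is a sketch under a few assumptions: `makeSwapper` and `escapeRegExp` are hypothetical helpers, keys and values are single characters, the input has no line breaks (. does not match them), and matching is case-sensitive (no i flag):

```javascript
// Hypothetical helper: escape a string so it is matched literally inside a RegExp.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a replacer using the same look-ahead + suffix trick
// for any mapping of single characters, e.g. { A: "T", T: "A", C: "G", G: "C" }.
function makeSwapper(mapping) {
    const keys = Object.keys(mapping);
    // The "telomere": every replacement character, so each look-ahead can succeed.
    const suffix = keys.map(k => mapping[k]).join("");
    // One alternative per key: match the key, then capture its replacement further ahead.
    const alternatives = keys.map(
        k => escapeRegExp(k) + "(?=.*?(" + escapeRegExp(mapping[k]) + "))"
    );
    // Cleanup alternative: the "||" marker plus exactly suffix.length characters at the end.
    alternatives.push("\\|\\|.{" + suffix.length + "}$");
    const pattern = new RegExp(alternatives.join("|"), "g");
    // "$1$2...$n": for any given match at most one of these groups is set.
    const replacement = keys.map((_, i) => "$" + (i + 1)).join("");
    return strand => (strand + "||" + suffix).replace(pattern, replacement);
}

const complementDNA = makeSwapper({ A: "T", T: "A", C: "G", G: "C" });
console.log(complementDNA("GATTACA")); // "CTAATGT"

const swapXY = makeSwapper({ x: "y", y: "x" });
console.log(swapXY("xxyz")); // "yyxz"
```

Each extra pair adds one more alternative with its own look-ahead, which is where the growing cost comes from.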

Code Jockey
  • That's awesome. If I were the one who had to maintain that code I'd kill you, but it's still awesome. +1. – Jared Smith Oct 23 '15 at 15:31
  • @JaredSmith -- aww shucks - tweren't nuthin' but a little brain thinkin' -- it wouldn't be too terribly nasty if one were able to use free spacing... but since this is JavaScript... not so much. It's short, at least! I'll just make sure you don't know where to find me – Code Jockey Oct 23 '15 at 16:44
  • @CodeJockey Your code above answers a slightly different or more specific question I had. I decided to post the question separately anyway: http://stackoverflow.com/q/33376354/895065. Would love to get more of an explanation from you on what your code above is doing and how you arrived at the regular expression. – Jesse W. Collins Oct 27 '15 at 19:01
  • @jwco - was just answering your question, then it was closed(?) – Code Jockey Oct 27 '15 at 19:22
  • @CodeJockey Marked as Duplicate to the question here. However, this question isn't specifically asking for a replacement string as you provided, or at least I didn't gather that from it and its example. Maybe this question should be edited to reflect that? Not sure what the SO solution to this is; there are a lot of these DNA complement questions and answers floating around. I am excited about the Regex learning opportunity, didn't think it was doable. – Jesse W. Collins Oct 27 '15 at 19:29
  • @jwco - it's kind of a stretch in many cases, but this is fairly straightforward to explain. At your request, I'm expanding the explanation. – Code Jockey Oct 27 '15 at 19:49
  • @CodeJockey Thanks. Btw, maybe there is another more general way to phrase this type of RegEx problem, e.g. in terms of "permutation cycles"? – Jesse W. Collins Oct 27 '15 at 20:00
  • @jwco -- re: SO solution -- editing a question to fix grammar and formatting is fine, but changing what it's asking is generally frowned upon, which I think would be required for your situation. The solution in that case (you have a different question) is to ask another question! :-D Since the close fairies have determined your question is too similar, I went ahead and added to my answer here, despite it being only a small part of the question to begin with. – Code Jockey Oct 27 '15 at 20:03
  • @CodeJockey Wow, glad I asked you for additional explanation, I really enjoyed reading it! Love the artistic "telomere" usage too. I should probably let you go and find this on my own, but is javascript's str.replace constructing the replacement string as the Regex Engine performs each match, or waiting until the end, or is the Regex Engine itself constructing the replacement string given str.replace's input? – Jesse W. Collins Oct 27 '15 at 22:54

First, never nest ternary operators. Try this instead:

const DNAmapping = {
    'G': 'C',
    'C': 'G',
    'A': 'T',
    'T': 'A'
};

function reverseDNA(strand) {
    return strand
        .split('')                          // convert to array of chars
        .filter(s => /A|T|C|G/.test(s))     // keep only the bases
        .map(x => DNAmapping[x])            // substitute the complementary base
        .join('');                          // turn back into a string
}

This now uses your regex to return only the characters that appear in DNA base pairs. Suppose you have other stuff in that strand that you want to keep:

var reverseDNA = (strand => strand.replace(/A|T|C|G/g, x => DNAmapping[x]));

Now it's closer to your original, and a readable one-liner to boot.
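The two versions behave differently when the strand contains anything besides A, T, C and G. A side-by-side sketch (the function names are hypothetical; `[ATCG]` is a character class, which for single letters matches the same thing as `A|T|C|G`):

```javascript
const DNAmapping = { G: "C", C: "G", A: "T", T: "A" };

// Array-based version: characters that are not bases are filtered out.
const complementFiltered = strand =>
    strand
        .split("")
        .filter(ch => /[ATCG]/.test(ch))
        .map(ch => DNAmapping[ch])
        .join("");

// replace-based version: characters that are not bases pass through untouched.
const complementKept = strand => strand.replace(/[ATCG]/g, ch => DNAmapping[ch]);

console.log(complementFiltered("AT-CG")); // "TAGC"
console.log(complementKept("AT-CG"));     // "TA-GC"
```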

Jared Smith
  • Thanks Jared. This is actually more helpful. However, I guess I need to make clearer in my question that I'm trying to work with regular expressions because I'm trying to learn them better. The simple algorithm is just an example, not something I'm trying to resolve with other best practices. – Nate Oct 22 '15 at 16:59
  • Part of learning the tool is knowing when to use it. Nevertheless, I'll update. – Jared Smith Oct 22 '15 at 17:04
  • @Nate - this is not a problem that is very well suited to use of regular expressions... That said, there is a replacement expression, however inelegant, that will switch your characters (see my answer) – Code Jockey Oct 22 '15 at 18:47

It seems like Jared Smith's answer isn't what you were looking for, and that you want something closer to what you suggested with the replace function. How about this:

function DNA(strand) {
    var s = strand.replace(/A|T/g, function(c) {
        return (c=="A") ? "T" : "A";
    });
    s = s.replace(/G|C/g, function(c) {
        return (c=="G") ? "C" : "G";
    });
    return s;
}

The only real problem with your suggestion seems to be the nested ternary conditions.
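One thing to watch for with these two-pass patterns: written with a trailing |, as in /A|T|/g, the alternation gains an empty branch that also matches the empty string, and the callback then runs with c === "". A quick sketch (`swapAT` is a hypothetical name) comparing it with /A|T/g:

```javascript
const swapAT = c => (c === "A" ? "T" : "A");

// Empty branch: an empty match at the end of "AT" is turned into an extra "A".
console.log("AT".replace(/A|T|/g, swapAT)); // "TAA"
// Empty matches before and after the C each become "A" as well.
console.log("C".replace(/A|T|/g, swapAT));  // "ACA"
// Without the empty branch, only real A/T characters are matched.
console.log("AT".replace(/A|T/g, swapAT));  // "TA"
```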

Edit: here's a slightly less elegant version with a switch, but at least it does everything in a single replace call:

function DNA(strand) {
    return strand.replace(/A|T|G|C/g, function(c) {
        switch (c) {
            case "A":
                return "T";
            case "T":
                return "A";
            case "C":
                return "G";
            case "G":
                return "C";
            default:
                return "";
        }
    });
}
John Pink

You may want to use a function as your new value instead of an object. You could do something like:

function DNA(strand) {
    return strand.replace(/A|T|C|G/g, function(x) {
        return (x=="A") ? "T" : (x=="T") ? "A" : (x=="C") ? "G" : "C";
    });
}
WillyMilimo