I am trying to write some code that finds for loops, replaces the semicolons inside their parentheses with @ signs, and adds a new line after the closing parenthesis. My current algorithm is:

pattern = "for(";
if (line.contains(pattern))
{
    openPos = line.indexOf(pattern) + "for".length();

    Occurence = 1;
    closePos = findClose(line, openPos, '(', ')');
    if (closePos != -1)
    {
        // Replace all line terminators within loop ()'s with @'s
        for (int lt = 0; lt < lineTerminator.size(); lt++)
        {
            tempLine = line.substring(openPos + "(".length(), closePos).replaceAll(";", "@");
        }
        line = line.substring(0, openPos + "(".length()) + tempLine + ")\n" + line.substring(closePos + 1, line.length()).trim();
        multiLine = "";
    }
}

This works great when a line contains a single for loop, but when I ran it on a production JavaScript file I hit a new case: it doesn't handle any for loop after the first one on the same line. I tried wrapping the logic in a while loop that keeps going on the same line for as long as it can find another for loop:

indexOfPattern = line.indexOf(pattern);
while (indexOfPattern >= 0)
{
    openPos = indexOfPattern + pattern.length();
    Occurence = 1;
    closePos = findClose(line, openPos, '(', ')');
    if (closePos != -1)
    {
        // Replace all line terminators within additional loop ()'s with @'s
        for (int lt = 0; lt < lineTerminator.size(); lt++)
        {
            tempLine = line.substring(openPos + "(".length(), closePos).trim().replaceAll(lineTerminator.get(lt), "@");
        }
        line = line.substring(0, openPos + "(".length()) + tempLine + ")\n" + line.substring(closePos + 1, line.length()).trim();
    }
    indexOfPattern = line.indexOf(pattern, indexOfPattern + pattern.length());
}

However, this also replaces semicolons outside of the for loop's parentheses. Does anyone know of a slicker way to do this?

Edit: Here's some expected output

Input:

for(h=0;b[h];) for(i=0;i<10;i++) for(a in b) { do; some; things; }

Output:

for(h=0@b[h]@) for(i=0@i<10@i++) for(a in b) { do; some; things; }

Edit 2: I selected the regex answer since it seems to work for a lot of the cases except for this one (ridiculous javascript junk ahead):

for(b[this.id]=this,this.settings=new c.classes.configurable(c.settings,j.settings||{}),Object.defineProperty(this,"graph",{value:new c.classes.graph(this.settings),configurable:!0}),Object.defineProperty(this,"middlewares",{value:[],configurable:!0}),Object.defineProperty(this,"cameras",{value:{},configurable:!0}),Object.defineProperty(this,"renderers",{value:{},configurable:!0}),Object.defineProperty(this,"renderersPerCamera",{value:{},configurable:!0}),Object.defineProperty(this,"cameraFrames",{value:{},configurable:!0}),Object.defineProperty(this,"camera",{get:function(){return this.cameras[0]}}),Object.defineProperty(this,"events",{value:["click","rightClick","clickStage","doubleClickStage","rightClickStage","clickNode","clickNodes","doubleClickNode","doubleClickNodes","rightClickNode","rightClickNodes","overNode","overNodes","outNode","outNodes","downNode","downNodes","upNode","upNodes"],configurable:!0}),this._handler=function(a){var b,c={};for(b in a.data)c[b]=a.data[b];c.renderer=a.target,this.dispatchEvent(a.type,c)}.bind(this),f=j.renderers||[],d=0,e=f.length;e>d;d++)

Notice the nested for(b in a.data) towards the end - this is what's giving the regular expression answer problems. Anybody got a catch-all to handle this silly case?

mjswartz
  • Java? Javascript? Call me confused. – Hovercraft Full Of Eels May 27 '16 at 16:23
  • The code is written in Java, the file on which the code runs is in JavaScript – mjswartz May 27 '16 at 16:26
  • Can you show the input and expected output? – IMTheNachoMan May 27 '16 at 16:29
  • You should write a proper parser. If this is parsing real Javascript, then there could be additional parentheses within the for loop pair (function calls, math expressions etc.), there could be additional semicolons (within strings, for example), and there could be anything at all within comments. – RealSkeptic May 27 '16 at 16:32
  • Barmar, feel my pain! I'm looking at a NESTED for loop right now. RealSkeptic, by this point comments have been removed and my `findClose()` function handles additional parentheses. – mjswartz May 27 '16 at 16:57

2 Answers

1

Here is a regex approach...

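import java.util.regex.Matcher;
import java.util.regex.Pattern;

public String replaceForSemicolons(String input) {
    // Matches "for (...;...) {": a header with at least one semicolon, followed by an opening brace
    String pattern = "for\\s*\\([^;]+;[^;]+[^\\)]+\\)\\s*\\{";
    Pattern reg = Pattern.compile(pattern);
    Matcher matcher = reg.matcher(input);
    StringBuilder output = new StringBuilder();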
    int previousEnd = 0;

    while(matcher.find()) {
        //get the matched 'for' without the opening bracket
        String matchedString = input.substring(matcher.start(), matcher.end()-1);
        //replace the semicolons with @
        matchedString = matchedString.replaceAll(";", "@");
        //append everything from the end of the last match to the start of this match
        output.append(input.substring(previousEnd, matcher.start()));
        //append the matched string with the replaced semicolons
        output.append(matchedString);
        //add a new line and the opening bracket that we left out from the matched string
        output.append("\n{");
        previousEnd = matcher.end();
    }

    //append the rest of the string
    output.append(input.substring(previousEnd));

    return output.toString();
}
gdros
  • gdros, this is awesome. I made a few small changes, mainly removing the inclusion of { in your function. I can't assume that the code includes a { after each for, but other than that, this is great. (Also substring till matcher.end(), not matcher.end()-1 since that cuts off the closing ')') – mjswartz May 27 '16 at 17:11
  • This seems to work for the 99% of normal cases, but see my edit for a hard-mode example. – mjswartz May 27 '16 at 17:25
  • Your input is extreme. You will need lexical and syntax analysis to catch expressions like the one you provided. – gdros May 27 '16 at 19:07
0

It will be nearly impossible, if not completely impossible, to handle 100% of cases unless you use some type of tokenization. For example, if you had the following:

for (b[this.id] = this, this.settings = new c.classes.configurable(c.settings, j.settings || {}), Object.defineProperty(this, "graph", {

The regex will get stuck on the { in j.settings || {} instead of going all the way to d; d++).

This is the same reason you cannot really parse HTML or XML with regex. Instead of doing a search/replace, you really need to build a simple tokenizer. For example, take a look at the following pseudocode:

var depth = 0
var output = ""
for each char in string {
  if char == '{' {
    depth += 1
  }
  if char == '}' {
    depth -= 1
  }
  if depth > 0 && char == ';'{
    output = output + "@"
  } else {
    output = output + char
  }
}

You will most likely have to add some additional states to the above tokenizer to accomplish everything you want to do ... but it should give you a good place to start.
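
As a rough illustration of the depth-tracking idea above, here is a minimal Java sketch. It is neither the asker's findClose() nor a full JavaScript tokenizer, and the names ForHeaderRewriter and rewrite are made up for this example. It assumes comments have already been stripped (as the asker says), skips over simple quoted strings, and does not understand regex literals or template strings. It also assumes that only the semicolons sitting directly inside a for header's own parentheses should change, so semicolons inside a function body nested in the header are left alone. Inserting the newline after the closing parenthesis is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ForHeaderRewriter {
    // "for" not preceded by an identifier character or '.', then optional whitespace and "(".
    private static final Pattern FOR_OPEN = Pattern.compile("(?<![\\w$.])for\\s*\\(");

    static String rewrite(String src) {
        // '@' replaces ';' one-for-one, so indexes in src and out stay aligned.
        StringBuilder out = new StringBuilder(src);
        Matcher m = FOR_OPEN.matcher(src);
        // Each find() resumes just after the previous "for(", so for loops
        // nested inside a header are visited as well.
        while (m.find()) {
            int depth = 0; // nesting of (), [] and {} counted from the header's "("
            for (int i = m.end() - 1; i < src.length(); i++) {
                char c = src.charAt(i);
                if (c == '"' || c == '\'') {
                    // Skip a quoted string, honouring backslash escapes.
                    i++;
                    while (i < src.length() && src.charAt(i) != c) {
                        if (src.charAt(i) == '\\') i++;
                        i++;
                    }
                } else if (c == '(' || c == '[' || c == '{') {
                    depth++;
                } else if (c == ')' || c == ']' || c == '}') {
                    if (--depth == 0) break; // reached the header's closing ")"
                } else if (c == ';' && depth == 1) {
                    out.setCharAt(i, '@'); // a separator of this header itself
                }
            }
        }
        return out.toString();
    }
}
```

Called on the sample input from the question, ForHeaderRewriter.rewrite("for(h=0;b[h];) for(i=0;i<10;i++) for(a in b) { do; some; things; }") returns the expected output shown there. Because the replacement keeps the string length unchanged, the sketch can scan the original text and write into a copy without recomputing positions; adding the newline would need an extra pass or an offset.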

CaffeineAddiction
  • This is what my findClose() function does. I'll add in some arguments to replace characters and see how that works. – mjswartz May 27 '16 at 18:46
  • Yes, but you also have to deal with the `var b, c = {};` as well as the fact that you have a nested for loop `for (b in a.data) c[b] = a.data[b];` ... so even if you correctly find `d++)` as the close to the first for loop there could be some valid `;` in the content of the nested for-loop – CaffeineAddiction May 27 '16 at 18:53