I need to replace the word 'OR' with '||' in a given string. It should be replaced only when it is a complete word by itself, and not when it appears within quotes. For example, if the input string is

application.path="EXCEL.exe" OR application.path="EXCELSIOR.exe" OR application.path="XYZ OR ABC.exe"

the output should be

application.path="EXCEL.exe" || application.path="EXCELSIOR.exe" || application.path="XYZ OR ABC.exe"

Note that the OR in EXCELSIOR.exe and "XYZ OR ABC.exe" is not replaced.

The Java code I'm using is as follows:

String inputStr = "(quote.AGE was 24 AND (application.path = \"**\\acad.exe\" OR application.path = \"**\\dxfdwg.exe\" OR application.path = \"**\\EXCELSIOR.EXE\" OR application.path = \"**\\iges.exe\" OR application.path = \"**\\notepad.exe\" OR application.path = \"**\\run_journal.exe\" OR application.path = \"**\\AcroRd32.exe\" OR application.path = \"**\\dllhost.exe\" OR application.path = \"**\\powerpnt.exe\" OR application.path = \"**\\Edge.exe\" OR application.path = \"**\\step203ug.exe\" OR application.path = \"**\\step214ug.exe\" OR application.path = \"**\\VisView.exe\" OR application.path = \"**\\Teamcenter.exe\" OR application.path = \"**\\ug_convert_part.exe\" OR application.path = \"**\\ugraf.exe\" OR application.path = \"**\\ugtopv.exe\" OR application.path = \"**\\wmplayer.exe\" OR application.path = \"**\\winword.exe\" OR application.path = \"**\\wordpad.exe\" OR application.path = \"**\\vlc.exe\" OR application.path = \"**\\dwgviewr.exe\" OR application.name = \"RMS\" OR application.path = \"**\\acrobat.exe\" OR application.path = \"**\\Alias.exe\" OR application.path = \"**\\awtessd.exe\" OR application.path = \"**\\proe.exe\" OR application.path = \"**\\STPViewer.exe\" OR application.path = \"**\\gom_inspect.exe\" OR application.path = \"**\\gom_cad_server2.exe\" OR application.path = \"**\\sldworks.exe\" OR application.path = \"**\\sldworks_fs.exe\" OR application.path = \"**\\sldProcMon.exe\" OR application.path = \"**\\AdapplicationMgr.exe\" OR application.path = \"**\\AdapplicationMgrSvc.exe\" OR application.path = \"**\\SE3Dtrans.exe\" OR application.path = \"**\\stamp.exe\" OR application.path = \"**\\psolid.exe\" OR application.path = \"**\\mpid.exe\" OR application.path = \"**\\mpirun.exe\" OR application.path = \"**\\FS.exe\" OR application.path = \"**\\xtop.exe\" OR application.path = \"**\\pro_comm_msg.exe\" OR application.path = \"**\\nmsd.exe\" OR application.path = \"**\\creoagent.exe\" OR application.path = \"**\\parametric.exe\" 
OR application.path = \"**\\PDFEditor.exe\" OR application.path = \"**\\CNEXT.exe\" OR application.path = \"**\\drafter.exe\" OR application.path = \"**\\convert.exe\" OR application.path = \"**\\ActCut3D.exe\" OR application.path = \"**\\ppcbasic.exe\" OR application.path = \"**\\deltamesh_stamping.exe\" OR application.path = \"Xasfsf\" OR application.path = \"sfdsdf\"))";
String replacedStr = inputStr.replaceAll("(?m)\\bOR\\b(?=(?:\"[^\"]*\"|[^\"])*$)", "||");

This works fine for shorter strings, but once the length goes beyond 2000 characters, it throws the following error:

Exception in thread "main" java.lang.StackOverflowError
    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3796)
    at java.util.regex.Pattern$Branch.match(Pattern.java:4604)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$BranchConn.match(Pattern.java:4568)
    at java.util.regex.Pattern$CharProperty.match(Pattern.java:3777)
    at java.util.regex.Pattern$Branch.match(Pattern.java:4604)

I read in some other threads (thread1, thread2) that Java doesn't handle regex on long strings very well. Can someone suggest how I can improve my regex to avoid the StackOverflowError?

Santy

1 Answer

Can someone suggest how I can improve my regex to avoid the StackOverflowError?

Yes, I can give you two solutions; you just need to look at your problem from another angle.

Here is a quick analysis of your problem and a quick solution. You can use this regex instead: (.*?\"\s+)\bOR\b(\s+application.*?)

Solution one

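String inputStr = "..."; // that long String from the question
// Group 1: everything up to a closing quote plus whitespace; group 2: whitespace plus "application"
String regex = "(.*?\"\\s+)\\bOR\\b(\\s+application.*?)";
// Keep both groups and swap only the OR between them
String replacedStr = inputStr.replaceAll(regex, "$1||$2");

System.out.println(replacedStr);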

I noticed that every OR you want to replace comes after a closing " and a space and is followed by a space and the word application, so my regex matches exactly that OR and replaces it.

Output for the short example (it will give you the same result for the long one):

application.path="EXCEL.exe" || application.path="EXCELSIOR.exe" || application.path="XYZ OR ABC.exe"
                             ^^                          ^^      ^^                       ^^
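To check Solution one without pasting the whole input, here is a minimal self-contained sketch. The class name SolutionOneDemo, the helper replaceOr, and the generated long input (500 clauses named app0.exe, app1.exe, ...) are assumptions made for illustration only; they are not from the question.

```java
class SolutionOneDemo {
    // Regex from Solution one: an OR counts only when it follows a closing quote
    // plus whitespace and is followed by whitespace plus the word "application".
    static final String REGEX = "(.*?\"\\s+)\\bOR\\b(\\s+application.*?)";

    static String replaceOr(String input) {
        return input.replaceAll(REGEX, "$1||$2");
    }

    // Hypothetical long input: clauses joined by " OR ", well beyond 2000 characters.
    static String buildLongInput(int clauses) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < clauses; i++) {
            if (i > 0) {
                sb.append(" OR ");
            }
            sb.append("application.path = \"**\\app").append(i).append(".exe\"");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String shortInput = "application.path=\"EXCEL.exe\" OR application.path=\"EXCELSIOR.exe\""
                + " OR application.path=\"XYZ OR ABC.exe\"";
        System.out.println(replaceOr(shortInput));

        String longInput = buildLongInput(500);
        String longOutput = replaceOr(longInput);
        System.out.println("input length: " + longInput.length());
        System.out.println("standalone OR left: " + longOutput.contains(" OR "));
    }
}
```

Keep in mind that this is a heuristic tied to the shape of the input: an OR that is not followed by the word application would be left alone.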

Solution two

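If you are using Java 9+, you can use the regex application.path=(\"(.*?)\") to match everything that looks like application.path="something here", then collect the results joined with ||. Note that this keeps only the matched fragments, so anything else in the input is dropped: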

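import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

String regex = "application.path=(\"(.*?)\")";
// Matcher.results() (Java 9+) streams every match; join the matched texts with " || "
String text = Pattern.compile(regex)
        .matcher(inputStr).results().map(MatchResult::group)
        .collect(Collectors.joining(" || "));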
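If the OR can also show up in places that do not fit the two solutions above, a more general sketch is to match either a whole quoted string or a standalone OR, copy quoted strings through unchanged, and replace only the bare OR. This is a different technique from both solutions, not a restatement of them. It assumes quotes are balanced and that quoted values contain no escaped quotes; the class name QuoteAwareOrReplacer and the method replaceOr are made up for this example. The pattern has no repeated group with an alternation inside it, which is the construct the stack trace in the question points at.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareOrReplacer {
    // Group 1 captures a complete double-quoted string; the other branch is a bare OR.
    private static final Pattern TOKEN = Pattern.compile("(\"[^\"]*\")|\\bOR\\b");

    static String replaceOr(String input) {
        Matcher m = TOKEN.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // A quoted string is copied back as-is; a bare OR becomes ||.
            String replacement = (m.group(1) != null) ? m.group(1) : "||";
            // quoteReplacement keeps backslashes and $ in quoted paths literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "(quote.AGE was 24 AND (application.path = \"**\\acad.exe\""
                + " OR application.name = \"RMS\" OR application.path = \"XYZ OR ABC.exe\"))";
        System.out.println(replaceOr(input));
    }
}
```

Because the text outside the matches is copied through by appendReplacement and appendTail, parentheses and conditions such as AND are preserved.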
Youcef LAIDANI