I have this regex in my Node.js script:

const commentPattern = new RegExp(
    '(\\/\\*([^*]|[\\r\\n]|(\\*+([^*/]|[\\r\\n])))*\\*+/)|(//.*)',
    'g'
);

which I use to extract comments from open source Java projects.

I've found that the code in some commits makes my script hang. This is due to catastrophic backtracking, and I am looking for a way to catch or prevent it so that my code keeps running even in these cases.
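The likely trigger in the pattern above is that a line break is matched by both `[^*]` and `[\r\n]` (in JavaScript, `[^*]` already includes newlines). On a `/**` that is never closed, like the one in the sample below, the engine tries every way of splitting the following newlines between those two alternatives before giving up, which is exponential in the number of lines. A plain `String.prototype.match` call has no time limit, but one way to keep the script alive is to run the match inside a separate V8 context using Node's built-in `vm` module and its `timeout` option, which throws once the limit is exceeded. This is a sketch, not a hardened solution; `matchWithTimeout` is a hypothetical helper name.

```javascript
const vm = require('vm');

// Hypothetical helper: runs `input.match(pattern)` inside a fresh V8 context
// with a time limit, so a catastrophically backtracking regex cannot hang the
// whole script. Returns the matches (an empty array if there are none), or
// null if the time limit was exceeded.
function matchWithTimeout(pattern, input, timeoutMs) {
  const sandbox = { pattern, input, result: null };
  try {
    vm.runInNewContext('result = input.match(pattern);', sandbox, { timeout: timeoutMs });
  } catch (err) {
    // Newer Node versions set err.code; older ones only have the message text.
    const timedOut = err && (err.code === 'ERR_SCRIPT_EXECUTION_TIMEOUT' ||
      /Script execution timed out/.test(String(err.message)));
    if (timedOut) return null;
    throw err;
  }
  // Copy into a plain array created in the main context.
  return sandbox.result === null ? [] : Array.from(sandbox.result);
}

// Usage (fileContents is a placeholder for the text of one Java file):
// const comments = matchWithTimeout(commentPattern, fileContents, 1000);
// if (comments === null) console.warn('regex timed out, skipping file');
```

Creating a new context on every call has a cost; for many files, creating one context with `vm.createContext` and reusing it with `vm.runInContext` may be cheaper. The timeout only contains the damage, though: a pattern without overlapping alternatives avoids the blow-up in the first place.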

Here is an example of code that blocks the execution of the script:

import android.content.res.Resources;
 import android.os.Handler;
 import android.preference.PreferenceFragment;
 import android.view.ViewGroup;
      * Provides the regex to identify domain HTTP(S) protocol and/or 'www' sub-domain.
      *
      * Used to format user-facing {@link String}'s in certain preferences.
      */
     public static final String ADDRESS_FORMAT_REGEX = "^(https?://(w{3})?|www\\.)";

     /**
     // Used to ensure that settings are only fetched once throughout the lifecycle of the fragment
     private boolean mShouldFetch;

     public View onCreateView(@NonNull LayoutInflater inflater,
                              ViewGroup container,
                              Bundle savedInstanceState) {
         // use a wrapper to apply the Calypso theme
         Context themer = new ContextThemeWrapper(getActivity(), R.style.Calypso_SiteSettingsTheme);
         LayoutInflater localInflater = inflater.cloneInContext(themer);
         View view = super.onCreateView(localInflater, container, savedInstanceState);

         if (view != null) {
             setupPreferenceList((ListView) view.findViewById(android.R.id.list), getResources());
         }

         return view;
     }

     @Override
     public void onChildViewAdded(View parent, View child) {
         if (child.getId() == android.R.id.title && child instanceof TextView) {
             // style preference category title views
             TextView title = (TextView) child;
             WPPrefUtils.layoutAsBody2(title);
         } else {
             // style preference title views
             TextView title = (TextView) child.findViewById(android.R.id.title);
             if (title != null) WPPrefUtils.layoutAsSubhead(title);
         }
     }

     @Override
     public void onChildViewRemoved(View parent, View child) {
         // NOP
     }

     @Override

I'm using Node.js 8.6.0, and I have also tried v9.8.0.

Riccardo
  • To match multiline comments, use `RegExp('/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/', 'g')` – Wiktor Stribiżew Mar 20 '18 at 17:20
  • Together with a single line comment, it will look like `RegExp('/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/|//.*', 'g')`. You should be aware that this regex might not work correctly in 100% of cases (e.g. it can match inside string literals). – Wiktor Stribiżew Mar 20 '18 at 17:44
  • Unfortunately it does not work with this code :( https://regex101.com/r/GaWSyh/1. As you can see, the code is pretty strange from a comments point of view: there is a regex which contains //, a comment that starts but is never terminated (/**), and other regular comments. – Riccardo Mar 20 '18 at 20:38
  • You did not use *my* regex pattern in the regex tester, [here is my pattern fiddle](https://regex101.com/r/GaWSyh/2). As I have already mentioned, the `//` and `/*...*/` will also be found in string literals. You cannot parse code with a single regex safely, only with some assumptions. – Wiktor Stribiżew Mar 20 '18 at 21:45
  • You are right, I probably made some mistakes while trying to escape some characters. Thank you! – Riccardo Mar 21 '18 at 07:46
  • Need more reputation for upvote, but I did it anyway :) thanks a lot – Riccardo Mar 21 '18 at 07:59

1 Answer

You can't safely parse code with a single regex, so fixing the catastrophic backtracking won't really solve the issue.

Using a proper parser for the source language would be the right solution.
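A full Java parser is out of scope here, but the main thing a single regex cannot do, namely knowing whether `//` or `/*` sits inside a string or char literal, can be handled by a small left-to-right scanner. The following is a minimal hand-written lexer sketch, not a parser and not any library's API. `scanJavaComments` is a hypothetical name, and it assumes there are no Java text blocks (`"""`) and no `\u` escapes that produce quotes or slashes. It never backtracks, so the exponential blow-up from the question cannot happen.

```javascript
// Minimal sketch, not a real Java parser: one left-to-right scan that only
// recognises //-comments, /* */-comments, "..." strings and '...' chars.
function scanJavaComments(src) {
  const comments = [];
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    const next = src[i + 1];
    if (ch === '/' && next === '/') {
      // Line comment: runs up to (not including) the next line break.
      let end = i + 2;
      while (end < src.length && src[end] !== '\n' && src[end] !== '\r') end++;
      comments.push(src.slice(i, end));
      i = end;
    } else if (ch === '/' && next === '*') {
      // Block comment: runs to the first "*/" after the opening "/*".
      const close = src.indexOf('*/', i + 2);
      if (close === -1) {
        // Unterminated: treat the "/*" as plain text and keep scanning.
        i += 2;
      } else {
        comments.push(src.slice(i, close + 2));
        i = close + 2;
      }
    } else if (ch === '"' || ch === "'") {
      // String or char literal: skip to the matching quote on the same line,
      // stepping over backslash escapes such as \" or \\.
      let j = i + 1;
      while (j < src.length && src[j] !== ch && src[j] !== '\n') {
        j += src[j] === '\\' ? 2 : 1;
      }
      i = j + 1;
    } else {
      i++;
    }
  }
  return comments;
}
```

The trade-off is that every lexical rule the scanner ignores (text blocks, Unicode escapes) becomes a potential misread, which is why a real parser remains the safer choice.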

If you are fine with matching comment-like substrings inside string literals, comments, etc., you may use

var rx = new RegExp('/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/|//.*', 'g')
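A quick way to check the pattern is to run it over a short Java fragment. The fragment here is made up, modeled on the question's sample, and `extractComments` is a hypothetical helper name. The second match comes from inside a string literal, which is exactly the false positive from the caveat above; the unterminated `/**` is simply skipped.

```javascript
// The pattern from this answer: /* ... */ block comments, or // to end of line.
const commentRx = new RegExp('/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/|//.*', 'g');

// String.prototype.match with a global regex returns all matches,
// or null when there are none.
function extractComments(source) {
  return source.match(commentRx) || [];
}

// Hypothetical Java fragment modeled on the question's sample.
const javaSource = [
  'import android.os.Handler;',
  '/** Javadoc',
  ' * with * stars */',
  'String url = "http://example.com";',
  '/**',
  '// Used to ensure settings are only fetched once',
  'private boolean mShouldFetch;',
].join('\n');

console.log(extractComments(javaSource));
```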

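See the online JS regex demo. Note that the RegExp constructor is preferable here because the pattern contains many / chars, which would all need escaping in a regex literal; since the pattern is then written as a string literal, every regex escape uses a double \ (\\).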
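To make the escaping trade-off concrete, here are the two equivalent spellings of the same regex side by side. The sample string is made up for illustration; both forms return the same matches.

```javascript
// Constructor form: the pattern is a JS string, so every regex backslash is
// doubled, but "/" needs no escaping.
const fromString = new RegExp('/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/|//.*', 'g');

// Literal form: single backslashes, but every "/" outside a character class
// must be written as "\/" so it does not end the literal.
const fromLiteral = /\/\*[^*]*\*+(?:[^/*][^*]*\*+)*\/|\/\/.*/g;

const sample = 'int a; /* block */ int b; // line';
console.log(sample.match(fromString));
console.log(sample.match(fromLiteral));
```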

Details

  • /\*[^*]*\*+(?:[^/*][^*]*\*+)*/ - matches a /* ... */ comment, which may span multiple lines (see description here)
  • | - or
  • //.* - double slash and then any 0+ chars other than line break chars.
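The pattern in the list above can be assembled from its pieces, which makes its structure easier to see. Unlike the question's pattern, the pieces never overlap: `[^*]` and `\*` match disjoint characters, and each extra chunk must start with a char that is neither `/` nor `*`, so there is essentially only one way to match a given text and nothing to backtrack into. The `parts` object and the test string are illustrative only.

```javascript
// Pieces of the multiline-comment pattern, written as JS strings
// (so each regex backslash is doubled).
const parts = {
  open: '/\\*',                // the opening "/*"
  body: '[^*]*',               // any run of non-"*" chars, newlines included
  stars: '\\*+',               // one or more "*"
  more: '(?:[^/*][^*]*\\*+)*', // further chunks: a char other than "/" or "*", non-"*" chars, then "*"s
  close: '/',                  // the "/" right after the "*"s that closes the comment
};

const blockComment = parts.open + parts.body + parts.stars + parts.more + parts.close;
const lineComment = '//.*';    // "//" and the rest of the line
const rx = new RegExp(blockComment + '|' + lineComment, 'g');

// Edge cases: an empty "/***/", "**" inside the body, a line comment,
// and an unterminated "/*" that must not match.
const tricky = '/***/ x /* a ** b */ y // tail\n/* unterminated';
console.log(tricky.match(rx));
```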
YakovL
Wiktor Stribiżew