I have a string containing script tags for JavaScript files, and I would like to replace every instance of .js with .min.js. I can't assume that the string will always be in the format shown below, so I need to restrict the replacement to the src= attribute.

I'm assuming that a regex would be best for this, but how would I go about finding and then replacing all instances?

<script type=\"text/javascript\" src=\"../../Scripts/json.js\"></script><script type=\"text/javascript\" src=\"../../Scripts/Logger.js\"></script>  <script type=\"text/javascript\" src=\"../../Scripts/PageHelper.js\"></script>

The other consideration is that I don't want to change a file name that already ends in .min.js.

Brian Rasmussen
StuffandBlah
  • Why not just use *Find and Replace* in Visual Studio? – Tigran Jul 02 '12 at 16:02
  • The best approach would be to output the page correctly in the first place, using ".min.js" or ".js" depending on needs; there is no need for dangerous regex manipulation. – Alexei Levenkov Jul 02 '12 at 16:04
  • And obviously [don't parse HTML with regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags), as it is just not suitable if you can't guarantee the format of the HTML. – Alexei Levenkov Jul 02 '12 at 16:07
  • I don't think this can be solved with regex alone; at the very least it would be hard to write one that is 100% safe. Why not write a short function that searches for the src= attributes (counting quotes modulo 2 to be sure the match is not inside some x="...src"... value), then loops through every ".js" found there and replaces it if there is no ".min" immediately before the ".js"? This can be programmed in 2 minutes. – cppanda Jul 02 '12 at 16:07
  • As others have said, regular expressions are not very suitable for tasks like this. Do you absolutely need to use them (I cannot imagine a reason), or would you accept a non-regex solution as well? – Nikola Anusev Jul 02 '12 at 16:14
  • I probably didn't make it clear. The string could contain actual JavaScript along with references to .js files. Yes, I could write a simple method to replace the src instances, which I will probably end up doing. I was wondering if it could be done via regex. – StuffandBlah Jul 02 '12 at 17:36
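
The suffix rule discussed in the comments (append ".min" only when the name ends in ".js" and is not already ".min.js") can be isolated from the question of how the src values are located. Below is a minimal sketch of that rule in Python; the language and the function name `to_min_js` are assumptions for illustration, not something the question specifies.

```python
def to_min_js(url: str) -> str:
    """Return url with a trailing '.js' changed to '.min.js'.

    URLs that already end in '.min.js', or that do not end in '.js',
    are returned unchanged. The check is case-insensitive.
    """
    lowered = url.lower()
    if lowered.endswith(".js") and not lowered.endswith(".min.js"):
        # Drop the 3-character '.js' suffix and append the minified one.
        return url[:-3] + ".min.js"
    return url


print(to_min_js("../../Scripts/json.js"))      # ../../Scripts/json.min.js
print(to_min_js("../../Scripts/json.min.js"))  # ../../Scripts/json.min.js
```

Whatever finds the src values (a hand-written scanner or a regex) would then only need to call this helper on each value.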

1 Answer

Disclaimer: I don't recommend regex for HTML parsing.
If the quotes in the script tags are not escaped (they shouldn't be), this should work:

======================

raw regex find

<script(?=\s)(?=((?:[^>"']|"[^"]*"|'[^']*')*?)(?<=\s)src\s*=(?:(?>\s*(['"])\s*((?:(?!\g{-2}).)+)(?<!\.min)\.js\s*\g{-2})|(?>(?!\s*['"])\s*([^\s>]*)(?<!\.min)\.js(?=\s|>)))((?>(?:".*?"|'.*?'|[^>]?)+)))(?>\s+(?:".*?"|'.*?'|[^>]*?)+>)(?<!/>)

raw replacement

<script$1src="$3$4.min.js"$5>

modifier 's': single-line mode (dot-all, so '.' also matches newlines)

expanded regex

 <script 
   (?=\s) 
   (?= 
       ( (?: [^>"']|"[^"]*"|'[^']*')*? ) (?<=\s)          # (1) - before 'src'
       src \s*=
       (?:
           (?> \s* (['"])  \s* ((?:(?!\g{-2}).)+ ) (?<!\.min)\.js \s* \g{-2} )   # (2,3)
         | (?> (?!\s*['"]) \s* ([^\s>]*)           (?<!\.min)\.js (?=\s|>)   )   # (4)  - use $3.$4
       )
       ( (?> (?:".*?"|'.*?'|[^>]?)+ ) )                   # (5) - after 'src'
   )
   (?> \s+ (?:".*?"|'.*?'|[^>]*?)+ 
 >
   ) (?<! /> )

or ...

     <script
     (?= \s )
     (?=
1         (
               (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
1         )
          (?<= \s )
          src\s*=
          (?:
               (?>
                    \s*
2                   ( ['"] )
                    \s*
3                   (
                         (?:
                              (?! \g{-2} )
                              .
                         )+
3                   )
                    (?<! \.min )
                    \.js\s*\g{-2}
               )
            |  
               (?>
                    (?! \s* ['"] )
                    \s*
4                   ( [^\s>]* )
                    (?<! \.min )
                    \.js
                    (?= \s | > )
               )
          )
5         (
               (?>
                    (?: ".*?" | '.*?' | [^>]? )+
               )
5         )
     )
     (?>
          \s+
          (?: ".*?" | '.*?' | [^>]*? )+
          >
     )
     (?<! /> )
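
To try the idea outside a Perl/PCRE engine, here is a rough Python sketch. It is not a port of the regex above: Python's `re` module does not accept the `\g{-2}` relative backreferences, so this version uses a shorter pattern with numbered groups and a replacement function. The names `SCRIPT_SRC`, `_add_min` and `minify_script_sources` are made up for illustration. Unlike the full regex, it does not skip over quoted attribute values that happen to contain `src=` or `>`.

```python
import re

# Simplified adaptation for Python's re module; NOT the answer's exact pattern.
SCRIPT_SRC = re.compile(
    r"""
    (<script\b[^>]*?(?<=\s)src\s*=\s*)      # 1: tag start up to the src value
    (?:
        (["'])                              # 2: opening quote
        ((?:(?!\2).)*?)                     # 3: quoted URL without its '.js'
        (?<!\.min)\.js                      # '.js' not already preceded by '.min'
        (\s*\2)                             # 4: optional spaces + closing quote
      |
        ([^\s>"']*?)                        # 5: unquoted URL without its '.js'
        (?<!\.min)\.js(?=[\s>])
    )
    """,
    re.IGNORECASE | re.DOTALL | re.VERBOSE,
)


def _add_min(match: re.Match) -> str:
    """Rebuild the matched text with '.min.js' in place of '.js'."""
    if match.group(2):  # quoted src value
        return f"{match.group(1)}{match.group(2)}{match.group(3)}.min.js{match.group(4)}"
    return f"{match.group(1)}{match.group(5)}.min.js"  # unquoted src value


def minify_script_sources(html: str) -> str:
    """Rewrite src=...js to src=...min.js inside <script> start tags only."""
    return SCRIPT_SRC.sub(_add_min, html)


sample = (
    '<script type="text/javascript" src="../../Scripts/json.js"></script>'
    '<script type="text/javascript" src="../../Scripts/Logger.min.js"></script>'
)
print(minify_script_sources(sample))
```

Only one branch of the alternation takes part in any match, which is why the replacement function checks group 2 first; this plays the same role as concatenating `$3$4` in the raw replacement above.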