So I'm trying to erase everything except the matched text in this 1900-line document using Notepad++'s regular-expression Find/Replace, so that only the file names remain, which should shorten it to under roughly 1000 lines. I know the pattern that selects the text, (?<=/images/item/)(.*)(?=" a), but I don't know how to make it erase everything that doesn't match. Here's a portion of the document.

In Notepad++, that pattern finds and selects abyssal-scepter.gif, aegis-of-the-legion.gif, etc.

<img src="/images/item/abyssal-scepter.gif" alt="LoL Item: Abyssal Scepter"><br>                                                                                                                <div id="id_77" class="tier-wrapper drag-items health magic-resist health-regen champ-box float-left ajax-tooltip {t:'Item',i:'77'} classic-and-dominion filter-is-dominion filter-is-classic filter-tier-advanced filter-bonus-aura       filter-category-health filter-category-magic-resist filter-category-health-regen ui-draggable ui-draggable-handle">
<img src="/images/item/aegis-of-the-legion.gif" alt="LoL Item: Aegis of the Legion"><br>                                                                                                                    <div id="id_235" class="tier-wrapper drag-items ability-power movement champ-box float-left ajax-tooltip {t:'Item',i:'235'}    filter-tier-advanced   filter-bonus-unique-passive     filter-category-ability-power filter-category-movement ui-draggable ui-draggable-handle">
<img src="/images/item/aether-wisp.gif" alt="LoL Item: Aether Wisp"><br>
<div class="info">
<div class="champ-name">Aether Wisp</div>
<div class="champ-sub">

<img src="/images/gold.png" alt="Item Cost" style="width:16px; vertical-align:middle;"> 850 / 415
</div>
</div>                  
</div>
<div id="id_21" class="tier-wrapper drag-items ability-power champ-box float-left ajax-tooltip {t:'Item',i:'21'} classic-and-dominion filter-is-dominion filter-is-classic filter-tier-basic        filter-category-ability-power ui-draggable ui-draggable-handle">
<img src="/images/item/amplifying-tome.gif" alt="LoL Item: Amplifying Tome"><br>
<div class="info">
<div class="champ-name">Amplifying Tome</div>
<div class="champ-sub">

I'm not familiar with regular expressions, so to summarize, I need the result to look like this:

abyssal-scepter.gif
aegis-of-the-legion.gif
aether-wisp.gif
amplifying-tome.gif

Thank you for your time

The Gaming Hideout
  • Why the JavaScript tag if you are using Notepad++? – Wiktor Stribiżew Aug 26 '16 at 13:35
  • Have a look at this question about negative selection: http://stackoverflow.com/questions/164414/how-to-inverse-match-with-regex – Arashsoft Aug 26 '16 at 13:39
  • @Arashsoft: That deletes a fixed-length text. A more comprehensive approach is to use an alternation of the captured pattern meant to be kept and an (unrolled) tempered greedy token that matches the part to discard. Depending on the type of input, a simple alternation with `.*` can work. – Wiktor Stribiżew Aug 26 '16 at 13:42
  • I would CTRL+A, CTRL+C, open the dev-tools in my browser, write something like `var str = "{CTRL+V}";` (but with the quotes we use here to highlight code, i.e. backticks), and then run `str.match(...).join("\n")`. – Thomas Aug 26 '16 at 13:43
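
A minimal sketch of the browser-console approach from the last comment. It assumes the pasted text goes into a template literal (backticks), so its double quotes and line breaks need no escaping, and that the text itself contains no backticks or `${` sequences. The sample text and the names `str`, `pattern`, `names`, and `result` are illustrative only, and a capture group stands in for the lookbehind from the question.

```javascript
// Paste the document between the backticks; this short sample stands in for it.
const str = `<img src="/images/item/abyssal-scepter.gif" alt="LoL Item: Abyssal Scepter"><br>
<img src="/images/gold.png" alt="Item Cost"> 850 / 415
<img src="/images/item/aether-wisp.gif" alt="LoL Item: Aether Wisp"><br>`;

// Capture whatever follows /images/item/ up to the closing double quote.
const pattern = /\/images\/item\/([^"]+)"/g;

// matchAll yields one match per file name; keep only the captured group.
const names = Array.from(str.matchAll(pattern), (m) => m[1]);
const result = names.join("\n");

console.log(result);
```

The `/images/gold.png` line is skipped because it does not contain `/images/item/`.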

3 Answers

A Notepad++ solution:

Find what : .*?/images/item/(.*?)"|.*
Replace with : $1\n
Search mode : Regular expression (with ". matches newline" checked)

The result will have an extra linefeed at the end, but that shouldn't pose a problem.
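
For readers who want to try the same keep-or-discard alternation outside Notepad++, here is a rough JavaScript sketch. It is an adaptation, not the pattern above verbatim: `[\s\S]` stands in for "." with ". matches newline" checked, `[^"]*` replaces the lazy `(.*?)`, and the discard branch uses `+` instead of `*` so it never matches an empty string. The function name `keepItemNames` and the sample input are assumptions made for illustration.

```javascript
// Illustrative input based on the excerpt in the question.
const html = [
  '<img src="/images/item/abyssal-scepter.gif" alt="LoL Item: Abyssal Scepter"><br>',
  '<div id="id_235" class="tier-wrapper drag-items">',
  '<img src="/images/item/aegis-of-the-legion.gif" alt="LoL Item: Aegis of the Legion"><br>',
  '<div class="info">',
].join("\n");

// First branch: skip ahead to the next item image and capture its file name.
// Second branch: swallow whatever is left once no more item images follow.
const pattern = /[\s\S]*?\/images\/item\/([^"]*)"|[\s\S]+/g;

function keepItemNames(text) {
  // name is undefined when the discard branch matched, so emit nothing for it.
  return text.replace(pattern, (match, name) =>
    name === undefined ? "" : name + "\n"
  );
}

console.log(keepItemNames(html));
```

Because the discard branch emits nothing here, this version ends with a single linefeed after the last file name rather than the extra one mentioned above.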

LukStorms
  • Thank you, it worked great. But because I accidentally added the JavaScript tag (I was thinking about c9's IDE and how it uses JavaScript find and replace), I am once again banned from asking my many questions. :( – The Gaming Hideout Aug 26 '16 at 16:47
  • Good to hear that it helped. Yeah, well, the regex flavor in JavaScript is more limited than the Perl-compatible flavors used in PHP (PCRE) and Notepad++ (Boost). As a rule of thumb, when a regex works in JavaScript it'll probably work in other regex flavors. – LukStorms Aug 26 '16 at 16:56
  • I added JavaScript because lookbehind isn't supported in JS, and that's what I was thinking would be needed – The Gaming Hideout Aug 26 '16 at 16:57

Maybe this can help. Or not, since you dropped the JavaScript tag from your original post.

<script type="text/javascript">
    var thestring = "<img src=\"/images/item/aegis-of-the-legion.gif\" alt=\"LoL Item: Aegis of the Legion\"><br>";
    var thestring2 = "<img src=\"/images/otherstuff/aegis-of-the-legion.gif\" alt=\"LoL Item: Aegis of the Legion\"><br>";

    // Returns the file name found under /images/item/, or "" if there is none.
    function ParseIt(incomingstring) {
        // [^"]+ stops at the closing quote, so later attributes are never captured.
        var pattern = /"\/images\/item\/([^"]+)"/;
        // exec returns null when nothing matches, so one call is enough.
        var match = pattern.exec(incomingstring);
        return match ? match[1] : "";
    }
</script>

Calling ParseIt(thestring) returns "aegis-of-the-legion.gif"

Calling ParseIt(thestring2) returns ""

blaze_125

Since you are doing this in NP++, this works for me. In cases like this, where speed and results matter more than a specific technique, I'll usually run several regexes. First, I put each tag on its own line, for simpler processing, by searching for > and replacing it with >\n. Then a replace of ^>*<.*?".*?/?([\w\d\-_]+\.\w{2,4})?".*>.*$ with $1 will extract all the filenames from the tags, removing the unneeded text. Next, to clear the tags that didn't have a filename in them, just replace <.*> with an empty string. Finally, use Edit>Line Operations>Remove empty lines, and you'll have the result you're looking for. It's not a 100% regex solution, but this is a one-time action that you just need a simple result from.
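
The four passes above can be sketched in JavaScript for anyone who wants to check them against a sample before running them on the real file. This is only an approximation: JavaScript's regex flavor is not the one Notepad++ uses, the `m` flag is added so that `^` and `$` work per line, and the function name `extractFileNames` and the sample input are made up for illustration.

```javascript
// Illustrative input based on the excerpt in the question.
const sample = [
  '<img src="/images/item/abyssal-scepter.gif" alt="LoL Item: Abyssal Scepter"><br>',
  '<div class="info">',
  '</div>',
  '<img src="/images/item/aether-wisp.gif" alt="LoL Item: Aether Wisp"><br>',
].join("\n");

function extractFileNames(text) {
  return text
    // Pass 1: put each tag on its own line.
    .replace(/>/g, ">\n")
    // Pass 2: reduce each tag that has a quoted attribute to its file name
    // (or to nothing, because an unmatched $1 becomes an empty string).
    .replace(/^>*<.*?".*?\/?([\w\d\-_]+\.\w{2,4})?".*>.*$/gm, "$1")
    // Pass 3: clear the remaining tags, which had no quoted attribute.
    .replace(/<.*>/g, "")
    // Pass 4: stand-in for Edit>Line Operations>Remove empty lines.
    .split("\n")
    .filter((line) => line.trim() !== "")
    .join("\n");
}

console.log(extractFileNames(sample));
```

In this sketch, plain text that sits between tags (for example an item name inside a div) is not removed by any of the four passes, so a file containing such text would need one more cleanup step.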

Jon Upchurch