
I have the following string of anchors, in which I want to change the contents of each `href`, and a Lua table of replacements that maps each word to the word that should replace it:

    s1 = '<a href="word1"></a><a href="word2"></a><a href="word3"></a><a href="word1"></a><a href="word5"></a><a href="word2"></a><a href="word3"><a href="word7"></a>'

    replacementTable = {}
    replacementTable["word1"] = "potato1"
    replacementTable["word2"] = "potato2"
    replacementTable["word3"] = "potato3"
    replacementTable["word4"] = "potato4"
    replacementTable["word5"] = "potato5"

The expected result should be:

    <a href="potato1"></a><a href="potato2"></a><a href="potato3"></a><a href="potato1"></a><a href="potato5"></a><a href="potato2"></a><a href="potato3"><a href="word7"></a>

I know I could do this by iterating over each element in replacementTable and processing the whole string each time, but my gut feeling tells me that if the string is very big and/or the replacement table grows large, this approach is going to perform poorly.
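For reference, the per-key approach just described might look like the sketch below. The helper names (`replaceAllNaive`, `escapePattern`, `escapeReplacement`) are illustrative, not from the question. Each table entry triggers a full scan of the string, so the work grows roughly with string length times table size, and the loop matches keys anywhere in the text, not only inside `href`.

```lua
-- Naive multi-pass approach: one full scan of the string per table entry.
local function escapePattern(s)
  -- Prefix every punctuation character with % so it is matched literally.
  return (s:gsub("%p", "%%%0"))
end

local function escapeReplacement(s)
  -- In a gsub replacement string "%" is special, so double it.
  return (s:gsub("%%", "%%%%"))
end

local function replaceAllNaive(s, replacements)
  for word, replacement in pairs(replacements) do
    -- Each iteration rescans the whole (possibly already modified) string.
    s = s:gsub(escapePattern(word), escapeReplacement(replacement))
  end
  return s
end

local s1 = '<a href="word1"></a><a href="word2"></a><a href="word7"></a>'
print(replaceAllNaive(s1, { word1 = "potato1", word2 = "potato2" }))
--> <a href="potato1"></a><a href="potato2"></a><a href="word7"></a>
```

Because the keys are treated as plain substrings, `word1` would also match inside `word10` and turn it into `potato10`; a single pass that looks up the whole captured `href` value avoids that.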

So I thought it would be best to do the following: apply the regular expression to find all the matches, iterate over each match, and replace each match with its value in replacementTable.

Something like this would be great (written in JavaScript because I don't yet know how to write lambdas in Lua):

    var newString = patternReplacement(s1, '<a[^>]* href="([^"]*)"', function(match) { return replacementTable[match]; });

Here the first parameter is the string, the second is the regular expression, and the third is a function that is called for each match to get its replacement. This way, I think, s1 would be parsed only once, which should be more efficient.

Is there any way to do this in Lua?

Asked by David Jiménez Martínez (question edited by hjpotter92)
  • Your deleted answer is correct. That said, you can simply use: `s1:gsub('<a[^>]* href="([^"]*)"', replacementTable)` – hjpotter92 Nov 22 '16 at 16:17
  • As always, [be careful](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) when using Regular Expressions (Lua Patterns) to handle HTML. For limited, non-arbitrary use you'll likely be fine, but as soon as your program starts getting more and more complicated you'll be sure to run into problems. Use a proper HTML parser / DOM constructor when appropriate. – Oka Nov 22 '16 at 19:38
  • @hjpotter92 your answer doesn't produce the expected output: `potato1></a>potato2></a>potato3></a>potato1></a>potato5></a>potato2></a>potato3><a href="word7"></a>` – David Jiménez Martínez Nov 22 '16 at 21:28
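The output reported in the last comment follows from how `string.gsub` uses a table: the first capture is the lookup key, but the whole match is what gets replaced. The sketch below, on a shortened hypothetical input, reproduces that and shows one possible fix using an anonymous function (Lua's counterpart of the JavaScript lambda in the question) that also captures the text before the URL and rebuilds it.

```lua
local replacementTable = { word1 = "potato1" }
local s = '<a href="word1"></a><a href="word7"></a>'

-- Table replacement: the first capture is the lookup key, but the *whole*
-- match ('<a href="word1"') is what gets replaced.
local viaTable = s:gsub('<a[^>]* href="([^"]*)"', replacementTable)
print(viaTable)  --> potato1></a><a href="word7"></a>

-- Function replacement: capture the prefix as well and rebuild it, so only
-- the href value changes. Returning nil keeps the original match.
local viaFunction = s:gsub('(<a[^>]* href=")([^"]*)"', function(prefix, url)
  local replacement = replacementTable[url]
  if replacement == nil then
    return nil
  end
  return prefix .. replacement .. '"'
end)
print(viaFunction)  --> <a href="potato1"></a><a href="word7"></a>
```

In both forms the string is scanned once, left to right; the function form only differs in deciding what text each match becomes.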

2 Answers


In your example, this simple code works:

    print((s1:gsub("%w+", replacementTable)))

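The point is that `string.gsub` already accepts a table of replacements: each match is used as a key into the table, and when the value is `nil` (as for `word7`, or for the words `a` and `href`), the original text is kept.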
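A small sketch of the table form on shortened, hypothetical inputs, including a caveat: `%w+` matches every alphanumeric word, not just `href` values, so this works on the question's string only because no other word in it is a key.

```lua
local replacementTable = { word1 = "potato1", word2 = "potato2" }

-- No captures in "%w+", so each whole alphanumeric run is the lookup key;
-- runs with no entry ("a", "href", "word9") are left as they are.
local s = '<a href="word1"></a><a href="word9"></a>'
local result = (s:gsub("%w+", replacementTable))
print(result)  --> <a href="potato1"></a><a href="word9"></a>

-- Caveat: "%w+" is not limited to href values, so link text (or any other
-- word) that happens to be a key is replaced as well.
local withText = '<a href="word2">word2</a>'
local result2 = (withText:gsub("%w+", replacementTable))
print(result2)  --> <a href="potato2">potato2</a>
```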

Answered by lhf

In the end, the solution that worked for me was the following one:

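    -- Captures: (1) the tag up to the opening quote of href, (2) the path,
    -- which must start with "/" and stops at "?" or the closing quote,
    -- (3) the rest of the value (e.g. a query string) plus the closing quote.
    local updatedBody = string.gsub(body, '(<a[^>]* href=")(/[^"%?]*)([^"]*")', function(leftSide, url, rightSide)
        local replacedUrl = url
        if urlsToReplace[url] then replacedUrl = urlsToReplace[url] end
        return leftSide .. replacedUrl .. rightSide
    end)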

The pattern keeps any query string out of the captured URL, so only the path is looked up in `urlsToReplace`, while the query string is passed through unchanged. I know it's a bad idea to parse HTML bodies with regular expressions, but in my case, where I needed a lot of performance, this was much faster and simply did the job.
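A self-contained version of the code above for experimenting; the contents of `body` and `urlsToReplace` are made up. It uses `urlsToReplace[url] or url` as a shorter equivalent of the `if` statement (equivalent as long as the table holds only strings).

```lua
-- Hypothetical sample data; only the variable names come from the answer.
local body = '<a href="/old?x=1">a</a><a class="c" href="/keep">b</a><a href="http://example.com/old">c</a>'
local urlsToReplace = { ["/old"] = "/new" }

local updatedBody = string.gsub(body, '(<a[^>]* href=")(/[^"%?]*)([^"]*")',
  function(leftSide, url, rightSide)
    -- Look up only the path; paths without an entry map to themselves.
    local replacedUrl = urlsToReplace[url] or url
    return leftSide .. replacedUrl .. rightSide
  end)

print(updatedBody)
--> <a href="/new?x=1">a</a><a class="c" href="/keep">b</a><a href="http://example.com/old">c</a>
```

In this sample the query string `?x=1` survives, `/keep` is left alone because it has no entry, and the absolute `http://` URL is not matched at all, since the pattern requires the path to start with `/`.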

Answered by David Jiménez Martínez