
In Lua, I am parsing the last occurrence of an XML tag out of a large XML string using the pattern:

local firstPattern = "<tag>(.-)</tag>"

I then used the following code to iterate over every occurrence and keep the last one:

local lastMatch
for match in string.gmatch(xmlString, firstPattern) do
  lastMatch = match -- overwritten on every iteration, so it ends up holding the last match
end

This did not seem very fast, so I tried prefixing the pattern with a greedy `.*`:

local secondPattern = ".*<tag>(.-)</tag>"
lastMatch = string.match(xmlString, secondPattern)
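
Why the `.*` prefix lands on the last occurrence: in Lua's matcher, a greedy `.*` first consumes the rest of the subject and then backs off one character at a time until the remainder of the pattern (`<tag>(.-)</tag>`) succeeds, so the first position that works is the last `<tag>` that has a `</tag>` somewhere after it. The sketch below uses a made-up sample string. It also shows an anchored variant with `^`: without the anchor, a string that contains no match at all makes `string.match` repeat that backward scan from every starting position, which is roughly quadratic in the string length; with the anchor it gives up after one attempt. When a match exists, both variants return the same capture.

```lua
-- Hypothetical sample input with three occurrences of <tag>.
local xmlString = "<root><tag>a</tag><x/><tag>b</tag><tag>c</tag></root>"

-- Greedy prefix: .* runs to the end of the string and backtracks, so the
-- first position where "<tag>(.-)</tag>" succeeds is the last <tag>.
local greedy = ".*<tag>(.-)</tag>"

-- Same pattern anchored at the start: if nothing matches, string.match
-- fails after one attempt instead of retrying from every later position.
local anchored = "^.*<tag>(.-)</tag>"

print(string.match(xmlString, greedy))    --> c
print(string.match(xmlString, anchored))  --> c
print(string.match("<root/>", anchored))  --> nil
```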

Printing `os.clock()` before and after parsing, I found the second pattern slightly faster, but I suspect there is a better pattern for matching the last occurrence of the XML tag.
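
A minimal sketch of that `os.clock()` comparison, assuming a synthetic input built in a loop (the element count, repetition count, tag contents, and the helper names `lastByGmatch`, `lastByGreedy`, and `timeIt` are all made up for illustration). Timings depend on the machine, the Lua version, and how close to the end the last tag sits, so no numbers are shown.

```lua
-- Build a large synthetic XML string: many <tag> elements, then a final one.
local parts = {}
for i = 1, 20000 do
  parts[#parts + 1] = "<tag>" .. i .. "</tag><other/>"
end
local xmlString = "<root>" .. table.concat(parts) .. "<tag>last</tag></root>"

-- Approach 1 (first pattern): visit every match and keep the final one.
local function lastByGmatch(s)
  local lastMatch
  for match in string.gmatch(s, "<tag>(.-)</tag>") do
    lastMatch = match
  end
  return lastMatch
end

-- Approach 2 (second pattern): greedy prefix, a single string.match call.
local function lastByGreedy(s)
  return string.match(s, ".*<tag>(.-)</tag>")
end

-- Run fn(s) `reps` times; return the last result and elapsed CPU seconds.
local function timeIt(fn, s, reps)
  local result
  local start = os.clock()
  for _ = 1, reps do
    result = fn(s)
  end
  return result, os.clock() - start
end

local r1, t1 = timeIt(lastByGmatch, xmlString, 5)
local r2, t2 = timeIt(lastByGreedy, xmlString, 5)
print(r1, string.format("gmatch: %.4f s", t1))
print(r2, string.format("greedy: %.4f s", t2))
```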

I also tried a third pattern, but it returns only the first instance of the XML tag:

local thirdPattern = "<tag>(.-)</tag>.-$"
local firstMatch = string.match(xmlString, thirdPattern)
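
The third pattern returns the first instance because `string.match` reports the leftmost position where the whole pattern succeeds, and the trailing `.-$` can always succeed by consuming whatever remains of the string. It adds no constraint, so it cannot force the match to be the last one. A small demonstration with a hypothetical sample string:

```lua
local xmlString = "<tag>a</tag><tag>b</tag><tag>c</tag>"

-- string.match returns the leftmost match. The trailing ".-$" can absorb any
-- suffix (including the remaining <tag> elements), so the first
-- <tag>...</tag> already satisfies the whole pattern.
local thirdPattern = "<tag>(.-)</tag>.-$"
print(string.match(xmlString, thirdPattern))        --> a

-- Lua patterns have no negative lookahead, so "a <tag> with no later <tag>"
-- cannot be written directly; dropping the suffix gives the same result.
print(string.match(xmlString, "<tag>(.-)</tag>"))   --> a
```
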
hjpotter92
blakem
  • Try `((?>[^<]++|<(?!/?tag>))*)(?=(?>[^<]++|<(?!/?tag))*$)` – Casimir et Hippolyte May 01 '14 at 02:56
  • @CasimiretHippolyte That is not even close to valid for lua patterns. – Etan Reisner May 01 '14 at 03:04
  • @EtanReisner: Sorry but I didn't find a good resource about regex syntax for lua. – Casimir et Hippolyte May 01 '14 at 03:37
  • @EtanReisner: does Lua have the same regex capabilities as JavaScript? – Casimir et Hippolyte May 01 '14 at 03:38
  • @CasimiretHippolyte No. Lua doesn't have regex it has patterns. The [manual](http://www.lua.org/manual/5.1/manual.html#5.4.1) covers them. There are libraries for actual regex support though. – Etan Reisner May 01 '14 at 03:40
  • In general parsing xml/html/etc. with regular expressions is a bad idea. An actual parser is virtually always a better solution. – Etan Reisner May 01 '14 at 03:41
  • Have you tried using LPeg or an XML parser? – hjpotter92 May 01 '14 at 05:24
  • Probably, `xmlString:reverse():find('>gat<',1,true)` would be faster. – Egor Skriptunoff May 01 '14 at 09:25
  • @EtanReisner - Why is parsing XML with regexps a bad idea? – Egor Skriptunoff May 01 '14 at 09:27
  • As a rule never parse xml with regex. – Leri May 01 '14 at 13:54
  • @EgorSkriptunoff You cannot write a single regular expression that correctly parses XML, since XML is a context-free grammar and RegExs can only parse regular grammars. RegExs are simply not 'powerful' enough. See http://stackoverflow.com/a/1758162/646619 – Colonel Thirty Two May 01 '14 at 14:48
  • @ColonelThirtyTwo - That's not correct. Regexp with backreferences (such as `^(.*)%1$` in Lua) generates non-regular grammar ;-) – Egor Skriptunoff May 01 '14 at 15:37
  • The short answer is [don't do it](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). – Mud May 01 '14 at 17:29
  • @EgorSkriptunoff Then they aren't regular expressions anymore, technically. It's like saying C can have classes because C++ can. Besides, Lua patterns aren't regexes anyway. – Colonel Thirty Two May 01 '14 at 17:52
  • @ColonelThirtyTwo - So, what is your argument against using Lua patterns for parsing XML? – Egor Skriptunoff May 01 '14 at 19:28
  • @ColonelThirtyTwo - I completely agree that a true XML DOM parser would be the most robust, but I am only interested in this one value from the whole file. In my situation I think a pattern is not too unreasonable since I don't care about the tree structure of the file. In other situations parsing the entire file into an object model would make the most sense. – blakem May 02 '14 at 02:30
  • To follow up, I have since actually used an XML parser and it does make it very easy to find the last element since I can get it by `myXMLtable[table.getn(myXMLtable)]`. I went with the XML parser since I finally needed to look at more than just that one value. Actually building the nested Table in memory from the XML is not too costly. So thank you @ColonelThirtyTwo for steering me in that direction. – blakem May 07 '14 at 00:38
  • @blakem `table.getn` is deprecated; use `myXMLtable[#myXMLtable]` instead. – Colonel Thirty Two May 07 '14 at 15:57
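
Two sketches of the plain-search idea raised in the comments, assuming the tag always appears literally as `<tag>` with no attributes (the function names `lastByReverse` and `lastByPlainFind` are hypothetical). `lastByReverse` follows the `xmlString:reverse():find('>gat<',1,true)` suggestion: the reversed opening tag is located with a plain (non-pattern) search and its position is mapped back to the original string. `lastByPlainFind` walks forward with plain `string.find` calls and remembers the last start position. Both then extract the content with a match anchored at that position. Unlike `.*<tag>(.-)</tag>`, they return `nil` if the last `<tag>` has no closing `</tag>`, and, as the comments note, none of these approaches handles general XML.

```lua
local OPEN = "<tag>"

-- Reverse the string and find the reversed opening tag ('>gat<') with a
-- plain search (the 4th argument `true` turns off pattern matching).
local function lastByReverse(s)
  local i = s:reverse():find(OPEN:reverse(), 1, true)
  if not i then return nil end
  -- Reversed index i is original index #s - i + 1, which is the last
  -- character of "<tag>"; step back to the tag's first character.
  local start = #s - i + 1 - (#OPEN - 1)
  -- "^" anchors the match at `start`.
  return s:match("^<tag>(.-)</tag>", start)
end

-- Walk forward with plain finds, remembering where the last "<tag>" starts.
local function lastByPlainFind(s)
  local last
  local pos = 1
  while true do
    local i, j = s:find(OPEN, pos, true)
    if not i then break end
    last = i
    pos = j + 1
  end
  if not last then return nil end
  return s:match("^<tag>(.-)</tag>", last)
end

local xmlString = "<root><tag>a</tag><tag>b</tag><tag>c</tag></root>"
print(lastByReverse(xmlString))    --> c
print(lastByPlainFind(xmlString))  --> c
```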

0 Answers