Matching variable words in a string

Question

This will sound extremely nerdy, but I play this online game that writes its in-game events to a log file. There's a program I'm using that is capable of reading this log file, and it's also capable of interpreting regex. My goal is to write a regex command that analyzes a certain string from this log file and then spits out certain parts of the string onto my screen.

The string that gets written to the log file has the following syntax (variables in capitals):

NAME hits/bashes/crushes/claws/whatever NEWNAME for NUMBER points of damage.

If it matters, NUMBER will never contain commas or spaces, and the action verb (hits, bashes, whatever) will only ever be a single word without any special characters, spaces, numbers, etc.

What I'd like this program to do is interpret the regex code that I enter and spit out a result that says: NAME attacks NEWNAME

The catch is, NAME and NEWNAME can have the following range of possibilities (names and examples picked at random):

Kevin
Kevin's pet
Kevin from Oregon
Kevin from Oregon's pet
Kevin from Oregon`s pet (note the grave accent there instead of the apostrophe)

It's pretty simple if the line is just something like "Kevin hits Josh for 10728 points of damage." In that case, my regex is the following (please note that the program interprets the {N} wildcard on its own as any number, without the need for regex):

(?<char1>\w+) \w+ (?<char2>\w+) for {N} points of damage.

...and my output reads...

${char1} attacks ${char2}

Whenever the game writes "Kevin hits Josh for 10728 points of damage." to the log file, the program I'm using picks it up and correctly prints "Kevin attacks Josh" on my screen.
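For anyone who wants to experiment outside the asker's program, here is a rough Python translation of that first pattern. It is a sketch under a few assumptions: Python's `re` module spells named groups `(?P<name>...)` rather than `(?<name>...)`, `\d+` stands in for the program's own `{N}` wildcard, and I assume the program searches each log line for the pattern anywhere in it (an unanchored search). Under that assumption the single-word pattern does not always fail outright on a multi-word name; it can quietly match just a fragment of it:

```python
import re

# Python writes named groups as (?P<name>...); \d+ stands in for the program's {N}.
SIMPLE = re.compile(r"(?P<char1>\w+) \w+ (?P<char2>\w+) for \d+ points of damage\.")

def summarize(line):
    """Return 'NAME attacks NEWNAME' for the first match in the line, else None."""
    m = SIMPLE.search(line)  # unanchored search: an assumption about the program
    return f"{m['char1']} attacks {m['char2']}" if m else None

print(summarize("Kevin hits Josh for 10728 points of damage."))        # Kevin attacks Josh
print(summarize("Kevin's pet hits Josh for 10728 points of damage."))  # pet attacks Josh
```

The second call shows the trap: `\w+` cannot cross the apostrophe, so the search slides forward until `pet hits Josh` fits.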

However, using that regex line results in a failure when spaces, apostrophes, grave accents, and/or any combination of the three are present in either NAME or NEWNAME.

I tried to alter the regex line to read...

(?<char1>[a-zA-Z0-9_ ]+) \w+ (?<char2>[a-zA-Z0-9_ ]+) for {N} points of damage.

...but when I encounter the string "Kevin bashes Josh of Texas for 2132344 points of damage.", for example, the output to my screen winds up being:

Kevin bashes Josh attacks Texas.
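The same result can be reproduced in Python (again translating the named-group syntax and replacing `{N}` with `\d+`, both assumptions about how the asker's program behaves). Because the character class allows spaces, the only fixed landmark left is the `for ... points of damage.` tail, and the greedy `+` on `char1` backs off only as far as it has to. The last word that still leaves something for `char2` ("of") therefore gets consumed as the verb:

```python
import re

# The asker's second attempt in Python syntax (\d+ replaces the program's {N}).
GREEDY = re.compile(
    r"(?P<char1>[a-zA-Z0-9_ ]+) \w+ (?P<char2>[a-zA-Z0-9_ ]+) for \d+ points of damage\."
)

m = GREEDY.search("Kevin bashes Josh of Texas for 2132344 points of damage.")
# The greedy + on char1 gives back text only until the rest fits, so "of" becomes the verb.
print(m["char1"])  # Kevin bashes Josh
print(m["char2"])  # Texas
```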

I'm trying different things but ultimately not coming up with something that's spitting out the proper format of NAME attacks NEWNAME when those two variables contain spaces, apostrophes, grave accents, and/or any combination of the three.

Any help or tips on what I'm doing wrong or how I can further alter that regex line would be extremely appreciated!

Well, I regret hitting Josh, but you could do it off of spaces. It's been a while since I've done a REGEX, but you should be able to do a compare with "Kevin", then take a specific number of words after that. This question might help you: https://stackoverflow.com/questions/5752829/regular-expression-for-exact-match-of-a-word Keep in mind, REGEX, for me, was a ton of trial and error. — Kevin Fischer, Mar 08 '18 at 17:20
Thank you for the quick response! This would mean I'd need multiple regex triggers, correct? Because NAME and NEWNAME aren't always going to have spaces, apostrophes, etc.; it's just that they could. So I'd need one expression for the case where there are three spaces, another for two spaces, and so on, right? — , Mar 08 '18 at 17:44
If that's a possibility, I guess so. If you can read that log and see how many times they use multiple spaces, you could base it off that. I'm just guessing here, but given that it's a computer log, I highly doubt they'll have more than one space, because it'll take up more file space over time. — Kevin Fischer, Mar 08 '18 at 18:29
It's an open-world MMO, so it's 100% random based on what I encounter. There are over 1,200 zones with all sorts of populations, all uniquely named. It's rather common to have several spaces in a name, less common to see apostrophes, and even less common to see grave accents, but they're all still possible. I'm continuing to toy with it! — , Mar 08 '18 at 18:40
You might also want to check the game's forum and see if anyone is doing the same thing. — Kevin Fischer, Mar 08 '18 at 18:41
Do you know the exhaustive list of action verbs used? If so a suitable regex is definitely possible. — PJProudhon, Mar 09 '18 at 10:25
I have a decent idea of the action verbs that would be used, but it would be a pretty long list. Unfortunately, I don't know the exact language; it all goes through a third-party program, and all I know is that it parses regex. — , Mar 09 '18 at 13:43
Without that list I'm afraid regex won't be of any help: as attacker and defender can contain multiple words you won't be able to identify the verb. — PJProudhon, Mar 09 '18 at 13:52
I have a list that will cover the most commonly used words 99% of the time. Are you saying that the trigger would put all of these words in something like "... (hits|crushes|bashes|etc) ..." and then continue on? Worst case, I can just add to it as new words are discovered over time. — , Mar 09 '18 at 14:00
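A minimal Python sketch of that idea, with some hedges: the verb list below is a hypothetical placeholder to be extended as new verbs turn up, the name character class (word characters, spaces, apostrophes and grave accents) is my reading of the examples in the question, and the syntax is Python's, so the asker's program may need `(?<name>...)` and `{N}` instead. The name groups use a lazy `+?` so that `char1` stops at the first known verb:

```python
import re

# Hypothetical starter list: add verbs here as they show up in the log.
VERBS = ["hits", "bashes", "crushes", "claws"]
VERB_ALT = "|".join(map(re.escape, VERBS))

# Assumed legal name characters: letters/digits/underscore, space, ' and `.
NAME = r"[\w '`]+?"  # lazy, so char1 stops at the first known verb

ATTACK = re.compile(
    rf"^(?P<char1>{NAME}) (?:{VERB_ALT}) (?P<char2>{NAME}) for \d+ points of damage\.$"
)

def summarize(line):
    """Return 'NAME attacks NEWNAME', or None if no known verb is found."""
    m = ATTACK.match(line)
    return f"{m['char1']} attacks {m['char2']}" if m else None

print(summarize("Kevin from Oregon's pet bashes Josh of Texas for 2132344 points of damage."))
# Kevin from Oregon's pet attacks Josh of Texas
print(summarize("Kevin from Oregon`s pet claws Kevin's pet for 5 points of damage."))
# Kevin from Oregon`s pet attacks Kevin's pet
```

A `None` result is a useful hint that an unlisted verb has appeared and should be added to `VERBS`.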
This is exactly the lead. Using this you may now be able to provide your own answer :-) — PJProudhon, Mar 09 '18 at 14:54
You need to look into non-greedy quantifiers and specify the exact legal character set for NAMEs. You will also need to know the set of verbs; otherwise, how would you differentiate between `Kevin bashes` hits `Josh` for 2 points. and `Kevin` bashes `hits Josh` for 3 points.? — NetMage, Mar 10 '18 at 00:47
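To make that ambiguity concrete, here is a small Python sketch (the verb list is shortened to two hypothetical entries). The same log line matches both a lazy and a greedy pattern, and each returns a different but equally valid split; the regex itself has no way to know which one the game meant:

```python
import re

VERBS = "hits|bashes"  # hypothetical two-verb list, enough for the demo
TAIL = r" for \d+ points of damage\.$"
LINE = "Kevin bashes hits Josh for 2 points of damage."

lazy = re.compile(r"^(?P<a>.+?) (?:" + VERBS + r") (?P<b>.+)" + TAIL)   # splits at the first verb
greedy = re.compile(r"^(?P<a>.+) (?:" + VERBS + r") (?P<b>.+)" + TAIL)  # splits at the last verb

print(lazy.match(LINE).group("a", "b"))    # ('Kevin', 'hits Josh')
print(greedy.match(LINE).group("a", "b"))  # ('Kevin bashes', 'Josh')
```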

score 0 · Answer 1 · answered Aug 02 '18 at 14:40

This is going to sound even nerdier, but I think the question isn't the regex, it's what tool you use the regex in.

Your biggest problem thus far has been the names. I suggest ignoring the names, and focusing only on the elements you know are there. The names are what's left.

I tried this myself using GNU sed:

sed -e 's/ for [[:digit:]]\+ points of damage\.//' -e 's/hits\|bashes\|crushes/attacks/'

You see, first we can eliminate the end of the sentence, which is wholly superfluous. Then, we simply switch the verb to "attacks".
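For readers without GNU sed, the same two-step idea translates to Python roughly as follows. This is a sketch rather than the answerer's code: the verb list is the same three-word sample, the tail pattern is anchored to the end of the line, and `rewrite` is a hypothetical helper name:

```python
import re

def rewrite(line):
    """Two-step rewrite in the spirit of the sed answer (verb list is a sample)."""
    # Step 1: drop the superfluous " for N points of damage." tail.
    line = re.sub(r" for \d+ points of damage\.?$", "", line)
    # Step 2: swap the first known verb for "attacks" (like sed without the g flag).
    return re.sub(r"hits|bashes|crushes", "attacks", line, count=1)

print(rewrite("Kevin from Oregon's pet bashes Josh of Texas for 2132344 points of damage."))
# Kevin from Oregon's pet attacks Josh of Texas
print(rewrite("Kevin slashes Josh for 3 points of damage."))  # Kevin slashes Josh
print(rewrite("Whitsun hits Josh for 12 points of damage."))  # Wattacksun hits Josh
```

The last two calls show both sides of the trade-off: an unrecognized verb passes through unchanged, while a name that merely contains one of the verbs (`Whitsun` is an illustrative example) gets mangled.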

If the game uses a synonym for "attacks" that you don't have yet, you'll still get reasonable output, since the unknown verb simply passes through unchanged; you can then fix your regex to include the new synonym.

You are guaranteed trouble if somebody's name includes "bashes" (or whatever) in it.

The second sed expression should be improved to be relevant only at a word boundary, but I'll leave that as an exercise for the reader. :)
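One way to do that exercise, sketched in Python (GNU sed also understands `\b`, so the same change carries over): wrap the alternation in `\b...\b` so a verb only matches as a whole word. The name `Whitsun` is just an illustrative name that happens to contain `hits`. Note that this only stops matches inside longer words; a name containing one of the verbs as a separate word will still break it.

```python
import re

# \b = word boundary, so "hits" inside "Whitsun" is no longer a match.
VERB = re.compile(r"\b(?:hits|bashes|crushes)\b")

def rewrite(line):
    """Drop the damage tail, then replace the first whole-word verb with 'attacks'."""
    line = re.sub(r" for \d+ points of damage\.?$", "", line)
    return VERB.sub("attacks", line, count=1)

print(rewrite("Whitsun hits Josh for 12 points of damage."))  # Whitsun attacks Josh
```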
