I'm trying to write a regex to scan a codebase to find all instances of a function call, and return a list of the arguments to the function.

The function is called "t" (I know, helpful eh) and it takes a single string made up of segments of lowercase letters, digits or underscores, separated by ., e.g.

t('foo.bar.baz')
t("chunky.chicken")
t('red.green.yellow.blue.1.2.3')

One immediate problem is to distinguish it from calls to other functions whose names end in t, but I thought I could do that by preceding the regex with [^a-z], i.e. "not a letter".

Here's what I have so far, which I thought would work but doesn't, quite:

/[^a-z]t\(["']([a-z0-9_]+\.)+[a-z0-9_]+["']\)/

#with my thinking being as follows:
[^a-z] #not a letter
t #the character t
\( #open bracket
["'] #one instance of single or double quote
([a-z0-9_]+\.)+ #one or more instances of mix-of-letters-and-numbers-and-underscores followed by .
[a-z0-9_]+ #mix-of-letters-and-numbers-and-underscores
["'] #one instance of single or double quote
\) #close bracket

I'm matching it against this test line:

s = "<span style=\"color: #999;font-size: 0.5em;display: block;\"><%= t('cmw.letter.yumu_invite_for') %>:</span> <%= h(pupil.first_name.capitalize) %>"

and it's returning "letter." whereas I want it to return " t('cmw.letter.yumu_invite_for')"

I think the problem is this part: ([a-z0-9_]+\.)+ #one or more instances of mix-of-letters-and-numbers-and-underscores followed by .

If I change it so that, instead of looking for "one or more instances of this pattern", it specifically looks for three segments with . in between, then it works:

s = "<span style=\"color: #999;font-size: 0.5em;display: block;\"><%= t('cmw.letter.yumu_invite_for') %>:</span> <%= h(pupil.first_name.capitalize) %>"
regex = /[^a-z]t\(["'][a-z0-9_]+\.[a-z0-9_]+\.[a-z0-9_]+["']\)/
s.scan(regex)
=> [" t('cmw.letter.yumu_invite_for')"]

So I guess the "multiple instances of this pattern" bit doesn't work the way I think it does?

This is in Ruby but I think this might be a more general regex question.

EDIT - I just tried this in JavaScript and it works:

s.match(/[^a-z]t\(["']([a-z0-9_]+\.)+[a-z0-9_]+["']\)/)[0]
" t('cmw.letter.yumu_invite_for')"

So actually I think maybe this is a Ruby question after all.
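
For what it's worth, the difference between the two results seems to come from the method being called rather than from the regex itself. In Ruby, String#scan returns the captured groups instead of the whole match whenever the pattern contains a capturing group, and a repeated group only keeps its last iteration (here "letter."). The JavaScript line above reads index [0] of the match, which is always the whole match. A minimal sketch using a shortened version of the test line (the variable names are only for illustration):

```ruby
# Shortened version of the test line from the question.
s = %q{<%= t('cmw.letter.yumu_invite_for') %>:</span> <%= h(pupil.first_name.capitalize) %>}

# Original pattern: the parentheses around the repeated segment form a capturing group.
capturing = /[^a-z]t\(["']([a-z0-9_]+\.)+[a-z0-9_]+["']\)/

# Same pattern with (?: ... ) so the group no longer captures.
non_capturing = /[^a-z]t\(["'](?:[a-z0-9_]+\.)+[a-z0-9_]+["']\)/

# With a capturing group, scan yields one array of captures per match,
# and a repeated group only remembers its final iteration.
with_group = s.scan(capturing)        # => [["letter."]]

# Without capturing groups, scan yields the whole matched text.
without_group = s.scan(non_capturing) # => [" t('cmw.letter.yumu_invite_for')"]

# String#[] with a regex returns the whole match, like match(...)[0] in JavaScript.
whole_match = s[capturing]            # => " t('cmw.letter.yumu_invite_for')"
```

So the original pattern does match the full call in Ruby as well; scan just reports the group rather than the match.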

Max Williams
  • Maybe this one: `\bt\([^)]+\)`? – kishkin Apr 30 '20 at 15:59
  • Also if you want to use your regex, then you might want to use non-capturing group not to get only one part of the result: `[^a-z]t\(["'](?:[a-z0-9_]+\.)+[a-z0-9_]+["']\)` – kishkin Apr 30 '20 at 16:01
  • [This answer](https://stackoverflow.com/a/31319160/3832970) will clear it out for you. – Wiktor Stribiżew Apr 30 '20 at 16:04
  • And use `regex = /\bt\(["'][a-z0-9_]+(?:\.[a-z0-9_]+)*["']\)/` – Wiktor Stribiżew Apr 30 '20 at 16:05
  • Just so you're aware - that t() function is very likely a translate function, it means that this application has been localized for more than one language. So there is a yaml file somewhere (in config/ folder likely) named perhaps en.yml and then es.yml etc, and the key that you're seeing 'chunky.chicken' is the key to the corresponding text in the language file. This is a common standard for creating a single site that is multi-lingual. I think that you don't need this regex so much as you need to understand translation in Rails. – tgmerritt Apr 30 '20 at 20:10
  • @tgmerritt it is indeed a translation function, and the translations are indeed in yaml. I'm writing a script to look for all the calls to it in our codebase so that I can identify translation calls to keys that don't exist and, vice versa, which translation keys are not used anywhere. – Max Williams May 01 '20 at 08:20
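
Pulling the suggestions in the comments together, here is a rough sketch of how the list of keys (rather than the whole call) might be collected. The constant T_CALL, the method translation_keys and the sample text are made up for illustration. The pattern is a variation on the ones suggested above: it uses \b instead of [^a-z], adds a capturing group around the key, and uses a backreference so the closing quote has to match the opening one. Like the pattern in the last regex comment, it also accepts a key with a single segment, which may or may not be wanted.

```ruby
# Hypothetical pattern: group 1 is the opening quote, group 2 is the dotted key.
# \1 requires the closing quote to be the same character as the opening one.
T_CALL = /\bt\((["'])([a-z0-9_]+(?:\.[a-z0-9_]+)*)\1\)/

# Returns the unique keys passed to t(...) in the given source text.
def translation_keys(source)
  # scan yields [quote, key] pairs because the pattern has two capturing groups.
  source.scan(T_CALL).map { |_quote, key| key }.uniq
end

# Made-up sample input; format('x') is there to show that \b skips names ending in t.
sample = <<~'ERB'
  <%= t('cmw.letter.yumu_invite_for') %>
  <%= t("chunky.chicken") %> <%= format('x') %>
  <%= t('red.green.yellow.blue.1.2.3') %>
ERB

keys = translation_keys(sample)
# => ["cmw.letter.yumu_invite_for", "chunky.chicken", "red.green.yellow.blue.1.2.3"]
```

Running something like this over every file in the codebase and comparing the result with the keys in the yaml files would give both lists mentioned above (calls to missing keys, and keys that are never used). Calls that build the key dynamically would not be picked up by a regex like this.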
