5

I'd like to add a new keyword to Python and @EliBendersky's wonderful answer explains how to do this by changing the code and re-distributing the Python compiler.

Is it possible to introduce a new keyword without changing the compiler code? Perhaps introduce it through a library?

Edit:

For example, I'd like to add a shorthand for regex matching by adding a keyword like matches that can be used like:

"You can't take the sky from me" matches '.+sky.+'

I can add new, custom behavior using AST transformations, but the above case will fail with a syntax error before any transformation gets a chance to run.

noamt
  • Out of curiosity, why? – CIsForCookies Jan 30 '18 at 14:45
  • new keyword, well, seems difficult. But you could use `from xxx import yyy`: that defines `yyy` as a "keyword". – Jean-François Fabre Jan 30 '18 at 14:47
  • No. keywords are built in to the lexer, which would need to be re-compiled. keywords are part of the official grammar definition. – Aaron Jan 30 '18 at 14:47
  • @CIsForCookies I'm developing a library (https://github.com/browncoat-ninjas/nimoy) and I'd like to have a new keyword added – noamt Jan 30 '18 at 14:48
  • XY problem. You want your library used? stick to official python :) – Jean-François Fabre Jan 30 '18 at 14:48
  • @Jean-FrançoisFabre which is exactly why I want to avoid customizing the compiler :) – noamt Jan 30 '18 at 14:49
  • there's probably another way than defining a new keyword. Can you show us an example of what you want to achieve. – Jean-François Fabre Jan 30 '18 at 14:50
  • The bar for adding a keyword is very high, because by definition the compiler needs to recognize it, which precludes it from being used as an identifier *anywhere* else. – chepner Jan 30 '18 at 14:50
  • @Jean-FrançoisFabre I've edited the question with an example – noamt Jan 30 '18 at 14:57
  • In short, no. You can't change the Python language without changing the Python language. And if you could, Python would be a mess. – khelwood Jan 30 '18 at 14:58
  • looks like you'd better build a python preprocessor for this – Jean-François Fabre Jan 30 '18 at 14:58
  • There's very little reason to use that over `re.match('.+sky.+', "You can't take the sky from me")`. What is the benefit, aside from looking different? – chepner Jan 30 '18 at 14:59
  • @chepner syntactic sugar + I want to select which matcher is used. For example - I'd like the keyword to always use the pyhamcrest matcher – noamt Jan 30 '18 at 15:01
  • A library can handle the second part, so you are left with reserving an entire word, breaking backwards compatibility with any code that might already use `matches` as an identifier, to handle one very small aspect of your program. – chepner Jan 30 '18 at 15:05

2 Answers

5

One cannot introduce a new keyword without changing the language

The parser is the tool/program that reads through the code, and decides what makes sense and what doesn't. Although it's a rather coarse definition, the consequence is that the language is defined by its parser.

The parser relies on the language's formal grammar, which is given in full in the Python language reference; the ast module documentation lists the corresponding abstract grammar.

While defining a mere function only introduces a new feature without modifying the language, adding a keyword is tantamount to introducing new syntax, which in turn changes the language's grammar.

Therefore, adding a new keyword, in the sense of adding new syntax to a language, cannot be done without changing the language's grammar, which requires editing the compilation and execution chain.
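To make this concrete, here is a minimal sketch that asks the standard parser whether the line from the question is valid Python. The helper name `parses` is made up for this illustration; `keyword.iskeyword` and `ast.parse` are standard-library calls.

```python
import ast
import keyword

# The line from the question, kept as source text so we can hand it to the parser.
source = "\"You can't take the sky from me\" matches '.+sky.+'"


def parses(code):
    """Return True if the standard parser accepts `code` (hypothetical helper)."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False


print(keyword.iskeyword("matches"))  # False: the grammar knows no such keyword
print(parses(source))                # False: rejected before any library code runs
print(parses("re.match('.+sky.+', \"You can't take the sky from me\")"))  # True
```

The rejection happens while parsing, so nothing an imported library does at run time can make the first form acceptable.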

However...

There might be some smart ways to introduce a new feature that looks like new syntax but in fact only uses the existing syntax. For instance, the goto module relies on a little-known property of the language: whitespace around the dot in an attribute reference is ignored.

You can try this by yourself:

>>> l = [1, 2, 3]
>>> l    .append(4)
>>> l
[1, 2, 3, 4]
>>> l.    append(5)
>>> l
[1, 2, 3, 4, 5]

This allows the following, which looks like new syntax but really is not:

label .myLabel
goto .myLabel

Now, the goto module relies on the way the interpreter works internally to actually jump from a goto to the given label... But that's another problem.
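A quick way to convince yourself that `goto .myLabel` is ordinary syntax is to look at how it parses. The sketch below only uses the standard `ast` module; it does not use or depend on the goto module itself.

```python
import ast

# Parse the "statement" as a plain expression and inspect the resulting node.
node = ast.parse("goto .myLabel", mode="eval").body

print(type(node).__name__)       # Attribute
print(node.value.id, node.attr)  # goto myLabel

# With or without the space, the parser builds the same tree.
same_tree = ast.dump(ast.parse("goto .myLabel")) == ast.dump(ast.parse("goto.myLabel"))
print(same_tree)                 # True
```

So `goto` and `label` are just names whose attribute is being looked up; the parser never sees anything new.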


I'd like to add that Python is quite an open-minded language. It provides a fair number of seldom-used operators, for instance @. This operator, introduced in Python 3.5, was primarily meant for matrix multiplication, and is implemented by a call to __matmul__. I have to say, I've never seen it in code. So, why not use it for your purpose?

Let's do it step by step. I propose to define an r class that will behave as a regex.

import re

class r:
    def __init__(self, pattern):
        self.regex = re.compile(pattern)

Now, I want to be able to use the @ operator with this class, together with a string, with the semantics of matching the string against the pattern. I'll define the __matmul__ method as follows:

class r:
    # __init__ as defined above

    def __matmul__(self, string):
        # True when the pattern matches at the start of the string
        return bool(self.regex.match(string))

Now, I can do the following:

>>> r("hello") @ "hello"
True
>>> r("hello") @ "world"
False

Pretty nice, but not quite there yet. I'll define the __rmatmul__ method as well, so that it simply delegates to __matmul__. In the end, the r class looks like this:

class r:
    def __init__(self, pattern):
        self.regex = re.compile(pattern)

    def __matmul__(self, string):
        return bool(self.regex.match(string))

    def __rmatmul__(self, string):
        return self @ string

Now, the reverse operation works as well:

>>> "hello" @ r("hello")
True
>>> "123456" @ r(r"\d+")
True
>>> "abc def" @ r(r"\S+$")
False

This is very close to what you were attempting, except that I didn't have to introduce a new keyword! Of course, the r identifier must now be protected from being rebound, just like str or list...
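The reverse form works because str does not define `__matmul__` itself, so Python falls back to the right operand's `__rmatmul__`. Here is a self-contained sketch of the same idea applied to the example from the question; the class name `Pattern` and the `NotImplemented` guard are my own additions, not part of the `r` class above.

```python
import re


class Pattern:
    """Hypothetical variant of the `r` class, with a type check added."""

    def __init__(self, pattern):
        self.regex = re.compile(pattern)

    def __matmul__(self, string):
        if not isinstance(string, str):
            # Let Python raise its usual TypeError for unsupported operands.
            return NotImplemented
        return bool(self.regex.match(string))

    # str has no __matmul__, so `"text" @ Pattern(...)` ends up here.
    __rmatmul__ = __matmul__


quote = "You can't take the sky from me"
print(quote @ Pattern(r".+sky.+"))  # True
print(quote @ Pattern(r".+sea.+"))  # False
```

Returning `NotImplemented` for non-strings keeps something like `3 @ Pattern("x")` failing with a normal TypeError instead of an obscure error from inside re.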

Right leg
  • Great idea. And I don't even need to add the new class. I can just transform it using AST – noamt Jan 30 '18 at 15:34
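
The AST-transformation idea from the comment above could look roughly like the following sketch. It is only an illustration under my own assumptions: the names `MatMultToRegex` and `eval_with_regex_operator` are invented here, and the transformer rewrites every `@` it sees, so it would also hijack genuine matrix multiplication.

```python
import ast
import re


class MatMultToRegex(ast.NodeTransformer):
    """Rewrite `left @ right` into `bool(re.match(right, left))`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested expressions first
        if not isinstance(node.op, ast.MatMult):
            return node
        match_call = ast.Call(
            func=ast.Attribute(
                value=ast.Name(id="re", ctx=ast.Load()),
                attr="match",
                ctx=ast.Load(),
            ),
            args=[node.right, node.left],  # pattern first, then the string
            keywords=[],
        )
        as_bool = ast.Call(
            func=ast.Name(id="bool", ctx=ast.Load()),
            args=[match_call],
            keywords=[],
        )
        return ast.copy_location(as_bool, node)


def eval_with_regex_operator(source):
    """Parse an expression, rewrite its `@` operators, and evaluate it."""
    tree = MatMultToRegex().visit(ast.parse(source, mode="eval"))
    ast.fix_missing_locations(tree)
    code = compile(tree, "<regex-operator>", "eval")
    return eval(code, {"re": re})


print(eval_with_regex_operator("\"You can't take the sky from me\" @ '.+sky.+'"))  # True
```

This works only because `@` is already valid syntax; the transformer changes what the expression means, not what the parser accepts.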
1

For your particular "problem" (shortening the way to match a regex), one solution is to create a subclass of str and repurpose a binary operator that str does not define (for example minus; there may be a better choice, but unfortunately ~ cannot be used because it is unary).

example:

import re

class MyStr(str):
    def __sub__(self, other):
        # `other` is the pattern; returns a match object, or None if no match
        return re.match(other, self)

a = MyStr("You can't take the sky from me")
print(a - '.+sky.+')
print(a - '.+xxx.+')

result (on Python 3.6 and earlier; newer versions print re.Match instead of _sre.SRE_Match):

<_sre.SRE_Match object; span=(0, 30), match="You can't take the sky from me">
None

So "subtracting" the regex from your string object returns the match object, or None when there is no match.

The disadvantage is that you now have to wrap string literals in the new class (it is not possible to define this new operator on str itself).
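
To illustrate that last point, here is a small sketch of what happens in CPython when you try to attach the operator to str directly. The function names are made up for this example.

```python
import re


def regex_sub(self, pattern):
    """The method we would like str to have (illustrative only)."""
    return re.match(pattern, self)


def can_patch_str():
    """Try to add `-` to the built-in str type; report whether it worked."""
    try:
        str.__sub__ = regex_sub
    except TypeError as error:
        # CPython refuses to set attributes on built-in types.
        print("cannot patch str:", error)
        return False
    return True


print(can_patch_str())  # False on CPython
```

Since the built-in type cannot be modified, a subclass (or a wrapper class on the other side of the operator) is the only way to hook into the operator.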

Jean-François Fabre