3

I did a web search for "sql.Identifier without quoting" and this post was relevant only. It advices to use .format(sql.Identifier.

However this method adds double quotes to identifiers and cannot as far as I can tell be used for identifiers made w/out quotes in PostgreSQL. As I've read in here expert advice not to quote identifiers in Postgres.

I do not see an option in sql.Identifier to skip quoting in the document and alternative methods in sql module of psycopg2. How can I use PostgreSQL from Python in injection-safe way for unquoted identifiers?

ADDED: my confusion was due to me using "public.abc" for sql.Identifier when as noted in the answer by @klin I should have used two identifiers. After that sorted out I see quoting is used only for case sensitive (and/or where "other" symbols like dot are used).

Alex Martian
  • 3,423
  • 7
  • 36
  • 71
  • I think quoting identifiers is fine. Choose a consistent casing for all identifiers and stick to it. Double quoting keeps you honest. This is reinforced by the behaviour of the libraries. My 2p. – w08r Jan 28 '20 at 06:31
  • @wobr, thanx. It is just unfamiliar to make "public.abc" and then neither "abc" not "public"."abc", abc, public.abc works, only "public.abc". – Alex Martian Jan 28 '20 at 06:51

4 Answers4

3

If your database API routinely quotes identifiers, that is a good thing. Don't try hard to work around that.

The sentiment against quoted identifiers is not directed against the quoting as such, rather against the choice of idenifiers that require quoting.That are identifiers that do not follow the syntax rules for identifiers or contain upper case letters. Such identifiers cause all kinds of annoyances, for example in shell scripts, where quoting always is a nuisance.

Quoting an identifier that does not require quoting is innocuous, and it is actually a good safety measure.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
2

You should distinguish between two different situations. When you want to use uppercase letters in case-sensitive identifiers you have to use double-quotes. If you use only lowercase letters for identifiers, you can but not have to use double-quotes. Experts usually recommend avoiding the first case. In the second case, the quotes automatically added by psycopg2 are not a problem.

Note, that public.abc is not an identifier, it is an expression containing two identifiers. Hence you should use it in the way like this:

sql.SQL("select * from {}.{}").format(sql.Identifier(schema_name), sql.Identifier(table_name))

or this:

sql.SQL("select * from {}").format(sql.Identifier(schema_name, table_name))

as (per the documentation):

Multiple strings can be passed to the object to represent a qualified name, i.e. a dot-separated sequence of identifiers.

klin
  • 112,967
  • 15
  • 204
  • 232
1

However this method adds double quotes to identifiers and cannot as far as I can tell be used for identifiers made w/out quotes in PostgreSQL.

It absolutely can. The sticking point is that quoted identifiers are case sensitive (amongst other concerns) while unquoted are case-folded.

That case-folding is why you can get odd results: when you write create table Foo () or select Foo from Bar, Postgres first applies its case-folding rules to unquoted identifiers, then does everything else. So if you quote identifiers you need to match the result of that case-folding:

  • postgres folds to lowercase (this is explicitly documented), as long as you lowercase quoted identifiers, they'll match the unquoted one however it is the unquoted ones are written
  • however this is not portable because the SQL spec demands uppercase folding
  • and because databases do case-folding rather than case-insensitive matching, terms created using quoted identifiers may not be accessible at all from unquoted identifiers e.g. if you create a table "Foo", select from Foo will not even see it:
# create table "Foo" ();
CREATE TABLE
# select from "Foo";
--
(0 rows)

# select from Foo;
ERROR:  relation "foo" does not exist
LINE 1: select from Foo;
                    ^
Masklinn
  • 34,759
  • 3
  • 38
  • 57
0

I've implemented a version of psycopg2.sql.Identifier that will quote identifiers only when needed:

"""psycopg2 psycopg2.sql.Identifier that will quote identifiers only when needed"""

import re
import psycopg2
import psycopg2.sql


class SQLIdentifier(psycopg2.sql.Identifier):
    """psycopg2 psycopg2.sql.Identifier that will quote identifiers only when needed"""

    def as_string(self, context: typing.Any) -> str:
        """Return str representation of this identifier, quoted when needed"""
        return ".".join(self.quote_ident_if_required(s, context) for s in self.strings)

    def __str__(self) -> str:
        return ".".join(self.quote_ident_if_required(s, None) for s in self.strings)

    SAFE_IDENTIFIER_PATTERN = re.compile("[a-z_][a-z0-9_]*")

    @staticmethod
    def quote_ident_if_required(identifier: str, context: typing.Any) -> str:
        """Return str quoted if required for use in an identifier"""
        if SQLIdentifier.SAFE_IDENTIFIER_PATTERN.fullmatch(identifier):
            return identifier
        if context is None:
            return '"{}"'.format(identifier.replace('"', '""'))
        return str(psycopg2.extensions.quote_ident(identifier, context))
Tometzky
  • 22,573
  • 5
  • 59
  • 73