4

I'm implementing a simple app that changes column names (and leaves table names alone) in SQL statements. The statement is passed as a String, and modified one is also returned as a String, there's no database connection involved.

To achieve this, I'm using Apache Calcite's SQL parser. I parse the SQL string to the SqlNode, accept a SqlVisitor that creates renamed SqlNode, and then write everything back to String (using SqlNode.toSqlString()).

The problem is that I don't know how to tell the difference between columns and tables in parsed SqlNode object while accepting a SqlVisitor. Both are represented as SqlIdentifier, having the same SqlKind. Hence, when SqlVisitor is visiting the SqlIdentifier, it will rename it whether it's a column or a table.

private String changeNames(String str) throws SqlParseException {
    SqlShuttle visitor = new SqlShuttle() {
        private String rename(String str) {
            return str + "-test";
        }

        @Override
        public SqlNode visit(SqlIdentifier identifier) {
            SqlIdentifier output = new SqlIdentifier(rename(identifier.getSimple()), identifier.getCollation(), identifier.getParserPosition());
            return output;
        }
    };

    SqlParser.ConfigBuilder configBuilder =  SqlParser.configBuilder();
    configBuilder.setLex(Lex.MYSQL);
    SqlParser.Config config = configBuilder.build();

    SqlParser parser = SqlParser.create(str, config);
    SqlNode parsedStatement = parser.parseQuery(str);
    SqlNode outputNode = parsedStatement.accept(visitor);

    return outputNode.toSqlString(SqlDialect.DUMMY).getSql();
}

for example

SELECT name, address, age FROM mytablename WHERE age = 23 AND name = 'John'

will be modified into

SELECT `name-test`, `address-test`, `age-test` FROM `mytablename-test` WHERE `age-test` = 23 AND `name-test` = 'John'

How could I tell if given SqlIdentifier is a column or a table ?

Piotr Śmietana
  • 393
  • 2
  • 19

3 Answers3

2

To resolve identifiers to tables and columns, and figure out their types, you will need to use Calcite's validator (SqlValidator). The validator understands SQL name resolution rules (e.g. whether an aliases in a FROM clause can be seen in a sub-query) whereas we intentionally did not make the parser, and the SqlNode data structure it produces, aware of such things.

The two key concepts in the validator are scopes (SqlValidatorScope) and namespaces (SqlValidatorNamespace).

A scope is where you are standing and trying to resolve an identifier. For example, you might be in the SELECT clause of a query. Or in the WHERE clause of a particular sub-query. You will be able to see different collections of tables and columns in different scopes. Even a GROUP BY clause and ORDER BY clause have different scopes.

A namespace is something that looks like a table, and has a list of columns. It might be a table or, say, a sub-query in the FROM clause. If you are in a scope, you can look up a table alias, get a namespace, then look at what columns it has.

For your purposes, it would be useful if there was a variant of SqlShuttle that knew exactly which scope you are in, and where you could ask identifiers to be expanded into table and column references. Unfortunately no one has build such a thing yet.

Julian Hyde
  • 1,239
  • 7
  • 10
  • what about SqlScopedShuttle ? – Piotr Śmietana Jun 10 '16 at 14:54
  • SqlScopedShuttle sounds like it would be useful, but it doesn't do as much as you want. It just keeps a stack of AST nodes (SqlNode) as you recurse into the tree. It doesn't know anything about SQL scoping rules. – Julian Hyde Jun 12 '16 at 00:21
  • @PiotrŚmietana if you ended up with some working code, could you share it, please? – juhoautio Nov 29 '18 at 18:24
  • @Arvidaa I think I've used some choppy workaround. I don't remember what exactly it was and don't have access to this code anymore, so I can't provide any details. – Piotr Śmietana Feb 28 '19 at 10:23
0

I happen to have used calcite sqlParser a bit. some of the snippet posted below.

  public void convertSelect(SqlSelect root) {
    convertFrom(root.getFrom());
    convertWhere(root.getWhere());
  }

  public void convertFrom(SqlNode from) {
    if (from instanceof SqlJoin) {
      convertFromOfJoinExpression((SqlJoin)from);
    }
  }

  public String extractTableFromJoinNode(SqlNode jnn) {
    if (jnn instanceof SqlBasicCall) {
      SqlBasicCall asExp = (SqlBasicCall)jnn;
      if (asExp.getKind().equals(SqlKind.AS)) {
        extractTableFromJoinNodeAsExpression(asExp);
      }
    }
    return "SomeTableAlias";
  }

generally, you will get table in the from statement. and you will get columns in select statement.

and last but not least, calcite specialises in optimise the query by applying a whole lot of optimising rules. Depend on what you need (transforming column/table names), calcite might not be the best fit.

zinking
  • 5,561
  • 5
  • 49
  • 81
0

I'm now using a boolean isInFrom in the traveling of my visitor to tell me if this node is in clause from [xxxx].

Here is a simplified sample of my Visitor implementation.

class Visitor extends SqlBasicVisitor<Void> {
    boolean isInFrom = false;
    
    @Override
    Void visit(SqlCall call) {
        switch (call.getKind()) {
            case SELECT:
                return visit((SqlSelect) call);
            case JOIN:
                return visit((SqlJoin) call);
            default:
                return super.visit(call);
        }
    }

    Void visit(SqlSelect select) {
        boolean isInFromBackup = isInFrom;
        for (SqlNode child : select.getOperandList()) {
            if (child == null) continue;
            isInFrom = select.getFrom() == child;
            child.accept(this);
        }
        isInFrom = isInFromBackup;
        
    }
    
    Void visit(SqlJoin join) {
        boolean isInFromBackup = isInFrom;
        for (SqlNode child : join.getOperandList()) {
            if (child == null) continue;
            if (child == join.getCondition()) {
                isInFrom = false;
            }
            child.accept(this);
            if (child == join.getCondition()) {
                isInFrom = isInFromBackup;
            }
        }
    }

    @Override
    public Void visit(SqlIdentifier id) {
        if (isInFrom) {
            // it's table name
        } else {
            // it's field name
        }
    }
}

P.S. isInFrom is not accurately named, but I haven't got a better one.

Craynic Cai
  • 368
  • 3
  • 11