5

In one scenario we are dynamically creating sql to create temp tables on-fly. There is no issue with table_name as it is decided by us however the column-names are provided by sources not in our control.

Usually we would check the column names using below query:

select  ..
where NOT REGEXP_LIKE (Column_Name_String,'^([a-zA-Z])[a-zA-Z0-9_]*$') 
OR Column_Name_String is NULL
OR Length(Column_Name_String) > 30

However is there any build in function which can do a more extensive check. Also any input on the above query is welcome as well.

Thanks in advance.


Final query based on below answers:

select  ..
where NOT REGEXP_LIKE (Column_Name_String,'^([a-zA-Z])[a-zA-Z0-9_]{0,29}$') 
OR Column_Name_String is NULL
OR Upper(Column_Name_String) in (select Upper(RESERVED_WORDS.Keyword) from V$RESERVED_WORDS RESERVED_WORDS)

Particularly not happy with character's like $ in column name either hence won't be using..

dbms_assert.simple_sql_name('VALID_NAME')

Instead with regexp I can decide my own set of character's to allow.

pOrinG
  • 896
  • 3
  • 13
  • 27
  • @TimBiegeleisen Just trying to select all invalid columns only. The check below handle's it. If `no data found` then it moves forward else `error`. – pOrinG Jan 04 '18 at 07:17
  • Why do you create tables dynamically? That's not usual in Oracle world (but it is in MS SQL Server, as far as I've heard). So, if your primary background is SQL Server, perhaps you'd want to explain what exactly you are doing and why; someone might point you to a right (i.e. *better*) direction. – Littlefoot Jan 04 '18 at 07:31
  • @Littlefoot Well, there is no way I can change the process but let me share a little background so that I might get some better ideas for the future (to use best-practices). Currently we are getting data from external sources (files) and the job of the program/script is to push that data to oracle tables. Although there are certain rules in which the third party should provide the data they violate them from time to time. Eg a column is number data type but if data provided is string then my data-push(sqlldr) will fail.... – pOrinG Jan 04 '18 at 07:51
  • @Littlefoot In-order to validate the data we decided to create on-fly tables using the columns & data provided by the third party and validate it. In this way we are at-once able to validate the whole set of data & provide errors. If we just rely on oracle then the errors might be thrown one-by-one as the exceptions are thrown and will make the error-fixing task a lot tedious. – pOrinG Jan 04 '18 at 07:53
  • @APC please check above. Appreciate some tips & advise. – pOrinG Jan 04 '18 at 07:55
  • Aha; so, you get a file from 3rd party. Your code parses it and creates that TEMP table based on 1st row's values (which are, actually, column names) and then - if everything is OK - you load data. Well, the simplest option is to *do nothing* - use what you have in that file. If those "column names" are invalid, Oracle won't create table anyway. Because - what benefit do you get checking column names that way? Will you change invalid column name to something different? If so, you could use "generic" column names, regardless of what you get in the file. – Littlefoot Jan 04 '18 at 08:00
  • @Littlefoot If the column name is invalid the `create table` will fail. However I won't be able to pinpoint exactly why. Is it because of which column name? Are there more invalid columns?. I want to pinpoint & identify & handle these errors before its occurrence, hence the query. The check should be as extensive as it can be so that we can fix it without going back & forth many times. – pOrinG Jan 04 '18 at 08:04
  • OK, no problem. We're just discussing. Now that you explained what you are doing and why, things are much more clear. – Littlefoot Jan 04 '18 at 08:10
  • @Littlefoot Personally I believe the whole process can be done in a much better way (I still haven't found it yet). however it is what it is. – pOrinG Jan 04 '18 at 08:16
  • Do you like to verify wether column names are valid (i.e. possible) if just matching your personal naming convention. For example this one is possible but would not fulfill your conditions: `CREATE TABLE MY_TABLE (C$ NUMBER, C# NUMBER, CÖÄÜ NUMBER);` – Wernfried Domscheit Jan 04 '18 at 09:10
  • @WernfriedDomscheit Yes, Would like to check using my personal list of allowed character's. Using special character's in column name just make things unnecessarily complex in this case for me. Hence I prefer to use `regexp` over `dbms_assert.simple_sql_name('VALID_NAME')`. – pOrinG Jan 04 '18 at 10:01

2 Answers2

5

This answer does not necessarily offer either a performance or logical improvement, but you can actually validate the column names using a single regex:

SELECT ...
WHERE NOT
    REGEXP_LIKE (COALESCE(Column_Name_String, ''), '^([a-zA-Z])[a-zA-Z0-9_]{0,29}$')

This works because:

  • It uses the same pattern to match columns, i.e. starting with a letter and afterwards using only alphanumeric characters and underscore
  • NULL column names are mapped to empty string, which fails the regex
  • We use a length quantifier {0,29} to check the column length directly in the regex
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
4

" is there any build in function which can do a more extensive check."

Oracle has the DBMS_ASSERT.SIMPLE_SQL_NAME() function. This returns the passed name if it meets the Oracle naming rules ...

 select dbms_assert.simple_sql_name('VALID_NAME') from dual;

... and hurls ORA-44003 if the name is invalid.

Valid names permit any characters if the name is double-quoted (yuck, but then so is creating "temp tables on-fly"). Also the function doesn't check the length of the name, so you will still need to validate that yourself.

Find out more in the docs.

Also here is a SQL Fiddle.


"creating a table with comment column is not possible as its a invalid identifier"

Fair point. DBMS_ASSERT is primarily aimed at preventing SQL injection. So it verifies that a value conforms to Oracle's naming rules, not that the value is a valid Oracle name. To catch things like comment you will also need to check the value against V$RESERVED_WORDS, probably where reserved != 'Y'. As this is a V$ view select on it is not granted by default; if you don't have access you'll need to ask your friendly DBA to help out.

" For validating column names I believe I should check with the entire list"

Up to you. The distinction is that some keywords can legitimately be used as identifiers. For instance TYPE only became a reserved word in Oracle version 8 when they introduced the object-relational stuff. But there were a lot of tables and views in existing systems which used 'TYPE' as a column name (not least the Oracle data dictionary). If Oracle had made TYPE a properly reserved word it would have broken all those systems. So the list of reserved words which cannot be used as identifiers is a sub-set of all the Oracle keywords.


Opinions on the general task:

"we are getting data from external sources (files) and the job of the program/script is to push that data to oracle tables."

There are two parts to this task.

The first is that you should have agreed a standard format for these files with the third parties. There should be no need for discovery of the files' structure or content. (Or if there is such a need because the files are randomly sourced from a carousel of third parties probably you should not be using a relational database but something else: Endeca? Python Pandas library?)

The second is the creating tables on the fly. If you have an agreed file structure then you should be loading into standard tables, using either SQL*Loader or external tables according to your circumstances. If you're on 12c maybe SQL*Loader Express Mode could be of interest.

APC
  • 144,005
  • 19
  • 170
  • 281
  • I personally hate double-quoted names as well. It makes writing SQL very difficult. Anyway this function doesn't help much because `select dbms_assert.simple_sql_name('comment') from dual` still returns success but creating a table with `comment` column is not possible as its a invalid identifier. – pOrinG Jan 04 '18 at 08:15
  • [I hate double-quotes too](https://stackoverflow.com/a/6030439/146325). – APC Jan 04 '18 at 08:29
  • For the task as a whole, we have agreed for a standard format with the third parties however there are so many of them that somewhere something always goes wrong in between. Its better to prepare then suffer. As the application is recently introduced it is expected that mistakes will occur but with time things will be streamlined & there won't be any need for these checks. – pOrinG Jan 04 '18 at 08:42
  • We are still on oracle 11g. I was also wondering about the `commit all or none option` in sqlldr which I was not able to find. Even if the process is streamlined & we are directly loading in standard tables we still need to entertain the thought of getting errors & handle them. With the absence of `commit all records or nothing` in sqlldr, it is impossible for us to achieve `direct load into standard tables` as it doesn't comply with business requirements. – pOrinG Jan 04 '18 at 08:46
  • The `V$RESERVED_WORDS` will solve all my problems. For validating column names I believe I should check with the entire list instead of just checking `reserved != 'Y'`. – pOrinG Jan 04 '18 at 08:56
  • "it doesn't comply with business requirements" . Sounds like you should investigate external tables. They give you the bad file and logging of SQL\*Loader without needing to insert into heap tables. – APC Jan 04 '18 at 09:11
  • As a rule you should aim to use as much of the Oracle built-in functionality as possible to reduce the amount of code you need to maintain. You may think you are building some temporary scaffolding which you will remove once the system is streamlined. But the lessons of history teach us this almost never happens. What you build now you will need to support for the lifetime of the system, so you should design your architecture accordingly. – APC Jan 04 '18 at 09:16
  • 1
    I agree and I have learnt the same from this experience. Noted. – pOrinG Jan 04 '18 at 09:30
  • In case of reserved depending on the situation a better check would be to use `select * from V$RESERVED_WORDS Where Reserved = 'Y' Or RES_SEMI = 'Y'` to figure out all identifier's which are not allowed as column_name. – pOrinG Jan 04 '18 at 09:40