
I am attempting to grab a sample of data from every column in every table in about 5 schemas. Below is an example of getting this data from just one schema (replace "sde" with whatever your schema is to run it). This piece runs fine:

select case when lead(rownum) over (order by rownum) is null then
           'select '||''''||t.table_name||''''||' as TABLE_NAME,'||
           ''''||t.column_name||''''||' as COLUMN_NAME,'||
           ' cast('||t.column_name||' as VarChar2(50)) as SAMPLE_DATA'||
           ' from sde.'||t.table_name||
           ' where '||t.column_name||' is not null and ROWNUM=1;'
       else
           'select '||''''||t.table_name||''''||' as TABLE_NAME,'||
           ''''||t.column_name||''''||' as COLUMN_NAME,'||
           ' cast('||t.column_name||' as VarChar2(50)) as SAMPLE_DATA'||
           ' from sde.'||t.table_name||
           ' where '||t.column_name||' is not null and ROWNUM=1 union all'
       end as qry_txt
from all_tab_columns t
where t.owner = 'SDE'
  and t.data_type not in ('BLOB', 'LONG')
order by rownum asc, qry_txt asc;

Here is an example of one line of the output produced by the query above:

select 'HUD_TYPE' as TABLE_NAME,'HUD_TYPE_ID' as COLUMN_NAME, cast(HUD_TYPE_ID as VarChar2(50)) as SAMPLE_DATA from sde.HUD_TYPE where HUD_TYPE_ID is not null and ROWNUM=1 union all

The problem I'm facing is that when I run the complete set of unions, it never finishes. I've only been able to run a few hundred to a few thousand lines at a time using:

select case when lead(rownum) over (order by rownum) is null then
           'select '||''''||t.table_name||''''||' as TABLE_NAME,'||
           ''''||t.column_name||''''||' as COLUMN_NAME,'||
           ' cast('||t.column_name||' as VarChar2(50)) as SAMPLE_DATA'||
           ' from sde.'||t.table_name||
           ' where '||t.column_name||' is not null and ROWNUM=1;'
       else
           'select '||''''||t.table_name||''''||' as TABLE_NAME,'||
           ''''||t.column_name||''''||' as COLUMN_NAME,'||
           ' cast('||t.column_name||' as VarChar2(50)) as SAMPLE_DATA'||
           ' from sde.'||t.table_name||
           ' where '||t.column_name||' is not null and ROWNUM=1 union all'
       end as qry_txt
from all_tab_columns t
where t.owner = 'SDE'
  and t.data_type not in ('BLOB', 'LONG')
order by rownum asc, qry_txt asc

OFFSET 4800 ROWS FETCH NEXT 400 ROWS ONLY; --Grabbing the last few hundred lines so the final generated line keeps the terminating semicolon from the CASE statement and the demo script stays valid

This particular schema is the smallest of the bunch, with only 5,000 rows to return. Is what I'm trying to do an impossible task for an on-the-fly query? Or is there a way to make this more efficient, or to break it up into loops that grab chunks somehow? I'm trying to avoid having to involve our devs and have tables created, ETLs, etc. I'm not a SQL expert, but if I'm pointed in the right direction, I can hack it out. :)

Thanks in advance.

Drivium
  • What is the actual problem you are trying to solve? Perhaps sampling data from every column in every table in five schemas is not the only solution to your actual problem (and perhaps it's not even the best solution). –  Jul 19 '16 at 18:38
  • Sure. I created a customer-facing Tableau report that provides a quick way to search our available tables & columns. It includes columns such as Schema, Object Type (Table, View, etc.), Table Name, Column Name, Data Type, and Comments. I also wanted to include "Sample Data" as a column, which is a single row of data from each column in each table. I know HOW to do it, but the union set seems to hang with any normal size data set... Hoping for some optimization pointers or alternate methods to achieve the same. – Drivium Jul 19 '16 at 19:32

1 Answer


Yes, this is probably too much to do in one query.

The query could be hopelessly slow because of parse bugs, recursive SQL, or empty space. This task probably requires some development: either a difficult but accurate stored procedure, or a quick and inaccurate version that uses column statistics.


Why it might be slow

  1. Parse Time: Oracle's compiler is usually fast, but there are some weird cases where the parse time grows exponentially, and using a large number of UNION ALLs is one of them. This is especially true with older versions like 10g, where more than 500 UNION ALLs could effectively run forever. These problems affect similar methods, such as a multi-table insert, as I showed in this answer. So there's probably no simple way around this.

    This code shows how the query time increases exponentially with UNION ALL. This is the best possible case, using only the DUAL table.

    --Find time to execute simple statements.
    declare
        v_sql clob := 'select 1 a from dual';
        v_count number;
        v_time_before number;
        v_time_after number;
    begin
        for i in 1 .. 5000 loop
            v_sql := v_sql || ' union all select 1 a from dual';
            --Only execute every 100th iteration.
            if mod(i, 100) = 0 then
                v_time_before := dbms_utility.get_time;
                execute immediate 'select count(*) from ('||v_sql||')' into v_count;
                v_time_after := dbms_utility.get_time;
                dbms_output.put_line(i||':'||to_char(v_time_after-v_time_before));
            end if;
        end loop;
    end;
    /
    

    (Chart from the original answer: elapsed time versus the number of UNION ALLs, climbing steeply as the count grows.)

  2. Recursive SQL: Sometimes the SQL that Oracle runs behind the scenes to prepare your statement is the problem. Usually the worst offender is dynamic sampling, which generates queries that read a small amount of data to build on-the-fly table statistics. Those queries are usually fast, but sometimes they get bad execution plans of their own and become slow. You can disable the feature with a hint like /*+ dynamic_sampling(0) */.
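
    For example, one of the generated sample queries from the question could carry the hint like this (table and column names taken from the question's example output):

    select /*+ dynamic_sampling(0) */
           cast(HUD_TYPE_ID as VarChar2(50)) as SAMPLE_DATA
    from sde.HUD_TYPE
    where HUD_TYPE_ID is not null and ROWNUM = 1;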

    If you look at GV$SQL you can find some other recursive SQL that uses similar hints. You may want to copy some of those hints, such as /*+ no_parallel */.
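
    A query along these lines can surface that recursive SQL and the hints it uses (only a sketch; querying GV$SQL normally requires extra privileges, so check with your DBA):

    select sql_id, sql_text
    from gv$sql
    where sql_text like '%dynamic_sampling%'
       or sql_text like '%no_parallel%';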

  3. Empty Space: Retrieving the first row is not necessarily a trivial thing. It's possible that a table previously held a terabyte of data, someone deleted 99.9% of it, and there's just one row sitting at the end of the segment. Oracle has to scan through all that empty space below the high-water mark to find that one row. There are ways to fix that, like reorganizing the table, but it's possible someone forgot to do it.

    Or possibly there are a billion rows but only one of them has a value for that column. And that column doesn't have an index, so the entire table has to be read.
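
    If leftover empty space turns out to be the cause for a particular table, a reorganization along these lines might help (the table name is from the question, the index name is hypothetical, and MOVE leaves indexes unusable until they are rebuilt):

    alter table sde.HUD_TYPE move;
    alter index sde.HUD_TYPE_PK rebuild;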

Possible Solutions

  1. The Accurate, Painful Way: The most accurate solution requires a stored procedure that works through the subqueries in smaller chunks, possibly using parallelism to deal with any empty-space problems. You'll probably need to temporarily store the values in a table, and troubleshoot some slow queries along the way. A small chunk size avoids the parsing problems and at least makes it easier to find the subqueries with other issues.
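
    A minimal sketch of that approach, assuming SELECT access to the SDE schema; it runs one tiny query per column instead of one giant UNION ALL, and in practice you would INSERT the results into a work table instead of using DBMS_OUTPUT:

    --Sample one value per column, one small query at a time.
    declare
        v_sample varchar2(50);
    begin
        for c in (
            select owner, table_name, column_name
            from all_tab_columns
            where owner = 'SDE'
              and data_type not in ('BLOB', 'LONG')
            order by table_name, column_name
        ) loop
            begin
                execute immediate
                    'select cast('||c.column_name||' as varchar2(50))'||
                    ' from '||c.owner||'.'||c.table_name||
                    ' where '||c.column_name||' is not null and rownum = 1'
                    into v_sample;
                dbms_output.put_line(c.table_name||'.'||c.column_name||': '||v_sample);
            exception
                when no_data_found then
                    null; --Every row in this column is null.
                when others then
                    --Log and keep going; slow or broken subqueries can be fixed later.
                    dbms_output.put_line(c.table_name||'.'||c.column_name||': '||sqlerrm);
            end;
        end loop;
    end;
    /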

  2. The Inaccurate, Fast Way: Oracle by default gathers optimizer statistics for every column in the database. (You'll want to check with the DBA to make sure they haven't disabled the defaults, which unfortunately many DBAs do.) With the default algorithms, Oracle scans every row in every table and records the high and low value for each column.

    Those high and low values can be read almost instantly from the data dictionary. The problem is that the conversion to the original value is not completely accurate.

    The code below is from this article by Jonathan Lewis, and specifically from an anonymous comment.

    create or replace function raw_to_num(i_raw raw)
    return number
    as
        m_n number;
    begin
        dbms_stats.convert_raw_value(i_raw,m_n);
        return m_n;
    end;
    /  
    
    create or replace function raw_to_date(i_raw raw)
    return date
    as
        m_n date;
    begin
        dbms_stats.convert_raw_value(i_raw,m_n);
        return m_n;
    end;
    /  
    
    create or replace function raw_to_varchar2(i_raw raw)
    return varchar2
    as
        m_n varchar2(32767);
    begin
        dbms_stats.convert_raw_value(i_raw,m_n);
        return m_n;
    end;
    / 
    

    The following query returns results very quickly:

    select
            table_name,
            column_name,
            decode(data_type,
                    'VARCHAR2',to_char(raw_to_varchar2(low_value)),
                    'DATE',to_char(raw_to_date(low_value)),
                    'NUMBER',to_char(raw_to_num(low_value))
            ) low_value,
            decode(data_type,
                    'VARCHAR2',to_char(raw_to_varchar2(high_value)),
                    'DATE',to_char(raw_to_date(high_value)),
                    'NUMBER',to_char(raw_to_num(high_value))
            ) high_value
    from all_tab_columns t
    where t.owner = 'SDE'
      and t.data_type not in ('BLOB', 'LONG')
    order by 1, 2;
    
Jon Heller
  • Thank you for the very thorough response. This is great information. Sounds unlikely that what I'm after is achievable with a simple query. It would appear that I do not have permission to run functions... I may have to admit defeat on this effort. – Drivium Jul 20 '16 at 20:58