1

In snowsight within snowflake, you can profile tables and see the % of null values in the UI, but is there an easy way to query for this data or export it from the UI? I just need to create a new table off a table with 1k+, but exclude columns that only have null vaules?

Input:

CREATE OR REPLACE TABLE TAB(Q INT, X INT, Y INT, Z INT)
AS
SELECT NULL, 0041X00002OEkZ5QAL,2, NULL UNION ALL
SELECT NULL, NULL, NULL, NULL UNION ALL
SELECT NULL, NULL, NULL, NULL UNION ALL
SELECT NULL, NULL, 2, NULL;
0004
  • 1,156
  • 1
  • 14
  • 49

1 Answers1

1

Using Snowpark to remove all-nullable columns:

import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import col, count

def main(session: snowpark.Session): 
    # input
    df = session.table('PUBLIC.TAB')

    # columns with all nulls
    dropcols = [c for c in df.columns if df.agg(count(col(c))).collect()[0][0]==0]

    # save as new table without nulls columns
    df.drop(dropcols).write.save_as_table('PUBLIC.TAB_WITHOUT_NULL',mode='overwrite')
            
    return 'Success'

For input:

CREATE OR REPLACE TABLE TAB(Q INT, X INT, Y INT, Z INT)
AS
SELECT NULL, 1,2, NULL UNION ALL
SELECT NULL, NULL, NULL, NULL UNION ALL
SELECT NULL, 3, NULL, NULL UNION ALL
SELECT NULL, NULL, 2, NULL;

SELECT * FROM TAB;
/*
Q   X   Y   Z
    1   2   
        
    3       
        2   
*/

Output:

SELECT * FROM TAB_WITHOUT_NULL;
/*
X   Y
1   2
    
3   
    2
*/

Related:

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
  • any chance I can have the code written directly in snowflake ui, don't want to have to accessing the api for secruity, this is a quick analysis as well. ie the python udf or javascript? Or just plain sql? – 0004 Apr 18 '23 at 20:29
  • @0004 Yes, you could easily Snowsight UI to run it directly [Snowflake - invoking Python code](https://stackoverflow.com/a/75899802/5070879) and if you parametrize src/target table name and hit deploy button it will create a Python stored procedure – Lukasz Szozda Apr 19 '23 at 06:17
  • The code you shared removes columns that have values, any idea why? For instance, it removed an id column that had values such as 0041X00002OEkZ5QAL – 0004 Apr 19 '23 at 17:14
  • @0004 Please share a sample data in form of `CREATE TABLE` and `INSERT INTO` to replicate this scenario- I cannot reproduce it - https://i.stack.imgur.com/NUkpF.png. The code is performing COUNT(col) and if all values are NULL in that column it returns 0, therefore indicating that column should be skipped from the dataframe. – Lukasz Szozda Apr 19 '23 at 17:43
  • the table has 300 columns, so it was hard -- but I made an edit to yours – 0004 Apr 19 '23 at 17:44
  • @0004 Please provide data type as well as the value "0041X00002OEkZ5QAL" is most likely not an INT – Lukasz Szozda Apr 19 '23 at 17:47
  • it is a VARCHAR(16777216) Each column in the table will vary in any of the data types available outside of variant – 0004 Apr 19 '23 at 17:53
  • @0004 I have tried the example you provided and the column was NOT removed: https://i.stack.imgur.com/fqmLZ.png - As a debug I suggest to comment out "# save as new table without nulls columns" part , `return dropcols` array and manually compare against your table. – Lukasz Szozda Apr 19 '23 at 17:58
  • I will look back into, thank you – 0004 Apr 19 '23 at 18:02
  • 1
    Yea, it worked you are right thx again! Got another one if your looking for points! https://stackoverflow.com/questions/76057823/calculating-of-column-values-that-are-null-in-snowsight-pands-getting-error – 0004 Apr 19 '23 at 18:36