Right now, I'm just converting the df from a Snowpark df to a Pandas one and then removing columns with all 0s:
df = df.to_pandas()
df.loc[:, (df != 0).any(axis=0)]
Any way to do this without first converting to pandas?
Right now, I'm just converting the df from a Snowpark df to a Pandas one and then removing columns with all 0s:
df = df.to_pandas()
df.loc[:, (df != 0).any(axis=0)]
Any way to do this without first converting to pandas?
It is possible to achieve this effect in Snowpark:
import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import col, sum, sql_expr
def main(session: snowpark.Session):
df = session.create_dataframe([[0, 1, 2, 0], [0, 5, 0, 0], [0, 9, 9, 0], [0, 7, 1, 0]], \
schema=["q", "x", "y", "z"])
return df
/*
Q X Y Z
0 1 2 0
0 5 0 0
0 9 9 0
0 7 1 0
*/
Using conditional aggregation to build column list that is later dropped:
import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import col, sum, sql_expr
def main(session: snowpark.Session):
df = session.create_dataframe([[0, 1, 2, 0], [0, 5, 0, 0], [0, 9, 9, 0], [0, 7, 1, 0]], \
schema=["q", "x", "y", "z"])
dropcols = []
for colname in df.columns:
if df.agg(sum(sql_expr("CASE WHEN {colname}=0 THEN 0 ELSE 1 END". \
format(colname=colname)))).collect()[0][0]==0:
dropcols.append(colname)
return df.drop(dropcols)
/*
X Y
1 2
5 0
9 9
7 1
*/
Output:
Compact version:
import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import col, sum, sql_expr
def main(session: snowpark.Session):
df = session.create_dataframe([[0, 1, 2, 0], [0, 5, 0, 0], [0, 9, 9, 0], [0, 7, 1, 0]], \
schema=["q", "x", "y", "z"])
dropcols = [c for c in df.columns if df.agg(sum(sql_expr("CASE WHEN {c}=0 THEN 0 ELSE 1 END". \
format(c=c)))).collect()[0][0]==0 ]
return df.drop(dropcols)