I'm trying to trim leading and trailing whitespace in any given DataFrame, but only in string columns (so as not to alter the schema of the DataFrame). Another solution would be to trim all columns and then infer or restore the schema after trimming, but I'm not sure how to do that either. This is what I'm doing now:
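from pyspark.sql.functions import col
from pyspark.sql import functions as func
mmDF.printSchema()
# Names of all columns whose type is string
columnList = [item[0] for item in mmDF.dtypes if item[1].startswith('string')]
# This line raises the TypeError shown below
mmDF = mmDF.withColumn(col, func.ltrim(func.rtrim(mmDF[col] for mmDF_col in columnList)))
mmDF.show()
mmDF.printSchema()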
The trimming line raises this error:
TypeError: Invalid argument, not a string or column: <generator object <genexpr> at 0x0000027D5C63E248> of type <class 'generator'>. For column literals, use 'lit', 'array', 'struct' or 'create_map' function.