I have the following DataFrame, and I want to drop every column whose values contain the string "https":
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    ('https:john', 'john', 1.1, 'httpsasd'),
    ('https:john', 'pete', 1.2, 'httpsasd'),
], ['website', 'name', 'value', 'other'])
The closest answer I found, "PySpark drop columns based on column names / String condition", filters on the column names, not on the string values inside the columns.
The output I am looking for is:
name | value
-----|------
john | 1.1
pete | 1.2