I have a list of booleans:

unique_df1 = [True, True, False, ..., False, True]
I have a pyspark dataframe, df1:
type(df1) = pyspark.sql.dataframe.DataFrame
The lengths are compatible:
len(unique_df1) == df1.count()
How do I create a new dataframe, using unique_df1 to choose which rows of df1 it contains?

To do this with a pandas DataFrame, I would write:
import pandas as pd
lst = ['Geeks', 'For', 'Geeks', 'is',
       'portal', 'for', 'Geeks']
df1 = pd.DataFrame(lst)
unique_df1 = [True, False] * 3 + [True]
new_df = df1[unique_df1]
I can't find equivalent syntax for a pyspark.sql.dataframe.DataFrame, and I've tried more code snippets than I can count. How do I do this in PySpark?