How to select or drop a designated row in a PySpark DataFrame, such as dropping the third row?
1 Answer
You can use the `where` or `filter` functions to achieve this, as shown below:
df.filter(df["age"] > 15)
df.where(df["age"] > 15)
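To drop a designated row by position, as the question asks, one option is to pair each row with its index using the RDD `zipWithIndex` method and filter that index out. A minimal sketch, assuming a small hypothetical DataFrame (the `name`/`age` data below is made up for illustration); note that row position is only well-defined if the DataFrame's ordering is stable:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Hypothetical example data; any DataFrame works the same way.
df = spark.createDataFrame(
    [("Alice", 12), ("Bob", 20), ("Carol", 31), ("Dan", 45)],
    ["name", "age"],
)
# Pair each Row with its position, drop position 2 (the third row),
# then rebuild a DataFrame with the original schema.
kept = (
    df.rdd.zipWithIndex()                 # yields (Row, index) pairs
      .filter(lambda pair: pair[1] != 2)  # keep everything except index 2
      .map(lambda pair: pair[0])
)
df_without_third = spark.createDataFrame(kept, schema=df.schema)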
Update: to drop a column by index:
cols = df.columns
n = len(cols)
to_be_dropped = n - 1  # index of the last column, and so on
new_df = df.drop(cols[to_be_dropped])
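As a quick usage example, the same indexing idea drops the third column (index 2) of a hypothetical DataFrame with columns `name`, `age`, and `city`:

df.drop(df.columns[2]).columns  # -> ['name', 'age']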

Jayadeep Jayaraman
I have updated the answer; if this was helpful, please accept it. – Jayadeep Jayaraman Nov 18 '19 at 07:49
That is not possible without collecting all the rows on the driver, which will lead to OOM errors if the data is large. The best approach is to use a `filter` statement and programmatically determine the condition. – Jayadeep Jayaraman Nov 18 '19 at 08:13
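A minimal sketch of that filter-based approach, assuming the DataFrame has a column that defines the row order (a hypothetical `id` column here); the row index is computed lazily on the executors, so nothing is collected on the driver:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Hypothetical ordering column "id"; note that an unpartitioned window
# pulls all rows into a single partition, so this also has scaling limits.
w = Window.orderBy("id")
indexed = df.withColumn("row_num", F.row_number().over(w))
df_without_third = indexed.filter(F.col("row_num") != 3).drop("row_num")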