
I'm trying to create a new column in my PySpark dataframe with the number of each row. Now I need to select only a range of rows, and it keeps raising the same error.

The code I'm using is:

from pyspark.sql.functions import lit, row_number, col
from pyspark.sql.window import Window

w = Window().partitionBy(lit('a')).orderBy(lit('a'))
df1 = my_dataframe.withColumn("row_num", row_number().over(w))

display(df1.filter(col("row_num").between(1,40)))

And the error is:

TypeError: 'int' object is not callable

The snippet I'm using can be found in this Stack Overflow post.

Thanks!


1 Answer

from pyspark.sql.functions import lit, row_number, col
from pyspark.sql.window import Window

my_dataframe = spark.createDataFrame([('a',),('b',),('c',),('d',),('e',)], 'item : string')

# A single-partition window with a constant ordering key, so
# row_number() numbers every row of the dataframe 1..n
w = Window().partitionBy(lit('a')).orderBy(lit('a'))
df1 = my_dataframe.withColumn("row_num", row_number().over(w))

df1.show()

# +----+-------+
# |item|row_num|
# +----+-------+
# |   a|      1|
# |   b|      2|
# |   c|      3|
# |   d|      4|
# |   e|      5|
# +----+-------+

df1.filter(col('row_num').between(2,4)).show()

# +----+-------+
# |item|row_num|
# +----+-------+
# |   b|      2|
# |   c|      3|
# |   d|      4|
# +----+-------+
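
As a side note, TypeError: 'int' object is not callable here usually means the name row_number (or col) was rebound to a plain integer earlier in the notebook session, which shadows the imported function. Below is a minimal sketch of that failure mode and a namespaced import that sidesteps it; the assignment row_number = 40 is hypothetical, not taken from the question:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Hypothetical repro: a stray assignment shadows the imported function
#   from pyspark.sql.functions import row_number
#   row_number = 40
#   row_number().over(w)   # TypeError: 'int' object is not callable

# Importing the functions module under a namespace avoids the clash
w = Window().partitionBy(F.lit('a')).orderBy(F.lit('a'))
df1 = my_dataframe.withColumn("row_num", F.row_number().over(w))
df1.filter(F.col("row_num").between(1, 40)).show()

Re-running the explicit imports at the top of this answer has the same effect, since it rebinds the names back to the functions.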