
I used the code below to calculate the average of the Age column:

from pyspark.sql.functions import mean

# df is an existing DataFrame with a numeric "Age" column
result = df.select(mean("Age"))

result.show()

The output is 56.4567, and I need to convert it into an integer.

Mayank Porwal
Zac

3 Answers


If you want the result as an int rather than a DataFrame, run:

result = round(df.select(mean("Age")).collect()[0][0])

In Python 3, round() called with a single argument returns an int, so result will be of type int.
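A minimal sketch of what round() does with the extracted value, assuming the average has already been pulled out as a plain float (56.4567 here stands in for df.select(mean("Age")).collect()[0][0]). One caveat worth knowing: on exact ties Python's round() rounds to the nearest even integer. The round_half_up helper is hypothetical, shown only as one way to round ties away from zero instead.

```python
from decimal import Decimal, ROUND_HALF_UP

# Stand-in for df.select(mean("Age")).collect()[0][0]
avg_age = 56.4567

# With a single argument, round() returns an int in Python 3.
result = round(avg_age)
print(result, type(result).__name__)  # 56 int

# Caveat: on exact ties, round() rounds to the nearest even integer.
print(round(56.5), round(57.5))  # 56 58

def round_half_up(x: float) -> int:
    """Hypothetical helper: round ties away from zero instead of to even."""
    # Going through str(x) makes Decimal see the value as printed,
    # not its exact binary expansion.
    return int(Decimal(str(x)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))

print(round_half_up(56.5), round_half_up(57.5))  # 57 58
```

For a non-tie value like 56.4567 both approaches give 56; they only differ when the average lands exactly on .5.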

Shubham Jain
result_as_integer = int(result)

or

result_as_float = float(result)
Ish
  • I am getting an error "TypeError: int() argument must be a string, a bytes-like object or a number, not 'DataFrame'" – Zac May 28 '20 at 12:18
  • You can't convert a DataFrame to an int directly. – Shubham Jain May 28 '20 at 13:06
    While this code may resolve the OP's issue, it is best to include an explanation as to how your code addresses the OP's issue. In this way, future visitors can learn from your post, and apply it to their own code. SO is not a coding service, but a resource for knowledge. Also, high quality, complete answers are more likely to be upvoted. These features, along with the requirement that all posts are self-contained, are some of the strengths of SO as a platform, that differentiates it from forums. You can edit to add additional info &/or to supplement your explanations with source documentation. – SherylHohman May 29 '20 at 02:14
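As the comments point out, int() and float() cannot take a DataFrame. They do work once the average has been extracted as a plain Python number, but int() truncates toward zero rather than rounding. A minimal sketch, using a hypothetical extracted value of 56.9 so the difference from round() is visible:

```python
# Hypothetical value, already extracted from the Row
# (for example with df.select(mean("Age")).collect()[0][0]).
avg_age = 56.9

as_int = int(avg_age)        # truncates toward zero: 56
as_rounded = round(avg_age)  # nearest integer: 57
as_float = float(avg_age)    # unchanged: 56.9

print(as_int, as_rounded, as_float)  # 56 57 56.9
print(int(-avg_age))                 # -56 (truncation, not floor)
```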

First, extract the average from the PySpark DataFrame result as a plain Python number:

result = result.take(1)[0].asDict()['avg(Age)']

or

result = result.collect()[0]['avg(Age)']

or

result = result.collect()[0][0]

If you need the floor of the number:

import math
math.floor(float(result))

#56

If you need the ceiling of the number:

import math
math.ceil(float(result))

#57
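The floor/ceiling steps can be wrapped in a small helper. This is only a sketch: avg_to_int and its mode parameter are hypothetical names. It also guards against None, which is what collect()[0][0] yields when Spark's mean() has no non-null values to average (for example, on an empty DataFrame), since float(None) would raise a TypeError.

```python
import math
from typing import Optional

def avg_to_int(avg_value: Optional[float], mode: str = "floor") -> Optional[int]:
    """Hypothetical helper: convert an extracted average to an int.

    avg_value is the plain value taken from the Row, e.g. result.collect()[0][0].
    """
    if avg_value is None:
        # Spark's mean()/avg() returns null when there is nothing to average.
        return None
    x = float(avg_value)
    if mode == "floor":
        return math.floor(x)  # rounds toward negative infinity
    if mode == "ceil":
        return math.ceil(x)   # rounds toward positive infinity
    raise ValueError(f"unknown mode: {mode!r}")

print(avg_to_int(56.4567, "floor"))   # 56
print(avg_to_int(56.4567, "ceil"))    # 57
print(avg_to_int(-56.4567, "floor"))  # -57
print(avg_to_int(None))               # None
```

Note that for negative averages floor and ceil differ from int(), which always truncates toward zero.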
Bugface