-2

Below is the sample spark sql I wrote to get the count of male and female enrolled in an agency.I used sql to generate the output, Is there a way to do similar thing using dataframe only not sql.

val districtWiseGenderCountDF = hiveContext.sql("""
                                                   | SELECT District, 
                                                   |        count(CASE WHEN Gender='M' THEN 1 END) as male_count, 
                                                   |        count(CASE WHEN Gender='F' THEN 1 END) as FEMALE_count 
                                                   | FROM agency_enrollment 
                                                   | GROUP BY District
                                                   | ORDER BY male_count DESC, FEMALE_count DESC
                                                   | LIMIT 10""".stripMargin)

1 Answers1

0

Starting with Spark 1.6 you can use pivot + group by to achieve what you'd like

without sample data (and my own availability of spark>1.5) here's a solution that Should work (not tested)

val df = hiveContext.table("agency_enrollment")
df.groupBy("district","gender").pivot("gender").count

see How to pivot DataFrame? for a generic example

Community
  • 1
  • 1
James Tobin
  • 3,070
  • 19
  • 35