-2

How do you use WHERE keyword to get the count of gender and their percentage, who survived in titanic disaster?

my code:

spark.sql(
    "SELECT Sex Where Survived=1 ,count(Sex) \
    as gender_count,count(sex)*100/sum(count(sex)) over() \
    as percent from titanic_table GROUP BY sex"
).show()

error:

ParseException: "
mismatched input ',' expecting <EOF>(line 1, pos 28)
== SQL ==
SELECT Sex Where Survived=1 ,count(Sex) 
as gender_count,count(sex)*100/sum(count(sex)) over() 
as percent from titanic_table GROUP BY sex
----------------------------^^^
"
pault
  • 41,343
  • 15
  • 107
  • 149
  • 2
    You syntax is incorrect. You can't use `WHERE` in the `SELECT` expression like that. It has to come after the `FROM`. In this case, it looks like you also you need to use `GROUP BY`. – pault Feb 07 '19 at 19:18
  • may i know how the syntax would be like? – ravi sankar Feb 07 '19 at 19:22
  • Generally the form needs to be `SELECT , FROM table WHERE GROUP BY `. For a more specific explanation, please [edit] your question to include a small, self contained [reproducible example](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-examples) with the desired output. – pault Feb 07 '19 at 19:31

1 Answers1

0

You should put it after FROM and before GROUP BY.

Your code should be :

spark.sql("SELECT Sex, count(Sex) AS gender_count, \
100*count(sex)/sum(count(sex)) over() AS percent \
FROM titanic_table \
WHERE Survived = 1 \
GROUP BY sex").show()
ldc
  • 306
  • 1
  • 14