I have region-wise customer source data like below:

region,source,consumer_id
APAC,mail,1
APAC,referral,2
APAC,mail,3
APAC,referral,5
APAC,mail,6
APAC,referral,7
APAC,referral,8
US East,mail,9
US East,referral,10
US East,walkIn,11
AUS,walkIn,12
AUS,referral,13

Can someone help me get the region-wise source counts like below using PySpark DataFrames?

region,mail_source_cnt,referral_source_cnt,walkIn_source_cnt
APAC,3,4,0
US East,1,1,1
AUS,0,1,1

Thanks for the help.

Alper t. Turker
Anil

1 Answer

You can aggregate to get the per-source counts, then pivot those counts into columns:

>>> from pyspark.sql import functions as F
>>> df.show()
+-------+--------+-----------+
| region|  source|consumer_id|
+-------+--------+-----------+
|   APAC|    mail|          1|
|   APAC|referral|          2|
|   APAC|    mail|          3|
|   APAC|referral|          5|
|   APAC|    mail|          6|
|   APAC|referral|          7|
|   APAC|referral|          8|
|US East|    mail|          9|
|US East|referral|         10|
|US East|  walkIn|         11|
|    AUS|  walkIn|         12|
|    AUS|referral|         13|
+-------+--------+-----------+
>>> df1 = df.groupby('region','source').count()
>>> df1.show()
+-------+--------+-----+
| region|  source|count|
+-------+--------+-----+
|US East|  walkIn|    1|
|    AUS|  walkIn|    1|
|   APAC|    mail|    3|
|   APAC|referral|    4|
|US East|    mail|    1|
|US East|referral|    1|
|    AUS|referral|    1|
+-------+--------+-----+
>>> df2 = df1.withColumn('ccol', F.concat(df1['source'], F.lit('_source_cnt'))).groupby('region').pivot('ccol').agg(F.first('count')).fillna(0)
>>> df2.show()
+-------+---------------+-------------------+-----------------+
| region|mail_source_cnt|referral_source_cnt|walkIn_source_cnt|
+-------+---------------+-------------------+-----------------+
|    AUS|              0|                  1|                1|
|   APAC|              3|                  4|                0|
|US East|              1|                  1|                1|
+-------+---------------+-------------------+-----------------+
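If it helps to see what the two steps compute, the same aggregate-then-pivot logic can be sketched in plain Python (just an illustration of the semantics on the sample data; the variable names here are made up, not part of any Spark API):

```python
from collections import Counter

# Sample (region, source) pairs from the question's data.
rows = [
    ("APAC", "mail"), ("APAC", "referral"), ("APAC", "mail"),
    ("APAC", "referral"), ("APAC", "mail"), ("APAC", "referral"),
    ("APAC", "referral"), ("US East", "mail"), ("US East", "referral"),
    ("US East", "walkIn"), ("AUS", "walkIn"), ("AUS", "referral"),
]

# Step 1: count per (region, source) -- the df.groupby('region','source').count() step.
counts = Counter(rows)

# Step 2: pivot each source into its own column, filling missing
# (region, source) combinations with 0 -- the pivot(...).fillna(0) step.
sources = sorted({s for _, s in rows})
pivoted = {
    region: {f"{s}_cnt": counts.get((region, s), 0) for s in sources}
    for region in {r for r, _ in rows}
}
```

After running this, `pivoted["APAC"]` holds `{"mail_cnt": 3, "referral_cnt": 4, "walkIn_cnt": 0}`, matching the APAC row of the pivoted DataFrame above.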
Suresh