
I have a table which looks like this:

Current Table: 'original_table'

   col_1          col_2             col_3
0  Non-Saved      www.google.com    10
1  Non-Saved      www.facebook.com  20
2  Non-Saved      www.linkedin.com  20
3  Saved          www.Quora.com     30
4  Saved          www.gmail.com     40

Can I derive a table such as the following using a SQL query?

   col_1          col_2             col_3
0  Non-Saved      www.google.com    50
                  www.facebook.com
                  www.linkedin.com
1  Saved          www.Quora.com     70
                  www.gmail.com    

Basically, I am expecting a table with the DISTINCT values from col_1, all corresponding values from col_2, and SUM(col_3) over the corresponding values of col_3, in one row per group.

Additional context: I am running the SQL query in iPython notebook.

Preetesh Gaitonde
  • Hint: You can use [group_concat](https://sqlite.org/lang_aggfunc.html#groupconcat) – Giorgos Betsos Oct 11 '17 at 05:54
  • Consult a simple sql tutorial and you will have your answer. – fancyPants Oct 11 '17 at 06:17
  • Sorry! I should have given additional context in my question (just added). I am running my query in iPython notebook. Tried using group_concat it gives me the following error: `AnalysisException: u"Undefined function: 'GROUP_CONCAT'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.` – Preetesh Gaitonde Oct 11 '17 at 06:22
  • A sql query will bring back a number of rows. What should the values of col_1 and col_3 be in your example above? Should they be null? Or perhaps col_2 should be a concatenation of values, and only 2 rows returned? – Jon Scott Oct 11 '17 at 06:55
  • @JonScott: As I mentioned in the expected table, I am trying to get DISTINCT values from the 'original_table' col_1 into the expected table's col_1. col_3 would be the SUM(col_3) for those corresponding values in col_1, and col_2 would be just the concat of all the corresponding values from col_2. Yes, expecting it to return 2 rows. – Preetesh Gaitonde Oct 11 '17 at 07:00
  • Do you have any column to use for order by? – Zohar Peled Oct 11 '17 at 07:01
  • @ZoharPeled: Can I 'ORDER BY col_3 DESC'? – Preetesh Gaitonde Oct 11 '17 at 07:03
  • Not if you want the result to be the same as you have shown. – Zohar Peled Oct 11 '17 at 07:07
  • @ZoharPeled: Do I have to ORDER BY? It can take the default. – Preetesh Gaitonde Oct 11 '17 at 07:09
  • There is no default order. Database tables are unsorted by nature. Without using an ORDER BY, no relational database can guarantee the order of the results it returns. – Zohar Peled Oct 11 '17 at 07:11
  • Please read & act on [mcve]. Also before you ask read up more & google more re topics, keywords & error messages. Also you are not clear: do you want a table value that blanks out certain otherwise consecutive values in certain columns, and what are you saying about "one row"? Note that adding "basically" does not magically make things clear, and such use of "basically" basically means "not". Also please clarify in response to comments by editing your question, not via commenting. – philipxy Oct 11 '17 at 19:57
  • @philipxy: Thank you for your feedback. Super helpful. I'll make sure I follow these moving forward. – Preetesh Gaitonde Oct 12 '17 at 10:01
  • Hi. I hope that includes this post because right now the page--question, comments, answer--it shouldn't have been answered when so unclear--is a mess & not useful. You maybe want the effect of a certain group & order & the SparkSQL equivalent of group_concat, but your question doesn't say that or what you understand & have tried, and moreover from what you seem to have tried you could simply have googled 'spark sql group_concat'. – philipxy Oct 12 '17 at 18:09

1 Answer


This question was originally tagged with sqlite as the RDBMS. The SQLite query is:

select col_1, group_concat(col_2) as col_2, sum(col_3) as col_3
from yourtable
group by col_1
order by sum(col_3) desc;
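As a quick sanity check (not part of the original answer), the query can be run against the sample rows from the question using Python's built-in sqlite3 module — convenient since the asker is already in an iPython notebook:

```python
import sqlite3

# In-memory database seeded with the sample rows from the question
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original_table (col_1 TEXT, col_2 TEXT, col_3 INTEGER)")
conn.executemany(
    "INSERT INTO original_table VALUES (?, ?, ?)",
    [
        ("Non-Saved", "www.google.com", 10),
        ("Non-Saved", "www.facebook.com", 20),
        ("Non-Saved", "www.linkedin.com", 20),
        ("Saved", "www.Quora.com", 30),
        ("Saved", "www.gmail.com", 40),
    ],
)

# One row per distinct col_1, col_2 values joined, col_3 summed
rows = conn.execute(
    """
    SELECT col_1, GROUP_CONCAT(col_2) AS col_2, SUM(col_3) AS col_3
    FROM original_table
    GROUP BY col_1
    ORDER BY SUM(col_3) DESC
    """
).fetchall()

for row in rows:
    print(row)
```

This returns two rows, Saved (sum 70) first and Non-Saved (sum 50) second. Note that SQLite does not guarantee the order of elements inside each GROUP_CONCAT string.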

Now we know this is Spark SQL, which does not natively support GROUP_CONCAT. Please see here for an answer.
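In Spark SQL, a common equivalent (a sketch, not verified against the asker's environment) is to combine collect_list with concat_ws, which are both built-in Spark SQL functions:

```sql
SELECT col_1,
       concat_ws(',', collect_list(col_2)) AS col_2,
       SUM(col_3) AS col_3
FROM original_table
GROUP BY col_1
ORDER BY SUM(col_3) DESC;
```

As with GROUP_CONCAT in SQLite, the order of elements produced by collect_list is not deterministic.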

Jon Scott