I have a DataFrame with this schema:

id      user        keywords
1       u1, u2      key1, key2  
1       u3, u4      key3, key4
1       u5, u6      key5, key6
2       u7, u8      key7, key8
2       u9, u10     key9, key10
3       u11, u12    key11, key12
3       u13, u14    key13, key14

I need a method to group Rows by id and concatenate the strings in user and keywords columns, to make it look like this:

id      user                            keywords
1       u1, u2, u3, u4, u5, u6          key1, key2, key3, key4, key5, key6
2       u7, u8, u9, u10                 key7, key8, key9, key10
3       u11, u12, u13, u14              key11, key12, key13, key14

How do I do that in Java?

Sparkan
  • What have you tried? On this site you should ask about problems you encounter, not request a solution for work to be done... – mgaido Jun 01 '16 at 09:28
  • I've been trying to work with a JavaRDD, convert it to a JavaPairRDD and apply reduceByKey and aggregation, but with no success. I thought there might be a better solution that could be applied directly to a DataFrame, but I don't know how. – Sparkan Jun 01 '16 at 09:31
  • I am not sure. I am having trouble understanding that suggested solution in Python. It seems that UserDefinedAggregateFunction does not exist in Spark 1.6.1 for Java. – Sparkan Jun 01 '16 at 09:40
  • Python? There is no Python there. – zero323 Jun 01 '16 at 09:48

1 Answer

Do something like:

  1. create an RDD of (id, (user, keywords)) pairs
  2. call groupByKey on this RDD
  3. do some flatMap over the grouped user and keywords values to concatenate them
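
The steps above can be sketched in plain Java on an in-memory list, with no Spark dependency, just to make the grouping logic concrete. This is a minimal illustration under assumptions, not Spark code: the class `GroupConcat` and the method `groupAndConcat` are hypothetical names, and each row is assumed to be a `String[]` of `{id, user, keywords}`. On a real `JavaPairRDD`, the analogous operations would be `mapToPair` followed by `groupByKey` (or `reduceByKey`).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spark: mirrors the three steps on plain Java collections.
class GroupConcat {

    // Each input row is {id, user, keywords}, matching the columns in the question.
    static Map<String, String[]> groupAndConcat(List<String[]> rows) {
        // Steps 1 and 2: key every row by id and collect its values per key
        // (the in-memory analogue of mapToPair followed by groupByKey).
        Map<String, List<String[]>> grouped = new LinkedHashMap<>();
        for (String[] row : rows) {
            grouped.computeIfAbsent(row[0], k -> new ArrayList<>())
                   .add(new String[] {row[1].trim(), row[2].trim()});
        }

        // Step 3: join the grouped user and keywords strings with ", ".
        Map<String, String[]> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String[]>> entry : grouped.entrySet()) {
            List<String> users = new ArrayList<>();
            List<String> keywords = new ArrayList<>();
            for (String[] value : entry.getValue()) {
                users.add(value[0]);
                keywords.add(value[1]);
            }
            result.put(entry.getKey(), new String[] {
                String.join(", ", users), String.join(", ", keywords)});
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "u1, u2", "key1, key2"});
        rows.add(new String[] {"1", "u3, u4", "key3, key4"});
        rows.add(new String[] {"2", "u7, u8", "key7, key8"});

        // Print one line per id: id, concatenated users, concatenated keywords.
        groupAndConcat(rows).forEach((id, v) ->
            System.out.println(id + "\t" + v[0] + "\t" + v[1]));
    }
}
```

Note that this sketch keeps rows in input order because it runs on a single list; on a distributed RDD the order of values within a group is not guaranteed after a shuffle.
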
Benjamin