
I have two columns in the format below:

+-----------------------------------------+-----------------------------------+
|Values                                   |Flags                              |
+-----------------------------------------+-----------------------------------+
|0x006c,0x0072,0x0074,0x0099,0x009a,0x009f|0x00,0x00,0x00,0x00,0x00,0x01      |
|0x009a,0x00a3,0x009f,0x0099,0x00a5,0x00a7|0x00,0x00,0x01,0x00,0x00,0x00      |
+-----------------------------------------+-----------------------------------+

Now I need to parse the Flags column, find the position where the flag is set to 1, and return the value at that same position in the Values column. For the sample above, that is 0x009f for both rows. How can I do this efficiently using SQL commands? One option might be to convert the two columns into two tables and then join them on the position index. Is there anything better? I am not very familiar with SQL. Thanks.

Nikhil Utane
  • Check the `explode` function in Spark. – undefined_variable Jun 19 '17 at 05:59
  • The `explode` function needs an array, while I have a string. How do I convert the values from a string to an array? Thanks. – Nikhil Utane Jun 21 '17 at 12:46
  • `explode(split(str, ','))`.... where str is your column and `,` is separator – undefined_variable Jun 21 '17 at 13:20
  • Thanks. I got it to work in Scala (albeit with a small change: `"` instead of `'`), e.g. `val flattened = test.withColumn("b", explode(split($"b", ",")))`. However, I need it for PySpark, and it is not working there. I am trying `df.select(explode(split(col("list"), ",")).alias("word")).show()`. In fact, I get the same error when I try the example given in [Explode in PySpark](https://stackoverflow.com/questions/38210507/explode-in-pyspark). The error is `TypeError: 'Column' object is not callable`. – Nikhil Utane Jun 21 '17 at 14:17
  • BTW, how can I explode with 2 columns? – Nikhil Utane Jun 22 '17 at 03:54
  • If you need two separate columns, it's easy... If you need the records from both columns in a single column, you can use `concat`. – undefined_variable Jun 22 '17 at 05:07
  • I actually got it working for 2 columns, but in Scala (thanks to Stack Overflow). I wanted to know how to do it in Python. Here is the Scala code: `val zip = udf((xs: Seq[String], ys: Seq[String]) => xs.zip(ys))` followed by `test2.withColumn("vars", explode(zip(split($"b", ","), split($"c", ",")))).select($"a",$"vars._1".alias("varA"), $"vars._2".alias("varB"))` – Nikhil Utane Jun 22 '17 at 06:03
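
For the Python side asked about in the last comment, here is a minimal sketch of the per-row logic only: split both strings, pair the entries by position (the same idea as the Scala `xs.zip(ys)`), and return the value whose flag equals 1. The function name `value_at_set_flag` and the `sep` parameter are hypothetical, and the sketch assumes the flags are hex strings such as `0x01` and that only the first set flag matters. It runs without Spark.

```python
from typing import Optional


def value_at_set_flag(values: str, flags: str, sep: str = ",") -> Optional[str]:
    """Return the entry of `values` at the position of the first flag equal to 1."""
    value_list = [v.strip() for v in values.split(sep)]
    flag_list = [f.strip() for f in flags.split(sep)]
    # Pair entries by position, like the Scala xs.zip(ys);
    # zip stops at the shorter of the two lists.
    for value, flag in zip(value_list, flag_list):
        # int(flag, 16) parses hex strings such as "0x01".
        if int(flag, 16) == 1:
            return value
    return None  # no flag was set in this row


# The two sample rows from the question.
rows = [
    ("0x006c,0x0072,0x0074,0x0099,0x009a,0x009f", "0x00,0x00,0x00,0x00,0x00,0x01"),
    ("0x009a,0x00a3,0x009f,0x0099,0x00a5,0x00a7", "0x00,0x00,0x01,0x00,0x00,0x00"),
]
results = [value_at_set_flag(values, flags) for values, flags in rows]
print(results)  # ['0x009f', '0x009f']
```

In PySpark, a function like this could presumably be wrapped with `pyspark.sql.functions.udf` (with a string return type) and applied to the Values and Flags columns; that wiring is not shown here and has not been tested against the data in the question.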
