
I have a PySpark DataFrame with one column of vector values and one column of constant numerical values. For example:

A | B
1 | [2,4,5]
5 | [6,5,3] 

I want to multiply the vector column by the constant column. I'm trying to do this because I have word embeddings in column B and weights in column A, and my final goal is to get weighted embeddings.

desertnaut
    Please see [How to make good reproducible Apache Spark examples](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-examples). Details such as what exactly your data types are (list? numpy array?) do matter... – desertnaut Mar 02 '19 at 13:04
  • @desertnaut I have the same question: how would you do it if B were a list of floats? – Urian Jan 08 '20 at 10:18
  • @Urian if the answers below do not resolve your issue, please open a new question – desertnaut Jan 08 '20 at 10:36
  • For a solution without Breeze and in PySpark, I have opened a new question [here](https://stackoverflow.com/questions/59645130/multiply-two-pyspark-dataframe-columns-with-different-types-arraydouble-vs-do) – Urian Jan 08 '20 at 11:59

2 Answers


If your vector data is stored as an array of doubles, you can do this:

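import breeze.linalg.{Vector => BV}
import spark.implicits._ // already in scope inside spark-shell

// Column A holds the weights, column B the embeddings as Array[Double]
val data = spark.createDataset(Seq(
    (1, Array[Double](2, 4, 5)),
    (5, Array[Double](6, 5, 3))
  )).toDF("A", "B")

// Wrap each array in a Breeze vector, scale it by the row's weight, convert back to an array
data.as[(Long, Array[Double])].map(r => {
  (BV(r._2) * r._1.toDouble).toArray
}).show()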

This outputs:

+------------------+
|             value|
+------------------+
|   [2.0, 4.0, 5.0]|
|[30.0, 25.0, 15.0]|
+------------------+
Travis Hegner

From Spark 2.4 onwards, you can use the higher-order functions available in Spark SQL, such as `transform`:

scala> val df = Seq((1,Seq(2,4,5)),(5,Seq(6,5,3))).toDF("a","b")
df: org.apache.spark.sql.DataFrame = [a: int, b: array<int>]

scala> df.createOrReplaceTempView("ashima")

scala> spark.sql(""" select a, b, transform(b, x -> x * a) as result from ashima """).show(false)
+---+---------+------------+
|a  |b        |result      |
+---+---------+------------+
|1  |[2, 4, 5]|[2, 4, 5]   |
|5  |[6, 5, 3]|[30, 25, 15]|
+---+---------+------------+


scala>
stack0114106