I have a spark dataframe that looks like this:
+----+------+-------------+
|user| level|value_pair |
+----+------+-------------+
| A | 25 |(23.52,25.12)|
| A | 6 |(0,0) |
| A | 2 |(11,12.12) |
| A | 32 |(17,16.12) |
| B | 22 |(19,57.12) |
| B | 42 |(10,3.2) |
| B | 43 |(32,21.0) |
| C | 33 |(12,0) |
| D | 32 |(265.21,19.2)|
| D | 62 |(57.12,50.12)|
| D | 32 |(75.12,57.12)|
| E | 63 |(0,0) |
+----+------+-------------+
How do I extract the values in the value_pair
column and add them to two new columns called value1
and value2
, using the comma as the separator.
+----+------+-------------+-------+
|user| level|value1 |value2 |
+----+------+-------------+-------+
| A | 25 |23.52 |25.12 |
| A | 6 |0 |0 |
| A | 2 |11 |12.12 |
| A | 32 |17 |16.12 |
| B | 22 |19 |57.12 |
| B | 42 |10 |3.2 |
| B | 43 |32 |21.0 |
| C | 33 |12 |0 |
| D | 32 |265.21 |19.2 |
| D | 62 |57.12 |50.12 |
| D | 32 |75.12 |57.12 |
| E | 63 |0 |0 |
+----+------+-------------+-------+
I know I can separate the values like so:
df = df.withColumn('value1', pyspark.sql.functions.split(df['value_pair'], ',')[0]
df = df.withColumn('value2', pyspark.sql.functions.split(df['value_pair'], ',')[1]
But how do I also get rid of the parantheses?