
Replacing a regex match (a space followed by a number) with a comma in PySpark

I have a Spark DataFrame that contains a string column. I want to replace a regex match (whitespace followed by a number) with a comma, without losing the number; for example, "foo 1bar" should become "foo , 1bar". I have tried both of these with no luck:

df.select("A", f.regexp_replace(f.col("A"), "\s+[0-9]", ' , ').alias("replaced"))

df.select("A", f.regexp_replace(f.col("A"), "\s+[0-9]", '\s+[0-9] , ').alias("replaced"))

Any help is appreciated.

user1624577
    Some example inputs/outputs would be helpful. [How to create good reproducible apache spark dataframe examples](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-dataframe-examples). – pault Aug 17 '18 at 13:42
  • Can you elaborate the example with data? What is the actual value, and what do you want to derive? – Manu Gupta Aug 17 '18 at 18:23
  • Possible duplicate of [PySpark - String matching to create new column](https://stackoverflow.com/questions/46410887/pyspark-string-matching-to-create-new-column) – Luis A.G. Mar 20 '19 at 15:17

1 Answer


What you need is another function, regexp_extract.

So you have to divide the regex into capture groups and extract the one you need. It could be something like this:

df.select("A", f.regexp_extract(f.col("A"), "(\s+)([0-9])", 2).alias("replaced"))
Luis A.G.