
Replacing a regex match (a space followed by a number) with a comma in PySpark

I have a Spark DataFrame that contains a string column. I want to replace a regex match (whitespace followed by a number) with a comma, without losing the number; for example, "foo 1bar" should become "foo , 1bar". I have tried both of these with no luck:

df.select("A", f.regexp_replace(f.col("A"), "\s+[0-9]", ' , ').alias("replaced"))

df.select("A", f.regexp_replace(f.col("A"), "\s+[0-9]", '\s+[0-9] , ').alias("replaced"))

Any help is appreciated.

user1624577
    Some example inputs/outputs would be helpful. [How to create good reproducible apache spark dataframe examples](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-dataframe-examples). – pault Aug 17 '18 at 13:42
  • Can you elaborate the example with data? What is the actual value, and what do you want to derive? – Manu Gupta Aug 17 '18 at 18:23
  • Possible duplicate of [PySpark - String matching to create new column](https://stackoverflow.com/questions/46410887/pyspark-string-matching-to-create-new-column) – Luis A.G. Mar 20 '19 at 15:17

1 Answer


What you need is another function, regexp_extract.

So you have to divide the regex into capture groups and extract the one you need. It could be something like this:

df.select("A", f.regexp_extract(f.col("A"), "(\s+)([0-9])", 2).alias("replaced"))
Luis A.G.