I have downloaded data from SAP and am trying to do ETL on it. The data set looks like this:

11.780,00
13.824,00
0,00
33.024,00

I am trying to remove the dots first and then replace the comma with a dot. The following code blanks out the whole column (everything just vanishes). It's just a simple regex replace statement:

sales = sales.withColumn('gross', regexp_replace('gross', '.', ''))

Again, when I try the following:

sales = sales.withColumn('gross', regexp_replace('gross', '.', ':'))

the output looks like this:

:::::::::::::

How do I handle this conversion? It's a bit weird. Thanks.

Patrick Artner
Lilly
    Try `sales.withColumn('gross', regexp_replace('gross', '\.', ''))`. In regex, a dot means "any character except a newline". – Rahul Raut Feb 24 '20 at 07:24

1 Answer

As Rahul Raut commented, the . is a special character in a regex: it matches any single character. That is why replacing '.' with '' emptied every value. To match a literal '.', escape it with a preceding backslash:

sales = sales.withColumn('gross', regexp_replace('gross', r'\.', ''))
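
The difference can be reproduced without Spark. The following is a minimal sketch using Python's `re` module on the sample values from the question; it assumes only that Spark's regex engine treats an unescaped dot the same way for this simple pattern. The variable names are illustrative.

```python
import re

# Sample values copied from the question.
samples = ["11.780,00", "13.824,00", "0,00", "33.024,00"]

# An unescaped dot matches every character, so every character is replaced.
blanked = [re.sub(".", "", s) for s in samples]
colons = [re.sub(".", ":", s) for s in samples]

# An escaped dot matches only a literal '.'.
stripped = [re.sub(r"\.", "", s) for s in samples]

print(blanked)   # every string is now empty
print(colons)    # every character became ':'
print(stripped)  # only the thousands separators are gone
```

The raw-string prefix (`r'\.'`) keeps Python from interpreting the backslash itself, so the regex engine receives the two characters `\.` unchanged.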

See https://docs.python.org/3.8/library/re.html:

The special characters are:

. (Dot.) In the default mode, this matches any character except a newline.

[...]

\ Either escapes special characters (permitting you to match characters like '*', '?', and so forth), or signals a special sequence; special sequences are discussed below.
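
Putting both steps of the original question together (drop the dots, then turn the comma into a dot) might look like the sketch below. It is plain Python rather than PySpark, and `parse_sap_number` is a hypothetical helper name, not part of any library. In PySpark the same two steps would presumably be two chained `regexp_replace` calls followed by a cast to a numeric type.

```python
import re


def parse_sap_number(text: str) -> float:
    """Hypothetical helper: convert text such as '11.780,00' to a float."""
    # Step 1: remove the literal dots used as thousands separators.
    without_thousands = re.sub(r"\.", "", text)
    # Step 2: replace the decimal comma with a dot, then parse.
    return float(without_thousands.replace(",", "."))


values = [parse_sap_number(s) for s in ["11.780,00", "13.824,00", "0,00", "33.024,00"]]
print(values)
```

The order matters: if the comma were replaced first, the new decimal dot would be removed along with the thousands separators.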

Patrick Artner