0

I have two types of URL's which I would need to clean, they look like this:

["//xxx.com/se/something?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]
["//www.xxx.com/se/car?p_color_car=White?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]

The outcome I want is; SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"


I want to remove the brackets and everything up to SE, the URLS differ so I want to remove:

First URL
["//xxx.com/se/something?

Second URL:
["//www.xxx.com/se/car?p_color_car=White?


I can't get my head around it,I've tried this .*\/ .
But it will still keep strings I don't want such as:
(1 url) = something?

(2 url) car?p_color_car=White?

user3052850
  • 178
  • 3
  • 17

1 Answers1

1

You can use

regexp_replace(FinalUrls, r'.*\?|"\]$', '')

See the regex demo

Details

  • .*\? - any zero or more chars other than line breakchars, as many as possible and then ? char
  • | - or
  • "\]$ - a "] substring at the end of the string.

Mind the regexp_replace syntax, you can't omit the replacement argument, see reference:

REGEXP_REPLACE(value, regexp, replacement)

Returns a STRING where all substrings of value that match regular expression regexp are replaced with replacement.

You can use backslashed-escaped digits (\1 to \9) within the replacement argument to insert text matching the corresponding parenthesized group in the regexp pattern. Use \0 to refer to the entire matching text.

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • I had another question for an additional problem I'm facing with the URL. I have two URLS which can either have a "?" or "&". And I would need to have an "OR" condition in the following. I tried with 'regexp_replace(Url, r'.*\&|\?|"\]$',"")' But it doesn't effect when there's a "?". Ex. of URL ["//xxx.com/se/something?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"] ["//xxx.com/se/something&SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"] – user3052850 Oct 07 '20 at 08:25
  • 1
    @user3052850 Try `.*[?&]|"\]$`, see [demo](https://regex101.com/r/fXVLpO/2). – Wiktor Stribiżew Oct 07 '20 at 08:26