How to strip random Chars at the end of a String with Regex / Strip() in Python?

Question

What is the preferred way to cut off random characters at the end of a string in Python?

I am trying to simplify a list of URLs for some analysis, so I need to cut off everything that comes after the file extension .php.

Since the characters that follow .php are different for each URL, strip() doesn't work. I thought about using a regex or taking a substring, but what would be the most efficient way to solve this task?

Example:

Let's say I have the following URLs:

example.com/index.php?random_var=random-19wdwka
example.org/index.php?another_var=random-2js9m2msl

And I want the output to be:

example.com/index.php
example.org/index.php

Thanks for your advice!

Possible duplicate of [Stripping everything but alphanumeric chars from a string in Python](https://stackoverflow.com/q/1276764/1278112) — Shihe Zhang, Nov 01 '17 at 02:55

score 1 · Accepted Answer · answered Jul 25 '17 at 07:50


There are two ways to accomplish what you want.

If you know how the string ends:

In your example, if you know that the part you want to keep (ending in .php) is always followed by a ?, then all you need to do is:

my_string.split('?')[0]
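As a minimal sketch of the same idea, str.partition cuts at the first ? only and returns the string unchanged when no ? is present. The helper name before_query is made up for illustration, and the URLs are the ones from the question.

```python
def before_query(url):
    # partition splits at the first "?" only and returns (head, sep, tail);
    # when there is no "?", head is the whole string.
    head, _sep, _tail = url.partition("?")
    return head


urls = [
    "example.com/index.php?random_var=random-19wdwka",
    "example.org/index.php?another_var=random-2js9m2msl",
]

for url in urls:
    print(before_query(url))
```

This assumes a ? never appears in the part of the URL you want to keep.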

If you don't know how the string ends:

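In this case you can use urlparse and keep everything except the params, query string and fragment.

from urllib.parse import urlparse  # Python 3; on Python 2 use: from urlparse import urlparse

for url in urls:
    p = urlparse(url)
    # Blank out params, query and fragment; geturl() reassembles the rest,
    # including "://" when the URL has a scheme
    print(p._replace(params='', query='', fragment='').geturl())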
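For reuse, the parsing approach can be wrapped in a small function. The following is only a sketch for Python 3: strip_query is a hypothetical name, and the second URL (with a scheme and a fragment) is an invented example to show that both are handled.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return url without its query string and fragment."""
    parts = urlsplit(url)
    # Keep scheme, netloc and path; blank out query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


urls = [
    "example.com/index.php?random_var=random-19wdwka",
    "https://example.org/index.php?another_var=random-2js9m2msl#top",
]

for url in urls:
    print(strip_query(url))
```

Note that a URL without a scheme, such as the ones in the question, is parsed entirely as a path, which is why it still round-trips correctly here.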


bergerg


Thanks for your in-depth answer. I know how the string ends, so the first approach works fine, but thanks for the urlparse example! – Alexander Scherer Jul 25 '17 at 08:28

score 0 · Answer 2 · answered Jul 25 '17 at 07:40


for url in urls:
    result = url.split('?')[0]
    print(result)


Goolishka


score 0 · Answer 3 · answered Jul 25 '17 at 07:54


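Split on your separator at most once, take the first piece, and add the separator back:

text = "example.com/index.php?random_var=random-19wdwka"
sep = ".php"
# maxsplit=1 stops after the first ".php"; the separator itself is dropped by split()
rest = text.split(sep, 1)[0] + sep
print(rest)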
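Since the question also mentions regex, the same cut at the extension can be sketched with re.sub. The names PHP_TAIL and cut_after_php are made up for illustration. The sketch assumes the first .php in the string is the one to keep; a string with no .php is returned unchanged.

```python
import re

# Match ".php" followed by anything up to the end of the string,
# and keep only the captured extension.
PHP_TAIL = re.compile(r"(\.php).*$")


def cut_after_php(url):
    return PHP_TAIL.sub(r"\1", url, count=1)


print(cut_after_php("example.com/index.php?random_var=random-19wdwka"))
print(cut_after_php("example.org/index.php?another_var=random-2js9m2msl"))
```

For a simple fixed marker like this, the plain string methods are usually easier to read; a regex pays off mainly when the pattern to cut at varies.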


Rohit-Pandey


score 0 · Answer 4 · answered Jul 25 '17 at 07:56

It seems like what you really want is to strip away the parameters of the URL. You can also use

from urllib.parse import urlparse, urlunparse  # Python 3; on Python 2 use: from urlparse import urlparse, urlunparse

urlunparse(urlparse(url)[:3] + ('', '', ''))

to replace the params, query and fragment parts of the URL with empty strings and generate a new one.
