Removing all characters after URL?

Question

Basically, I'm trying to remove all the characters after the URL extension in a URL, but it's proving difficult. The application works off a list of various URLs with various extensions.

Here's my source:

import requests
from bs4 import BeautifulSoup
from time import sleep

# Takes user input for the path of the panels they want tested
import_file_path = input('Enter the path of the websites to be tested: ')

# Takes user input for the path of the exported file
export_file_path = input('Enter the path of where we should export the panels to: ')

# Reads the imported panels, one URL per line, dropping trailing newlines and blank lines
with open(import_file_path, 'r') as panels:
    panel_list = [line.strip() for line in panels if line.strip()]

x = 0

for panel in panel_list:
    url = requests.get(panel)
    soup = BeautifulSoup(url.content, "html.parser")
    forms = soup.find_all("form")
    action = soup.find('form').get('action')

    values = {
        soup.find_all("input")[0].get("name"): "user",
        soup.find_all("input")[1].get("name"): "pass",
    }

    print(values)

    r = requests.post(action, data=values)
    print(r.headers)
    print(r.status_code)
    print(action)
    sleep(10)
    x += 1

What I'm trying to achieve is an application that automatically tests your username/password against a list of URLs provided in a text document. However, BeautifulSoup returns an incomplete URL when I read the form's action attribute, i.e., instead of returning the full http://example.com/action.php it returns action.php, exactly as it is written in the page's HTML. The only way I can think of to get past this would be to rebuild the 'action' variable as 'panel' with all characters after the URL extension removed, followed by 'action'.

Thanks!

@Tony In the .txt file the login forms are linked to directly. I tried that before, but it returned http://example.com/login.php/action.php — pythonewbie, Aug 07 '17 at 18:18
Possible duplicate of [How to join absolute and relative urls?](https://stackoverflow.com/questions/8223939/how-to-join-absolute-and-relative-urls) — t.m.adam, Aug 07 '17 at 18:37
How about `r = requests.post(urljoin(panel, action), data=values)`? — Oluwafemi Sule, Aug 09 '17 at 11:32
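
For anyone landing here, the `urljoin` suggestion from the comments can be sketched roughly as below. This is only a minimal illustration, not a tested fix for the script above: `resolve_action` is a hypothetical helper name, the example.com URLs are placeholders, and treating a missing `action` as "post back to the same page" is an assumption about how the target forms behave.

```python
from urllib.parse import urljoin


def resolve_action(page_url, action):
    """Resolve a form's action attribute against the URL of the page it came from.

    resolve_action is a hypothetical helper, not part of requests or BeautifulSoup.
    """
    # Assumption: a missing or empty action means the form posts back to the page itself.
    return urljoin(page_url, action or "")


# Relative action: the last path segment (login.php) is replaced, not appended to.
print(resolve_action("http://example.com/login.php", "action.php"))
# -> http://example.com/action.php

# Root-relative action: resolved against the site root.
print(resolve_action("http://example.com/panel/login.php", "/action.php"))
# -> http://example.com/action.php

# Absolute action: returned unchanged.
print(resolve_action("http://example.com/login.php", "http://other.example/auth"))
# -> http://other.example/auth
```

Because `urljoin` replaces the last path segment of the base URL rather than appending to it, it avoids the `http://example.com/login.php/action.php` result mentioned in the first comment, and there is no need to trim the panel URL by hand.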
