I'm a Python beginner. I created a script to log in to a website and create a dataframe from a table. Eventually, this dataframe will be sent to an MS Excel workbook, but I need it to be sorted properly first.
My problem is with sorting the dataframe: I want to sort it in descending order by one of the columns, but I get a KeyError. For some reason, the column name is not being found.
Here is the code I'm using:
from bs4 import BeautifulSoup
import pandas as pd
from selenium import webdriver
from selenium.webdriver import chrome
from selenium.webdriver.support.select import Select
DRIVER_PATH = '<PATH TO DRIVER>'
driver = webdriver.Chrome(executable_path=DRIVER_PATH)
#open Report Website
driver.get('<URL OF LOGIN PAGE>')
#Login to Report Website
element = driver.find_element_by_id('txtUID')
element.click()
element.clear()
element.send_keys('<MY USERNAME>')
element = driver.find_element_by_id('txtPWD')
element.click()
element.clear()
element.send_keys('<MY PASSWORD>')
import time
time.sleep(2)
element = driver.find_element_by_id('Send')
element.click()
#Navigate to Report Page
driver.get('<URL TO REPORT PAGE>')
#create df
html = driver.page_source
soup = BeautifulSoup(html)
table = soup.find_all('table')[1]
df = pd.read_html(str(table))[0]
sorted_df = df.sort_values(by='Pct Of Widgets w/ Green Label')
In addition to my attempts to sort by the column name, I've also tried sorting by the column number using this code:
sorted_df = df.sort_values(by='8')
But I get the same KeyError. This seems like such a simple thing to do, so I don't understand why the column references are not being found.
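To make the KeyError easier to reason about, here is a minimal sketch of how the column labels could be inspected. The data is made up, and the trailing space in the label is only an assumed cause (the real labels from the scraped table are not known); `second_label`, `by_position` and `by_name` are illustrative names.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table; the real labels are unknown.
# The trailing space in the second label is an assumption for illustration.
df = pd.DataFrame({
    "Region": ["North", "South", "East"],
    "Pct Of Widgets w/ Green Label ": [12.5, 40.0, 27.3],
})

# repr() exposes stray whitespace or odd characters hiding in the labels.
labels = [repr(label) for label in df.columns]
print(labels)

# A positional lookup needs the actual label object, not a string like '1'.
second_label = df.columns[1]
by_position = df.sort_values(by=second_label)

# After stripping whitespace from the labels, the name-based lookup succeeds.
df.columns = df.columns.str.strip()
by_name = df.sort_values(by="Pct Of Widgets w/ Green Label")
```

If the printed labels turn out to be tuples rather than strings, the table probably has a multi-row header, and `by=` would need the full tuple instead.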
My specific question is: what is the easiest way to sort the dataframe in descending order by one of the columns?
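For reference, this is roughly the result I'm after, sketched on made-up data rather than the real report. It assumes the column label matches exactly and that the percentages arrive as strings such as `'40.0%'`; both are assumptions, not something confirmed from the scraped page.

```python
import pandas as pd

# Made-up data standing in for the scraped report table.
df = pd.DataFrame({
    "Region": ["North", "South", "East"],
    "Pct Of Widgets w/ Green Label": ["12.5%", "40.0%", "7.3%"],
})

col = "Pct Of Widgets w/ Green Label"

# Assumption: values are strings with a percent sign. Convert to numbers
# first so that "7.3%" is not ordered by its text rather than its value.
df[col] = pd.to_numeric(df[col].str.rstrip("%"), errors="coerce")

# ascending=False requests descending order; reset_index renumbers the rows.
sorted_df = df.sort_values(by=col, ascending=False).reset_index(drop=True)
print(sorted_df)
```

If the column is already numeric, the `pd.to_numeric` step can be dropped and only the `sort_values(..., ascending=False)` call is needed.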