I'm trying to clean a table scraped from a website.
I have two questions:
- Why does my code below produce a list of lists instead of a single flat list?
- I'm scraping each column into its own list and then combining the lists into a DataFrame. Is it better practice to clean the data while it is still in the lists, or to clean it after it has been converted into a DataFrame?
import re
from selenium.webdriver.common.by import By
doc_name = driver.find_elements(By.XPATH, "//*[@id='docflow.list_DocFlowList']/tbody/tr/td/table/tbody/tr/td[3]")
doc_name_cleaned = [re.findall(r'\d+', i.text) for i in doc_name]
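
To reproduce the behaviour without Selenium, here is a minimal sketch. The strings in sample_texts are made-up stand-ins for the i.text values (the real cell contents are not shown above), and the names nested, flat and first_only are only illustrative. It suggests the nesting comes from re.findall itself, which returns a list for every string it is called on, so the comprehension collects one list per cell.

```python
import re

# Hypothetical stand-ins for the `i.text` values Selenium would return.
sample_texts = ["Doc 12345", "Doc 678 rev 2", "no digits here"]

# re.findall returns a list of all matches for each string,
# so the comprehension builds a list of lists.
nested = [re.findall(r"\d+", text) for text in sample_texts]
print(nested)  # [['12345'], ['678', '2'], []]

# Option 1: flatten, keeping every match from every cell.
flat = [match for text in sample_texts for match in re.findall(r"\d+", text)]
print(flat)  # ['12345', '678', '2']

# Option 2: keep exactly one value per cell (first match, or None),
# which keeps the list aligned with the other scraped columns.
def first_number(text):
    match = re.search(r"\d+", text)
    return match.group() if match else None

first_only = [first_number(text) for text in sample_texts]
print(first_only)  # ['12345', '678', None]
```

If each column list has to stay the same length so the lists can be combined into a DataFrame, option 2 is probably the safer choice, because flattening can add or drop entries whenever a cell contains several numbers or none.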