Regex on List Comprehension Not Producing List But List of Lists Instead

Question

I'm trying to clean a table scraped from a website.

I have two questions:

1. Why does my code below produce a list of lists instead of a single list?
2. I'm scraping each column into its own list and then converting the lists into a dataframe. Is it better practice to clean the data while it's still in the lists, or after it has been converted into a dataframe?

```python
import re
from selenium.webdriver.common.by import By

doc_name = driver.find_elements(By.XPATH, "//*[@id='docflow.list_DocFlowList']/tbody/tr/td/table/tbody/tr/td[3]")
doc_name_cleaned = [re.findall(r'\d+', i.text) for i in doc_name]
```

Thank you user2357112 for asking this pertinent question! I can't believe I didn't think of that. I must remember to think of your question each time I'm stuck on my code! – Nico, Oct 23 '20 at 05:25

score 2 · Accepted Answer · answered Oct 23 '20 at 05:13


```python
doc_name_cleaned = [re.findall(r'\d+', i.text) for i in doc_name]
```

In the line above, `re.findall()` returns a list of all matches in a string (there can be more than one, or none). Because you call it once for each text in a list, the comprehension collects one list per text, so the result is a list of lists.
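A minimal illustration of the nesting, using a few made-up strings in place of the Selenium elements' `.text` values (the sample strings are hypothetical):

```python
import re

# Hypothetical cell texts standing in for [i.text for i in doc_name]
texts = ["Doc 123", "Invoice 45 and 67", "No number here"]

# One re.findall() call per text, and each call returns a list,
# so the comprehension builds a list of lists.
nested = [re.findall(r'\d+', t) for t in texts]
print(nested)  # [['123'], ['45', '67'], []]

# A second 'for' clause flattens every match into one list.
flat = [m for t in texts for m in re.findall(r'\d+', t)]
print(flat)  # ['123', '45', '67']
```

Flattening this way breaks the one-value-per-row correspondence (a text with two numbers contributes two items, a text with none contributes nothing), which matters if the list is later combined with other columns of the same length in a dataframe.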

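If you want a single list with exactly one string per row (the first number found, or an empty string when there is none), you can use a loop instead: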

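```python
import re

doc_name_cleaned = []
for i in doc_name:
    # All runs of digits in this cell's text (possibly none)
    matches = re.findall(r'\d+', i.text)
    if matches:
        # Keep only the first match, as a plain string
        doc_name_cleaned.append(matches[0])
    else:
        # Placeholder so every row still contributes exactly one value
        doc_name_cleaned.append('')
```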
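The same loop can also be written as a list comprehension by moving the "first match or empty string" logic into a small helper. This is a sketch: `first_number` is a hypothetical helper name, the sample strings stand in for the Selenium elements' `.text`, and `re.search()` is swapped in for `re.findall()` because it returns the leftmost match, which is the same value as `matches[0]`:

```python
import re

def first_number(text):
    """Return the first run of digits in text, or '' if there is none."""
    match = re.search(r'\d+', text)
    return match.group() if match else ''

# Hypothetical cell texts standing in for the Selenium elements' .text values
texts = ["Doc 123", "Invoice 45 and 67", "No number here"]

# One string per input text, so the output has the same length as the input
doc_name_cleaned = [first_number(t) for t in texts]
print(doc_name_cleaned)  # ['123', '45', '']
```

With the scraped elements, the equivalent line would be `[first_number(i.text) for i in doc_name]`.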


Amith Lakkakula


Thank you! I was considering this `for` loop option but was wondering if a list comprehension would be a good way to do what I wanted to do. Guess not. – Nico Oct 23 '20 at 14:48
