Python Selenium: read a CSV/Excel file embedded behind an XPath element into a pandas DataFrame

Question

After logging in to a website with the Python Selenium WebDriver, I need to work with a CSV file that is embedded behind a particular XPath. I can download the CSV file to a local folder with the code below.

content =driver.find_element_by_xpath('//*[@id=":n"]/div').click()

I need to read this CSV in Python and convert it directly to a pandas DataFrame. I tried a few methods and none of them worked. How can this CSV file be converted directly to a DataFrame using the XPath, so that I can use it for internal data processing? A solution that never downloads the CSV and converts it to a DataFrame using only Selenium is also fine.

Failing attempt 1:

content1 =driver.find_element_by_xpath('//*[@id=":l"]/div').click()
content1 =pd.read_csv('content1.text')

Failing attempt 2:

content1 =pd.read_csv('driver.find_element_by_xpath('//*[@id=":l"]/div').click()')

Note: I do not want to download the file, locate it on disk, and then convert it with pd.read_csv(). The solution should use the Selenium WebDriver; I do not want to use requests or BeautifulSoup to convert a table to a DataFrame.

Also, if the embedded file is Excel (.xls or .xlsx), how can I turn it into a DataFrame?

Your support is very much appreciated. Thank you!

UPDATE: I also tried the code below. It still prints only "All/Selected to CSV File"; nothing is converted to a pandas DataFrame.

Any expert advice on resolving this issue would be welcome.

content =driver.find_element_by_xpath('//*[@id=":l"]/div')
content = content.text.split('\n')
content = pd.DataFrame(content)
print(content)

This printed "All/Selected to CSV File". Note that the XPath given is the one used to select the CSV file.

Update 2: I found the exact URL from which the CSV file is downloaded and tried a few approaches, but they still fail. How can I at least read the CSV reliably from this URL? Note: for security reasons the URL has been altered; the one given here is for reference only.

url='https://el.hm.com/cg-in/export?EK=45002&FORMAT=2&nocache=1533623732089&TID=507686'
c=driver.get(url)
c=pd.read_csv(io.StringIO(c.decode('utf-8')))

Error: AttributeError: 'NoneType' object has no attribute 'decode'

`content1` is not a WebElement - you should remove that `.click()`. Try `content1 =driver.find_element_by_xpath('//*[@id=":l"]/div').text` — Andersson, Aug 01 '18 at 12:12
@Andersson the above code does not work: 1. Without .click() it does not first select "All/selected to CSV file". 2. Adding .text after the XPath just returns "All/Selected to CSV file", not the actual CSV content, so it cannot be stored in a pandas DataFrame. What exact change is needed to make it work? — Marx Babu, Aug 03 '18 at 11:34
Are you aware that pandas `.read_csv()` can read a CSV directly from a URL? (This is also true for pandas `.read_excel()`.) The CSV element you're locating probably has an href attribute containing the URL for the CSV. — T. Ray, Aug 03 '18 at 11:59
@T.Ray .read_csv() needs a URL, but this element is inside the page without any href; all we have here is a class: All/selected to CSV File — Marx Babu, Aug 03 '18 at 12:49

Alex Bochkarev · Answer 1 · 2023-01-05T16:21:26.007

After you call click() on the element, you need to wait for the download to finish and then get the name of the downloaded file. A related answer has code snippets for the Chrome and Firefox drivers. With the Chrome driver, your code would look like this:

import time

import pandas as pd

def getDownLoadedFileName(waitTime):
    driver.execute_script("window.open()")
    # switch to new tab
    driver.switch_to.window(driver.window_handles[-1])
    # navigate to chrome downloads
    driver.get('chrome://downloads')
    # define the endTime
    endTime = time.time()+waitTime
    while True:
        try:
            # get downloaded percentage
            downloadPercentage = driver.execute_script(
                "return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('#progress').value")
            # check if downloadPercentage is 100 (otherwise the script will keep waiting)
            if downloadPercentage == 100:
                # return the file name once the download is completed
                return driver.execute_script("return document.querySelector('downloads-manager').shadowRoot.querySelector('#downloadsList downloads-item').shadowRoot.querySelector('div#content  #file-link').text")
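        except Exception:
            # the downloads page may not have rendered the item yet; retry
            pass
        time.sleep(1)
        if time.time() > endTime:
            # give up explicitly instead of silently returning None
            raise TimeoutError("download did not finish within %s seconds" % waitTime)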

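from selenium.webdriver.common.by import By

# Initialize the driver and open the web page
# from which you wish to download the file

# Click to download and wait for it to finish.
# Recent Selenium 4 releases removed find_element_by_xpath; use find_element(By.XPATH, ...).
driver.find_element(By.XPATH, '//*[@id=":n"]/div').click()
# Load the CSV into a DataFrame (use read_excel for an Excel file).
# The helper appears to return only the file name shown on chrome://downloads,
# so prepend the download folder if it is not the current working directory.
df = pd.read_csv(getDownLoadedFileName(60))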
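
As an alternative to reading the chrome://downloads page, the "wait for the download to finish" step can also be done by polling the download folder itself. The following is only a minimal sketch under stated assumptions: `wait_for_new_file`, `download_dir`, `known_files` and `PARTIAL_SUFFIXES` are hypothetical names introduced here, and the sketch assumes the browser marks unfinished downloads with a `.crdownload` (Chrome) or `.part` (Firefox) suffix.

```python
import os
import time

# Suffixes browsers commonly give to unfinished downloads (assumption:
# Chrome uses .crdownload, Firefox uses .part).
PARTIAL_SUFFIXES = (".crdownload", ".part")


def wait_for_new_file(download_dir, known_files, timeout=60, poll=0.5):
    """Return the full path of a file that appeared in download_dir.

    known_files is the set of file names that were already present
    before the download was triggered (hypothetical helper argument).
    """
    known = set(known_files)
    deadline = time.time() + timeout
    while time.time() < deadline:
        current = set(os.listdir(download_dir)) - known
        partial = [n for n in current if n.endswith(PARTIAL_SUFFIXES)]
        finished = sorted(n for n in current if not n.endswith(PARTIAL_SUFFIXES))
        # Only report success once a new file exists and nothing is still
        # being written next to it.
        if finished and not partial:
            return os.path.join(download_dir, finished[0])
        time.sleep(poll)
    raise TimeoutError(
        "no finished download in %r after %s seconds" % (download_dir, timeout)
    )
```

To use it, take a snapshot with `before = set(os.listdir(download_dir))`, perform the click, and then pass `wait_for_new_file(download_dir, before)` to `pd.read_csv()`. Because it returns a full path, the result does not depend on the current working directory.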

To load an Excel file as a DataFrame, use pd.read_excel() instead.
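
If the same script has to handle both CSV and Excel exports, the choice of reader can be made from the file extension. This is a minimal sketch, not part of the original answer: `load_table` is a hypothetical helper name, and it assumes the downloaded file carries a `.csv`, `.xls` or `.xlsx` extension.

```python
import os

import pandas as pd


def load_table(path):
    """Load a CSV or Excel file into a DataFrame, chosen by extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".csv":
        return pd.read_csv(path)
    if ext in (".xls", ".xlsx"):
        # read_excel needs a matching engine installed
        # (for example openpyxl for .xlsx files).
        return pd.read_excel(path)
    raise ValueError("unsupported file type: %r" % ext)
```

Calling `load_table(path)` with the path of the downloaded file then returns a DataFrame for either format, and raises a ValueError for anything else rather than guessing.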
