
I can loop through the table rows and print the right result, but when I try to save the same data to a text file, the file does not contain that data. I know I am missing something very simple, or maybe I am integrating the pandas library incorrectly. Any help would be great.

from time import sleep
import pandas as pd

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

term = driver.find_elements_by_xpath("//tbody/tr")

for row in term:
    columns = row.find_elements_by_xpath("./td")
    termres = []
    for i in columns:
        termres.append(i.text)
    if len(termres) == 7:
        print(termres[0] + '\t' + termres[1] + '\t' + termres[2] + '\t' + termres[3] + '\t' + termres[4])
    elif len(termres) == 10:
        print(termres[0] + '\t' + termres[1] + '\t' + termres[3] + '\t' + termres[4] + '\t' + termres[5])
    elif len(termres) == 1 and termres[0] == 'Unofficial Transcript':
        print('-')
    elif len(termres) == 6 and termres[0].isalpha():
        print(termres[0] + '\t' + termres[1] + '\t' + termres[3] + '\t' + termres[4])

df = pd.DataFrame({'': term})
df.to_csv('term.txt', index=False)
print('downloaded')

The print statements produce a huge list, so here is a sample of the output:

CHEM    101     General Chemistry I     TR      3.000
CHEM    103     General Chemistry Lab I TR      1.000
CHEM    151     General Chemistry I     TR      3.000
CHEM    153     General Chemistry I Laboratory  TR      1.000
-
-

But this is what the df.to_csv('term.txt', index=False) call writes to the text file:

"<selenium.webdriver.remote.webelement.WebElement (session=""0c17f0126422f2144127b971ad19e1f6"", element=""65e1e68a-e2c3-48e9-a89d-72c7c1d2811e"")>"
"<selenium.webdriver.remote.webelement.WebElement (session=""0c17f0126422f2144127b971ad19e1f6"", element=""d6f0374b-fe70-4499-abb8-75275f92fc59"")>"

So the question is: how do I get the text instead of the WebElement objects?

undetected Selenium
Hamzah
  • Can you add your imports so it can be reproduced? – le_camerone Apr 20 '22 at 06:01
  • I added the imports to the code sample above. – Hamzah Apr 20 '22 at 09:01
  • Could you check whether this statement in your code is right: df = pd.DataFrame({'':term})? – Pallavi Apr 20 '22 at 06:31
  • Correct in what sense? It does write data to the text file, but in the form of WebElement objects, and I need the text shown by the print statements. – Hamzah Apr 20 '22 at 09:02
  • As per the code you shared, term is a collection of web elements (term = driver.find_elements_by_xpath("//tbody/tr")), and you are adding that same collection to the DataFrame, which I believe is causing the error. That is why I asked you to recheck whether this step is correct. – Pallavi Apr 20 '22 at 11:51
  • I think that is exactly what is happening: the collection of web elements is what gets written to the text file. Do you know how to fix this? – Hamzah Apr 20 '22 at 12:04

1 Answer

As per the line of code:

term = driver.find_elements_by_xpath("//tbody/tr")

term is a list of <tr> elements, and each of those WebElements is represented as:

<selenium.webdriver.remote.webelement.WebElement (session="d4f20fd17bf4037ed8cf50b00e844a7f", element="f12cf837-6c77-4c90-9da2-7b5fb9da9e5d")>

Further down in your program you traverse to the descendants (the <td> cells) of each <tr> element and even print the desired text. However, when constructing the DataFrame you pass the term list itself, which is a list of <tr> WebElements, instead of the text you extracted.

Hence, the string representations of those WebElements are what get written to the text file term.txt.
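To write the text rather than the WebElements, one option is to collect each row's <td> texts as plain strings, apply the same column selection as the print statements, and write those strings to term.txt instead of term. The following is a minimal sketch, assuming Selenium 4, where driver.find_elements(By.XPATH, ...) replaces the find_elements_by_xpath helpers that newer releases no longer provide. The helper names select_columns, rows_to_lines, save_lines and scrape_cell_texts are hypothetical, and pandas is not needed just to write a plain text file.

```python
# Hypothetical helpers: select_columns, rows_to_lines, save_lines, scrape_cell_texts.


def select_columns(cells):
    """Mirror the question's if/elif branches; return None for rows it skips."""
    if len(cells) == 7:
        return cells[0:5]
    if len(cells) == 10:
        return [cells[0], cells[1], cells[3], cells[4], cells[5]]
    if len(cells) == 1 and cells[0] == 'Unofficial Transcript':
        return ['-']
    if len(cells) == 6 and cells[0].isalpha():
        return [cells[0], cells[1], cells[3], cells[4]]
    return None


def rows_to_lines(rows_of_cells):
    """Build the same tab-separated strings that the print() calls show."""
    lines = []
    for cells in rows_of_cells:
        kept = select_columns(cells)
        if kept is not None:
            lines.append('\t'.join(kept))
    return lines


def save_lines(lines, path='term.txt'):
    """Write one line of text per kept row (text, not WebElement objects)."""
    with open(path, 'w', encoding='utf-8') as f:
        for line in lines:
            f.write(line + '\n')


def scrape_cell_texts(driver):
    """Return the <td> texts of every <tr> as lists of plain strings."""
    # Imported here so the helpers above can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    return [
        [td.text for td in row.find_elements(By.XPATH, "./td")]
        for row in driver.find_elements(By.XPATH, "//tbody/tr")
    ]
```

With a live driver, save_lines(rows_to_lines(scrape_cell_texts(driver))) writes the same lines the loop prints. The key design choice is to extract .text while iterating and keep only strings from then on. If you prefer to keep pandas, the same principle applies: build the DataFrame from the extracted strings (for example the lines list), never from term.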

undetected Selenium