0

After days of research and hundreds of errors I've almost reached the goal of my code but still missing a little detail. Here I'm scraping a website for some information and extracting it into Excel. The problem I'm trying to overcome here is creating a new line for every carrier. Right now, the output is a single list and I cannot figure out how to break every carrier info string apart.

import csv
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options

c_list = ['282131','365370','551712'] 
headers = ['Name','Unsafe Driving','d1','Crash Indicator','Hours Of Service','d2','Vehicle Maintenance','d3','CS/Alcohol','d4', 'HazMat','Driver Fitness','d5']
options = Options()
options.add_argument('--headless')
options.add_argument('--disable-gpu') 
driver = webdriver.Chrome(executable_path = 'mypath/chromedriver.exe') 
a=[]
c=[]
for i in c_list:
    driver.get("https://ai.fmcsa.dot.gov/SMS")
    wait = WebDriverWait(driver, 20)
    wait.until(EC.element_to_be_clickable((By.XPATH, "//a[@title='Close']"))).click()
    wait.until(EC.element_to_be_clickable((By.XPATH, "(//input[@name='MCSearch'])[2]"))).send_keys(i)
    wait.until(EC.element_to_be_clickable((By.XPATH, "(//input[@name='search'])[2]"))).click()
    wait.until(EC.element_to_be_clickable((By.XPATH, "//*[@id='BASICs']/p[2]/a"))).click()
    carrier = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@id="basicInfo"]/div/h3')))
    c = carrier.text
    tbl = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, "//tr[@class='valueRow sumData']")))
    tab = tbl.text.replace("\n",','.strip())
    tab = tab.split(",")
    a.append(c)
    for x in tab:
        a.append(x)  

with open('table.csv','w', encoding='utf8') as myFile:
     writer = csv.writer(myFile)
     writer.writerow(headers)
     writer.writerow(a)

Right now, the output looks like this output

but what I need is this: desiredOutput

alexb
  • 57
  • 5
  • Please provide a [mre] including sample data to be processed — your question really has nothing to with how the data was obtained. – martineau Sep 08 '21 at 21:54

1 Answers1

1

This question was answered well at Can you encode CR/LF in into CSV files?.

Consider also reverse engineering multiple lines in Excel. To embed a newline in an Excel cell, press Alt+Enter. Then save the file as a .csv. You'll see that the double-quotes start on one line and each new line in the file is considered an embedded newline in the cell.

to save it as .csv file you need to double quote value so if there is , it will not break your csv column and escape " with ""

for article in articles:
    ...
    # description = re.sub(r"[\r\n]+", " ", description)
    description = description.replace('"', '""')
    rows = '"%s","%s","%s","%s"\n' % (title, date, description, info)
    f.write(rows)
  • The `csv` module is capable of both handling embedded newlines and escaping quote characters if you use the right options, so doing it yourself like this it probably not necessary. – martineau Sep 08 '21 at 21:51