I am web-scraping two tables from two different sites. I want to append a new column (called WHEREFROM in the header) containing a piece of scraped text, which I called "name" in my code.

My code is here:

from bs4 import BeautifulSoup
from selenium import webdriver
import time
import urllib2
import unicodecsv as csv
import os
import sys
import io
import datetime
import pandas as pd
import re
import contextlib
import selenium.webdriver.support.ui as ui

filename=r'output.csv'

resultcsv=open(filename,"wb")
output=csv.writer(resultcsv, delimiter=';',quotechar = '"', quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1')
output.writerow(['TIME','FLIGHT','FROM','AIRLANE','AIRCRAFT','STATUS','WHEREFROM', 'ACTUALDATE']) 

def scrape(urls):
    browser = webdriver.Firefox()
    for url in urls:
        browser.get(url)
        html = browser.page_source
        soup=BeautifulSoup(html,"html.parser")
        table = soup.find('table', { "class" : "table table-condensed table-hover data-table m-n-t-15" })
        soup2=BeautifulSoup(html,"html.parser")
        name = soup2.find('div' , attrs={'class' : 'row m-t-l m-l-l'})
        datatable=[]
        for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
            temp_data = []
            for data in record.find_all("td"):
                temp_data.append(data.text.encode('latin-1'))
            newlist = filter(None, temp_data)
            datatable.append(newlist)
        print name
        output.writerows(datatable)

    resultcsv.close()
    time.sleep(10) 
    browser.close()

urls = ["https://www.flightradar24.com/data/airports/bud/arrivals", "https://www.flightradar24.com/data/airports/fco/arrivals"]
scrape(urls)
resultcsv.close()

How can I do this in a loop, and how can I do it correctly? Afterwards I write the data to a csv file where the delimiter is ; .

But after scraping the tables there is no ; after the last field, so I think I have to insert a ; after that last field too?

I am talking about this:

"1:15 PM";" KL1975";"Amsterdam (AMS)-";"KLM";"B737 (PH-BGT) ";"Landed 1:01 PM"

EDITED with the actual date (not working, format issue):

df = pd.DataFrame(newlist)
now = time.strftime('%d-%m-%Y')
df['ACTUALDATE'] = now
#df.rows = header
df.to_csv('output.csv', sep=';', encoding='latin-1', index=False)

I wrote it inside the loop, to get the actual date (hours and minutes too, but this is only the day).

tardos93
  • Don't use the code option. It is meant for javascript ONLY. Just paste your code, select it and hit ctrl+K. – cs95 Jul 28 '17 at 11:42
  • Do I understand you correctly that you need a _semicolon_ (";") **after** the last csv column? – Dmitriy Fialkovskiy Jul 28 '17 at 11:43
  • If you want to write all elements from a list of strings as a single string with a `";"` divider, try `datatable = ";".join(datatable)`. Also note that applying `filter()` to an existing list seems to be redundant. You'd better implement [this answer](https://stackoverflow.com/questions/45367440/how-to-delete-a-string-in-a-loop-with-python/45372006#45372006) to create a valid list – Andersson Jul 28 '17 at 11:43
  • I need a semicolon only after the last csv column, because there isn't any semicolon there, and after that I want to append a new column with the value of "name". And when I write it to csv, will it be in the correct format because of this semicolon? – tardos93 Jul 28 '17 at 11:47
  • @Andersson the OP is using the `csv` module so this has nothing to do with manually adding the field separator... – bruno desthuilliers Jul 28 '17 at 11:55
  • I am not sure about this, because I already tried to add an "actual date" to my csv file, and that broke the format. (I updated my code with the actual date too, but I don't know yet what I am doing wrong.) – tardos93 Jul 28 '17 at 11:58

1 Answer

It seems so trivial that I'm not even sure I really understood the question... If what you want is to add name as the last element of each row in your csv, all you have to do is, well, to add it as the last element of the rows you're passing to your csv writer:

for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
    temp_data = []
    for data in record.find_all("td"):
        temp_data.append(data.text.encode('latin-1'))
    # here: append name as the last field, before the row is added to datatable
    temp_data.append(name)
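
As a minimal sketch of the same idea outside the scraping loop (Python 3, standard library only): the csv writer inserts the `;` separator itself, so extra values such as the name and the current date only need to be appended to each row. The helper `add_columns`, the sample row, and the in-memory buffer are assumptions made for illustration and are not part of the question's code.

```python
import csv
import io
import time


def add_columns(rows, name, date_text):
    """Return new rows with name and date_text appended as the last two fields."""
    return [list(row) + [name, date_text] for row in rows]


# Hypothetical row standing in for the <td> texts scraped from one page.
scraped = [
    ["1:15 PM", "KL1975", "Amsterdam (AMS)", "KLM", "B737 (PH-BGT)", "Landed 1:01 PM"],
]

# StringIO stands in for open('output.csv', 'w', newline='') so the sketch has no side effects.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";", quotechar='"',
                    quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")

# Header copied from the question, including its spelling of AIRLANE.
writer.writerow(["TIME", "FLIGHT", "FROM", "AIRLANE", "AIRCRAFT",
                 "STATUS", "WHEREFROM", "ACTUALDATE"])

# The writer places the delimiter between all fields, including the appended ones.
writer.writerows(add_columns(scraped, "BUD/LHBP", time.strftime("%d-%m-%Y")))

print(buffer.getvalue())
```

Because every row gets the same number of fields as the header, there should be no need to add a trailing `;` by hand.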
bruno desthuilliers
  • Ahh, it really is that trivial, thanks! But why does my "name" text look like this?

    BUD/LHBP

    – tardos93 Jul 28 '17 at 12:08
  • Because that's what you asked for: `name = soup2.find('div' , attrs={'class' : 'row m-t-l m-l-l'})`. If you only want the text content you have to ask for it: `name = soup2.find('div' , attrs={'class' : 'row m-t-l m-l-l'}).text.strip()` (beware this could raise an `AttributeError` if there's no matching tag in the html source). – bruno desthuilliers Jul 28 '17 at 12:54
  • Yeah, I corrected it with `name = soup2.h2.string` – tardos93 Jul 28 '17 at 12:56
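
The `AttributeError` mentioned in the comments comes from `find()` returning `None` when nothing matches. One hedged way to guard against it is a small helper like the one below. `text_or_default` and `FakeTag` are hypothetical names; `FakeTag` only stands in for a BeautifulSoup tag so the sketch runs without bs4 installed.

```python
def text_or_default(tag, default=""):
    """Return the stripped text of a tag, or default when the lookup matched nothing."""
    if tag is None:
        return default
    return tag.text.strip()


class FakeTag(object):
    """Stand-in for a bs4 Tag: only the .text attribute is modelled (assumption)."""

    def __init__(self, text):
        self.text = text


# A matched tag yields its text without surrounding whitespace.
print(text_or_default(FakeTag("\n  BUD/LHBP  \n")))

# A failed lookup (None) yields the fallback instead of raising AttributeError.
print(repr(text_or_default(None, default="unknown")))
```

In the scraping loop this would be used as `name = text_or_default(soup.find('div', attrs={'class': 'row m-t-l m-l-l'}))`, assuming the class string from the question still matches the page.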