from bs4 import BeautifulSoup
soup = BeautifulSoup(r.text)
print(soup.title)
print(soup.title.string)

r = requests.get 
('https://www.
street/
print(len(r.text))

Now I need to extract the data.

I've tried something like this:

results = soup.find_all('tr')
r = []
for count in range(0, 6): 
    k = k.next_sibling
    r.append(k.string)
    results.append(r)
print('Number of results', len(results))
    for row in range(0, len(results)):
print(results[row])

But this doesn't return anything. How can I extract the data from the page? Thank you!

Anna

1 Answer

You can get every div with class tr using soup.findAll("div", {"class":"tr"}). This returns all div containers with that class.

Note that those divs also carry the data in HTML attributes such as data-unit, data-size, data-price and so on, which makes scraping the values easier: each tag's attributes are available as a dict through t.attrs.

Code:

import requests
import pandas as pd
from bs4 import BeautifulSoup

r = requests.get('https://www.cityrealty.com/nyc/roosevelt-island/rivercross-531-main-street/closing-history/57182')
soup = BeautifulSoup(r.text, "html.parser")
data = [
    t.attrs
    for t in soup.findAll("div", {"class":"tr"})
    if t.has_attr("data-unit")
]
df = pd.DataFrame(data)
del df['class']
print(df)

Output:

   data-unit data-size data-sizeft data-price data-priceft data-priceask   data-date data-total
0       1916         3        1777    1175000          661       1250000  1587700800         84
1       1612         2        1364    1150000          843       1250000  1580274000         84
2        411         1         972     620000          638        640000  1580101200         84
3       1003         3        1777    1131000          636       1245000  1577077200         84
4       1411         1           -     682000            -             -  1576731600         84
..       ...       ...         ...        ...          ...           ...         ...        ...
79      1403                     -      52877            -             -  1138683600         84
80      1315                     -      54921            -             -  1135141200         84
81       123                     -      52241            -             -  1093406400         84
82      1915                     -      51037            -             -  1058932800         84
83      1819                     -      53642            -             -  1049688000         84

[84 rows x 8 columns]
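
One follow-up that may be useful: HTML attribute values come back as strings, so every column in the DataFrame above has object dtype, and missing values show up as "-". The sketch below shows one way to convert them. It runs on a small hypothetical HTML fragment (values copied from the output above) instead of the live page, and it assumes data-date is Unix time in seconds, which the values suggest but the page does not confirm.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the page's markup; values taken from the output above
html = """
<div class="tr" data-unit="1916" data-sizeft="1777" data-price="1175000" data-date="1587700800"></div>
<div class="tr" data-unit="1411" data-sizeft="-" data-price="682000" data-date="1576731600"></div>
<div class="tr">row without data attributes</div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = [
    t.attrs
    for t in soup.find_all("div", {"class": "tr"})
    if t.has_attr("data-unit")  # skip rows that carry no data-* attributes
]
df = pd.DataFrame(rows).drop(columns="class")

# Convert string columns to numbers; "-" placeholders become NaN
for col in ["data-sizeft", "data-price"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Assumption: data-date is a Unix timestamp in seconds
df["data-date"] = pd.to_datetime(pd.to_numeric(df["data-date"]), unit="s")

print(df.dtypes)
print(df)
```

With errors="coerce", anything that cannot be parsed as a number turns into NaN instead of raising, which suits the "-" placeholders here.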
Bertrand Martel
  • Thank you so much! Also, how can I save it as a CSV file? I tried the following, but it gives me an AttributeError: import csv; with open('cityrealty.csv','w') as f: w = csv.writer(f); w.writerows(data.items()) – Anna Jun 09 '20 at 17:29
  • AttributeError Traceback (most recent call last), pointing at line 5, w.writerows(data.items()): AttributeError: 'list' object has no attribute 'items' – Anna Jun 09 '20 at 17:33
  • You can save the pandas DataFrame to CSV: https://stackoverflow.com/questions/16923281/writing-a-pandas-dataframe-to-csv-file – Bertrand Martel Jun 09 '20 at 17:34
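
On the AttributeError from the comments: data is a list of dicts, and a list has no .items() method. With pandas, df.to_csv('cityrealty.csv', index=False) writes the DataFrame directly. If you would rather stay with the csv module, csv.DictWriter accepts a list of dicts. Below is a minimal sketch of that second route; the sample rows and the rows_to_csv helper are hypothetical stand-ins for the scraped data, not part of the original answer.

```python
import csv
import io

# Hypothetical sample standing in for the scraped `data` list (one dict per div)
data = [
    {"class": ["tr"], "data-unit": "1916", "data-price": "1175000"},
    {"class": ["tr"], "data-unit": "1612", "data-price": "1150000"},
]

def rows_to_csv(rows, fileobj, skip=("class",)):
    """Write a list of dicts to fileobj as CSV, leaving out the keys in `skip`."""
    # Collect column names in first-seen order, ignoring skipped keys
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in skip and key not in fieldnames:
                fieldnames.append(key)
    # extrasaction="ignore" drops keys (such as "class") that are not in fieldnames
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)

# Write into an in-memory buffer so the result can be inspected
buffer = io.StringIO()
rows_to_csv(data, buffer)
csv_text = buffer.getvalue()
print(csv_text)

# To write to disk instead:
# with open("cityrealty.csv", "w", newline="") as f:
#     rows_to_csv(data, f)
```

Passing newline="" to open() is what the csv module documentation recommends, since otherwise extra blank lines can appear between rows on Windows.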