I'm scraping property ads with BS4 and using pandas for analysis. With multiprocessing, I have the following:
from multiprocessing import Pool
import re
import pandas as pd

# BS4_main, get_ad_page_urls, get_ad_data, container, page_root_url and house_href
# are defined elsewhere in my script
def show_ad_prices(options):
    pool = Pool(options)
    page_link_list = []  # list of urls of pages with ads
    BS4_main(page_root_url)  # BS4_main requests and parses url
    last_page_number = int(container.findAll("a", href=re.compile('^(' + house_href + ')((?!:).)*$'))[-2].text)
    for i in range(1, last_page_number):
        page_nr = page_root_url + 'pagina-' + str(i) + '.htm'
        page_link_list.append(page_nr)
    for page_link_url in page_link_list:
        overall_df = pd.DataFrame()
        print(page_link_url)
        ad_page_urls = get_ad_page_urls(page_link_url)  # returns all urls of ads on one page
        try:
            results = pool.map(get_ad_data, ad_page_urls)  # gets data from ad
        except Exception:
            print('error: ' + page_link_url)
            continue
        try:
            df = pd.DataFrame.from_dict(results)  # make DataFrame of data of all ads of one page
            print(df)
            overall_df.append(df)  # append DataFrame to overall DataFrame
            print(overall_df)
        except Exception:
            print('error: ' + page_link_url)
    return overall_df
My code successfully creates a DataFrame of all ads on one page: print(df) prints such a "one-page" DataFrame. However, when I try to append a one-page DataFrame to the empty overall DataFrame, nothing happens. The overall DataFrame stays empty.
I've tried the answers to this question, but they don't seem to work. The code should create a one-page DataFrame and then append it to the overall DataFrame.
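To take the scraping out of the picture, here is a minimal, self-contained sketch of the behaviour. The rows are made up (hypothetical price/rooms values standing in for whatever pool.map(get_ad_data, ...) returns), and it assumes a pandas version that still has DataFrame.append:

import pandas as pd

overall_df = pd.DataFrame()

# two hypothetical "one-page" results, standing in for pool.map(get_ad_data, ...)
page_results = [
    [{'price': 100000, 'rooms': 3}, {'price': 150000, 'rooms': 4}],
    [{'price': 200000, 'rooms': 5}],
]

for results in page_results:
    df = pd.DataFrame.from_dict(results)
    print(df)              # the one-page DataFrame prints fine
    overall_df.append(df)  # this call appears to have no effect on overall_df
    print(overall_df)      # overall_df is still empty here

Each one-page DataFrame prints as expected, but overall_df stays empty after every append, which is the same behaviour I see in the scraper.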