
I am getting the following error:

AttributeError: 'DataFrame' object has no attribute 'append'

when using the concat and _append functions in the code below. The output is a blank dataset (see image below).

How should I resolve this?

import pandas as pd
import requests
from bs4 import BeautifulSoup

final=pd.DataFrame()
for j in range(1,800):

headers={'User-Agent':'Mozilla/5.0 (Windows NT 6.3; Win 64 ; x64) Apple WeKit /537.36(KHTML , like Gecko) Chrome/80.0.3987.162 Safari/537.36'}
response=requests.get('https://www.ambitionbox.com/list-of-companies?campaign=desktop_nav&page={}',headers=headers).text
soup=BeautifulSoup(response,'html.parser')
company=soup.find_all('div',class_='company-content-wrapper')
name=[]
rating=[]
reviews=[]
ctype=[]
hq=[]
how_old=[]
no_of_employee=[]

for i in company:
    name.append(i.find('h2').text.strip())
    rating.append(i.find('p',class_='rating').text.strip())
    reviews.append(i.find('a',class_='review-count').text.strip())
    info_entities=i.find_all('p',class_='infoEntity')
    if len(info_entities)>0:
        ctype.append(info_entities[0].text.strip())
    else:
        ctype.append("N/A")

    if len(info_entities)>1:
        hq.append(info_entities[1].text.strip())
    else:
        hq.append("N/A")

    if len(info_entities)>2:
        how_old.append(info_entities[2].text.strip())
    else:
        how_old.append("N/A")

    if len(info_entities)>3:
        no_of_employee.append(info_entities[3].text.strip())
    else:
        no_of_employee.append("N/A")

dataset=pd.DataFrame({
    'Name':name,
    'Rating':rating,
    'reviews':reviews,
    'Company Type':ctype,
    'Headquaters':hq,
    'Company Age':how_old,
    'No. of Employee':no_of_employee
})

final=pd.concat([final,dataset],ignore_index=True)
print(final)
final.to_csv('Company_Dataset.csv')

Blank dataset output

Kyle F Hartzenberg

2 Answers


This Python script runs fine on my laptop (after small indentation changes so that the body sits inside the outer for loop), and the output .csv is not blank. As you mentioned, DataFrame.append was removed from pandas (in version 2.0, released April 2023) and pd.concat is used instead; there is an existing Stack Overflow question on this. So I don't think that's the problem.
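For reference, a minimal sketch of the concat pattern that replaces append. The `pages` list and its values are made up for illustration and stand in for the scraped per-page data; building a list of frames and concatenating once is a common idiom, not something the question's code has to do.

```python
import pandas as pd

# Hypothetical per-page results standing in for the scraped data.
pages = [
    {"Name": ["A", "B"], "Rating": ["4.1", "3.9"]},
    {"Name": ["C"], "Rating": ["4.5"]},
]

# Build one DataFrame per page, then concatenate them in a single call.
frames = [pd.DataFrame(page) for page in pages]
final = pd.concat(frames, ignore_index=True)
print(final)
```

Calling pd.concat inside the loop, as the question does, also works; concatenating once at the end just avoids copying the growing frame on every iteration.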

Can you try rerunning with a smaller list of pages, edit your question so the code is properly formatted, or share any other information? Here is your code as I ran it (I switched to a short list of page numbers so I could see the output quickly):

import pandas as pd
import requests
from bs4 import BeautifulSoup

final = pd.DataFrame()

for j in [1, 2, 600, 601]:
    
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win 64 ; x64) AppleWeKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Safari/537.36'
    }
    response = requests.get(f'https://www.ambitionbox.com/list-of-companies?campaign=desktop_nav&page={j}', headers=headers).text
    soup = BeautifulSoup(response, 'html.parser')
    company = soup.find_all('div', class_='company-content-wrapper')
    name = []
    rating = []
    reviews = []
    ctype = []
    hq = []
    how_old = []
    no_of_employee = []

    for i in company:
        name.append(i.find('h2').text.strip())
        rating.append(i.find('p', class_='rating').text.strip())
        reviews.append(i.find('a', class_='review-count').text.strip())
        info_entities = i.find_all('p', class_='infoEntity')
        if len(info_entities) > 0:
            ctype.append(info_entities[0].text.strip())
        else:
            ctype.append("N/A")

        if len(info_entities) > 1:
            hq.append(info_entities[1].text.strip())
        else:
            hq.append("N/A")

        if len(info_entities) > 2:
            how_old.append(info_entities[2].text.strip())
        else:
            how_old.append("N/A")

        if len(info_entities) > 3:
            no_of_employee.append(info_entities[3].text.strip())
        else:
            no_of_employee.append("N/A")

    dataset = pd.DataFrame({
        'Name': name,
        'Rating': rating,
        'Reviews': reviews,
        'Company Type': ctype,
        'Headquarters': hq,
        'Company Age': how_old,
        'No. of Employees': no_of_employee
    })

    final = pd.concat([final, dataset], ignore_index=True)

final.to_csv('Company_Dataset_2.csv', index=True)
SophieTP

Milan, the problem is in how you build the request URL: the string is not an f-string, so the page number is never substituted into it. You wrote:

response=requests.get('https://www.ambitionbox.com/list-of-companies?campaign=desktop_nav&page={}',headers=headers).text

What you need is an f-string with j inside the curly brackets:

response=requests.get(f'https://www.ambitionbox.com/list-of-companies?campaign=desktop_nav&page={j}',headers=headers).text

SophieTP corrected this in their answer, which is why the code worked for them. With that fix, your code works perfectly well.
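A quick way to see the difference without making any network request; the page number 3 is an arbitrary example value.

```python
j = 3  # arbitrary page number for illustration

# Ordinary string: the braces stay in the URL exactly as typed.
plain = 'https://www.ambitionbox.com/list-of-companies?campaign=desktop_nav&page={}'

# f-string: {j} is replaced by the current value of j.
formatted = f'https://www.ambitionbox.com/list-of-companies?campaign=desktop_nav&page={j}'

print(plain)
print(formatted)

# str.format is an equivalent way to fill the placeholder in an ordinary string.
assert plain.format(j) == formatted
```

Without the f prefix (or a call to .format), every iteration of the loop requests the same literal page={} URL, whatever the value of j.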

Syla