
I am trying to collect data by web scraping a number of pages. The problem is that I want to transpose the columns into rows so the scraped data ends up as a proper DataFrame.

I checked this question and applied its approach to my Python code, but it didn't work properly.

Here is my code below:

import time

import numpy as np
import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

browser = webdriver.Chrome()
browser.get('https://fortune.com/global500/2019/walmart')

data = []

i = 1
while True:
    table = browser.find_element_by_css_selector('tbody')
    if i > 2:
        break
    try:
        print("Scraping Page no. " + str(i))
        i = i + 1

        for row in table.find_elements_by_css_selector('tr'):
            cols =  [cell.text for cell in row.find_elements_by_css_selector('td.dataTable__value--3n5tL.dataTable__valueAlignLeft--3uvNx')]
            colsT = data.append(np.array(cols).T.tolist())

        try:
            WebDriverWait(browser, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a > span.singlePagination__icon--2KbZn"))).click()
            time.sleep(3)

        except TimeoutException:
            break

    except Exception as e:
        print(e)
        break

data1 = pd.DataFrame(data)
print(data1)

Here is the output of the code that I run:

Scraping Page no. 1
Scraping Page no. 2
                          0
0       C. Douglas McMillon
1                 Retailing
2     General Merchandisers
3         Bentonville, Ark.
4                         -
5                        25
6                 2,200,000
7              Dai Houliang
8                    Energy
9        Petroleum Refining
10                  Beijing
11                        -
12                       21
13                  619,151

And this is how I want it to be :

0    C. Douglas McMillon   Retailing   General Merchandisers    Bentonville, Ark.    -  ...
1    Dai Houliang          Energy      Petroleum Refining       Beijing              -  ...

Any suggestions or corrections will be appreciated here.


2 Answers


You can just use the pandas transpose function:

df_transposed = data1.T

Output:

0    C. Douglas McMillon   Retailing   General Merchandisers    Bentonville, Ark.    -  ...
1    Dai Houliang          Energy      Petroleum Refining       Beijing              -  ...
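Note that with the loop in the question, each `<tr>` yields a single cell, so `data` is one long column and a plain transpose gives one wide row rather than one row per company. A minimal sketch (assuming seven fields per company, as in the output above) that reshapes the flat values instead:

```python
import numpy as np
import pandas as pd

# Flat list as produced by the scraping loop: one value per table row
flat = ["C. Douglas McMillon", "Retailing", "General Merchandisers",
        "Bentonville, Ark.", "-", "25", "2,200,000",
        "Dai Houliang", "Energy", "Petroleum Refining",
        "Beijing", "-", "21", "619,151"]

# Reshape into one row of 7 fields per company
# (assumes len(flat) is a multiple of 7)
df = pd.DataFrame(np.array(flat).reshape(-1, 7))
print(df)
```

This yields two rows of seven columns, one per company.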

You can add a list of values directly as a row to a DataFrame. Here I set concrete column names, and each list of values is appended to the DataFrame matching those columns.

browser.get('https://fortune.com/global500/2019/walmart') 

df = pd.DataFrame(columns=['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7'])

i = 1
while True:
    table = browser.find_element_by_css_selector('tbody')
    if i > 2:
        break
    try:
        print("Scraping Page no. " + str(i))
        i = i + 1
        values = []

        for row in table.find_elements_by_css_selector('tr'):
            value = [cell.text for cell in row.find_elements_by_css_selector('td.dataTable__value--3n5tL.dataTable__valueAlignLeft--3uvNx')]
            values.append(value)
        print(values)
        s = pd.Series(values, index=df.columns)
        df = df.append(s, ignore_index=True)

        try:

            WebDriverWait(browser, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a > span.singlePagination__icon--2KbZn"))).click()
            time.sleep(3)

        except TimeoutException:
            break

    except Exception as e:
        print(e)
        break


print(df)

browser.quit()

Output:

                      c1           c2  ...    c6           c7
0  [C. Douglas McMillon]  [Retailing]  ...  [25]  [2,200,000]
1         [Dai Houliang]     [Energy]  ...  [21]    [619,151]
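A side note: `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, and each cell above prints as a one-element list because the comprehension returns a list per `tr`. A version-independent sketch of the same idea, collecting plain rows and building the DataFrame once after the loop (the `values` list here is illustrative sample data standing in for the scraped cells):

```python
import pandas as pd

cols = ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7']
rows = []  # one list of cell texts per company/page

# Inside the scraping loop you would append the texts for each page;
# here a single hard-coded page stands in for the scraped values.
values = ["C. Douglas McMillon", "Retailing", "General Merchandisers",
          "Bentonville, Ark.", "-", "25", "2,200,000"]
rows.append(values)

# Build the DataFrame once, after the loop finishes
df = pd.DataFrame(rows, columns=cols)
print(df)
```

Building from a list of rows in one call is also faster than appending row by row, since each append copies the whole frame.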