I am scraping data from several pages with Selenium, and I want to collect the results in a pandas DataFrame. The problem is that each page's values come out as a single column, and I want to transpose them so that each page becomes one row.
I checked this question and applied its approach to my Python code, but it did not give me the layout I want.
Here is my code:
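import time

import numpy as np
import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

browser = webdriver.Chrome()  # any WebDriver works; Chrome is only an example
browser.get('https://fortune.com/global500/2019/walmart')
data = []
i = 1
while True:
    table = browser.find_element(By.CSS_SELECTOR, 'tbody')
    if i > 2:  # stop after two pages
        break
    try:
        print("Scraping Page no. " + str(i))
        i = i + 1
        for row in table.find_elements(By.CSS_SELECTOR, 'tr'):
            # text of the value cell(s) in this <tr>
            cols = [cell.text for cell in row.find_elements(By.CSS_SELECTOR, 'td.dataTable__value--3n5tL.dataTable__valueAlignLeft--3uvNx')]
            # my transpose attempt: still appends one list per <tr>
            colsT = data.append(np.array(cols).T.tolist())
        try:
            # click through to the next company's page
            WebDriverWait(browser, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a > span.singlePagination__icon--2KbZn"))).click()
            time.sleep(3)
        except TimeoutException:
            break
    except Exception as e:
        print(e)
        break
data1 = pd.DataFrame(data)
print(data1)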
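For reference, here is a minimal check of what the transpose line does by itself. It assumes, as the output below suggests, that `cols` holds a single value per `<tr>`. Transposing a 1-D NumPy array returns it unchanged, and `list.append` mutates the list in place and returns `None`, so nothing is actually transposed and `colsT` is always `None`:

```python
import numpy as np

# One value cell per <tr>, taken from the printed output below
cols = ["C. Douglas McMillon"]

arr = np.array(cols)
transposed = arr.T.tolist()
print(arr.ndim)            # 1: a 1-D array has no second axis to swap
print(transposed == cols)  # True: .T leaves a 1-D array as it is

data = []
result = data.append(transposed)
print(result)  # None: list.append returns nothing
print(data)    # [['C. Douglas McMillon']]: still one inner list per <tr>
```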
Here is the output when I run it:
Scraping Page no. 1
Scraping Page no. 2
0
0 C. Douglas McMillon
1 Retailing
2 General Merchandisers
3 Bentonville, Ark.
4 -
5 25
6 2,200,000
7 Dai Houliang
8 Energy
9 Petroleum Refining
10 Beijing
11 -
12 21
13 619,151
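The output shows that every `<tr>` contributed a one-element list, so the DataFrame ends up with a single column and one row per cell. The sketch below rebuilds that shape from the printed values and then folds it into one row per page with NumPy's `reshape`. It assumes every page yields exactly 7 values, which holds for these two pages but is not guaranteed for others:

```python
import numpy as np
import pandas as pd

# The 14 values printed above, one single-element list per <tr>
values = [
    "C. Douglas McMillon", "Retailing", "General Merchandisers",
    "Bentonville, Ark.", "-", "25", "2,200,000",
    "Dai Houliang", "Energy", "Petroleum Refining",
    "Beijing", "-", "21", "619,151",
]
data = [[v] for v in values]

tall = pd.DataFrame(data)
print(tall.shape)  # (14, 1): one column, one row per <tr>

# Assumption: a fixed number of values per page (7 in this output)
fields_per_page = 7
wide = pd.DataFrame(np.array(data).reshape(-1, fields_per_page))
print(wide.shape)  # (2, 7): one row per page
```

`reshape` raises a `ValueError` when the total number of values is not a multiple of `fields_per_page`, so this only works if every page has the same layout.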
This is how I want it to look, with one row per page:
0 C. Douglas McMillon Retailing General Merchandisers Bentonville, Ark. - ...
1 Dai Houliang Energy Petroleum Refining Beijing - ...
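One possible direction, sketched under assumptions: gather all value cells of one page into a single flat list and append that list once per page instead of once per `<tr>`. `page_to_row` is a hypothetical helper, not part of Selenium. It relies only on `find_elements(by, value)`, which Selenium 4 elements provide, and passes the string `"css selector"` (the value of `By.CSS_SELECTOR`) so it can be tried without a browser. The selector is copied from the question; hashed class names like `--3n5tL` may change when the site is rebuilt.

```python
# Selector copied from the question; it may need updating if the site changes
VALUE_SELECTOR = "td.dataTable__value--3n5tL.dataTable__valueAlignLeft--3uvNx"
CSS = "css selector"  # same string as selenium's By.CSS_SELECTOR


def page_to_row(table):
    """Flatten the value cells of every <tr> in one page's table into one list."""
    row = []
    for tr in table.find_elements(CSS, "tr"):
        # extend, not append: values from all <tr>s go into the same list
        row.extend(cell.text for cell in tr.find_elements(CSS, VALUE_SELECTOR))
    return row
```

Inside the loop, the `for row in ...` block would become `data.append(page_to_row(table))`; `pd.DataFrame(data)` then has one row per scraped page.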
Any suggestions or corrections would be appreciated.