
I am trying to automate a search that returns a table of information. I can print each result row with .text, but how can I pass the results into a Pandas DataFrame? I am asking for two reasons: I want to write the results to a CSV file, and I need them in Pandas for data analysis later on. I would appreciate any help. My code is below:

import time
from selenium import webdriver
import pandas as pd


search = ['0501020210597400','0501020210597500','0501020210597600']
df = pd.DataFrame(search)


chrome_path = [Chrome Path]
driver = webdriver.Chrome(chrome_path)

driver.get('https://enquiry.mpsj.gov.my/v2/service/cuk_search/')
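for new_line in df[0]:
    # Type the account number into the search box and submit the form.
    search_box = driver.find_element_by_name('sel_value')
    search_box.send_keys(new_line)
    search_box.submit()
    time.sleep(5)  # crude fixed wait for the results table to load
    # Each result row carries the class 'tr-body'.
    table = driver.find_elements_by_class_name('tr-body')
    for data in table:
        print(data.text)
    # Clear the box once per search, even when no rows came back.
    driver.find_element_by_name('sel_value').clear()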

driver.close()
QHarr
Eric Choi

2 Answers


To load a CSV file into a DataFrame, you can do:

df = pd.read_csv('example.csv')

See the online doc: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html#pandas.read_csv

To write a DataFrame to a CSV file, see the Stack Overflow question "Pandas writing dataframe to CSV file".

The solution given there, which uses a tab as the separator, is:

df.to_csv(file_name, sep='\t')
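
One caveat when round-tripping data like the account numbers in the question: they start with a zero, so reading the file back with default settings may turn them into integers and drop that zero. A minimal sketch, assuming a single hypothetical column named 'No. Akaun' and an in-memory buffer standing in for a real file:

```python
import io

import pandas as pd

# Hypothetical one-column frame holding account numbers as strings.
df = pd.DataFrame({'No. Akaun': ['0501020210597400', '0501020210597500']})

# Write tab-separated text to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, sep='\t', index=False)

# Default parsing infers an integer column, which loses the leading zero.
buffer.seek(0)
as_numbers = pd.read_csv(buffer, sep='\t')

# dtype=str keeps every value exactly as it was written.
buffer.seek(0)
as_text = pd.read_csv(buffer, sep='\t', dtype=str)

print(as_numbers['No. Akaun'].tolist())
print(as_text['No. Akaun'].tolist())
```

The same sep value has to be passed to both to_csv and read_csv; otherwise each tab-separated line is read back as a single column.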
Laurent LAPORTE
  • I'm asking how can I load the data into a Pandas Dataframe and subsequently output to a CSV file. – Eric Choi Aug 12 '17 at 19:05
  • Do you need more explanation? If not, I suggest you upvote and [accept](https://meta.stackexchange.com/a/5235/344471) my answer. – Laurent LAPORTE Sep 04 '17 at 21:53
  • I'm afraid it did not answer my question. I was asking "how can I pass the results into a Pandas dataframe", meaning the "data.text" that I print in my script. How do I pass "data.text" into a Pandas dataframe? I know how to use df.to_csv. – Eric Choi Sep 08 '17 at 18:53
  • @EricChoi: So, what does your "data.text" look like? Please give a sample. – Laurent LAPORTE Sep 08 '17 at 18:56
  • A sample of the "data.text" result is: "1. 0501020210597400 2A-3 JALAN PUTERI 1/2, BANDAR PUTERI, 47100 PUCHONG, Selangor RM 0.00 Pilih Untuk Bayar". I'd like to be able to pass this data to a DataFrame. The only way that I could come up with is to use pd.read_html and target the entire table. – Eric Choi Sep 08 '17 at 20:29
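
One way to get from the printed text to a DataFrame is to collect the text of each cell rather than the whole row, then hand the list of rows to pd.DataFrame. The sketch below uses hard-coded cell texts in place of live Selenium elements. The split of the sample row into five cells is an assumption, the column names are the ones listed in the second answer, and rows_to_dataframe is a hypothetical helper, not part of Pandas or Selenium.

```python
import pandas as pd

# Column names as listed in the second answer; the last column has no title.
COLUMNS = ['No.', 'No. Akaun', 'Nama Di Bil', 'Jumlah Perlu Dibayar', '']


def rows_to_dataframe(rows, columns):
    """Build a DataFrame from a list of rows, each a list of cell texts."""
    return pd.DataFrame(rows, columns=columns)


# With Selenium 3 each inner list could come from something like
#   [td.text for td in data.find_elements_by_tag_name('td')]
# Here one row is hard-coded from the sample above (cell boundaries assumed).
rows = [
    ['1.', '0501020210597400',
     '2A-3 JALAN PUTERI 1/2, BANDAR PUTERI, 47100 PUCHONG, Selangor',
     'RM 0.00', 'Pilih Untuk Bayar'],
]

df = rows_to_dataframe(rows, COLUMNS)
print(df.shape)
```

Appending one list per row inside the search loop and building the DataFrame once at the end avoids growing a DataFrame row by row.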

You can use requests and do a POST to get the info rather than using selenium:

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

search = ['0501020210597400','0501020210597500','0501020210597600']
headers = {'Referer' : 'https://enquiry.mpsj.gov.my/v2/service/cuk_search/1',
          'User-Agent' : 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
          }
output = []
dfHeaders = ['No.', 'No. Akaun', 'Nama Di Bil', 'Jumlah Perlu Dibayar', '']

with requests.Session() as s:
    for account_no in search:
        # Load the search page first to pick up the ACCESS_KEY token from the form.
        r = s.get('https://enquiry.mpsj.gov.my/v2/service/cuk_search/1', headers = headers)
        soup = bs(r.content, 'lxml')
        key = soup.select_one('[name=ACCESS_KEY]')['value']
        # Submit the search as a POST, sending the token with the account number.
        body = {'sel_input': 'no_akaun', 'sel_value': account_no, 'ACCESS_KEY': key}
        res = s.post('https://enquiry.mpsj.gov.my/v2/service/cuk_search_submit/', data = body)
        soup = bs(res.content, 'lxml')
        table = soup.select_one('.tbl-list')
        rows = table.select('.tr-body')

        for row in rows:
            # Keep the stripped text of each non-empty cell as one output row.
            cells = [td.text.strip() for td in row.find_all('td')]
            output.append([cell for cell in cells if cell])

df = pd.DataFrame(output, columns = dfHeaders)
print(df)
df.to_csv(r'C:\Users\User\Desktop\Data.csv', sep=',', encoding='utf-8-sig',index = False )
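
The parsing step can be tried without touching the live site. The sketch below runs the same selectors against a hard-coded HTML snippet. The snippet's markup is an assumption about what the page returns, shaped only to match the names used in the code above, and it uses the built-in 'html.parser' instead of 'lxml' to avoid an extra dependency.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup shaped to match the selectors above; the real page may differ.
html = """
<form><input type="hidden" name="ACCESS_KEY" value="example-key"></form>
<table class="tbl-list">
  <tr class="tr-body">
    <td>1.</td><td>0501020210597400</td><td>2A-3 JALAN PUTERI 1/2</td>
    <td>RM 0.00</td><td>Pilih Untuk Bayar</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')

# Attribute selector: the first element whose name attribute is ACCESS_KEY.
key = soup.select_one('[name=ACCESS_KEY]')['value']

# One list of stripped cell texts per result row.
output = []
for row in soup.select_one('.tbl-list').select('.tr-body'):
    cells = [td.text.strip() for td in row.find_all('td')]
    output.append([cell for cell in cells if cell])

df = pd.DataFrame(output, columns=['No.', 'No. Akaun', 'Nama Di Bil', 'Jumlah Perlu Dibayar', ''])
print(key)
print(df)
```

Note that pd.DataFrame raises an error if a row does not have exactly as many cells as there are column names, so a page layout change would surface at that line.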
QHarr
  • Let me know if this is what you meant. – QHarr Feb 26 '19 at 19:12
  • I think this is a good solution. Can you recommend any good reading material or references for requests and POST? I'd like to learn more about how to do that. Thank you. – Eric Choi Feb 28 '19 at 04:00
  • Did it answer the question OK? And sure, I will dig out some info once I'm at work. – QHarr Feb 28 '19 at 06:21
  • Run through [this](http://docs.python-requests.org/en/master/user/quickstart/). Note there are more complicated POST examples in the left nav bar. Also, read [this](https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/POST). – QHarr Feb 28 '19 at 08:29
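
To see what a form POST actually sends without contacting any server, requests can build the request and stop before sending it. A small sketch, assuming a placeholder key value ('example-key') in place of a real ACCESS_KEY:

```python
import requests

# Form fields as in the answer; 'example-key' is a placeholder, not a real token.
body = {'sel_input': 'no_akaun', 'sel_value': '0501020210597400', 'ACCESS_KEY': 'example-key'}

# Build the request but do not send it; prepare() encodes the body and headers.
request = requests.Request(
    'POST',
    'https://enquiry.mpsj.gov.my/v2/service/cuk_search_submit/',
    data=body,
)
prepared = request.prepare()

print(prepared.method)
print(prepared.headers['Content-Type'])
print(prepared.body)
```

A dict passed as data is form-encoded, which is what an HTML form submission sends; passing json= instead would send a JSON body.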