Parse data from website, compare data from csv and write data to csv file

Question

I'm stuck on this problem:

I have a sec_sic.csv file with the following data:

ticker  SIC
A       3826
AA      3334
AAL     4512
AAN     7359
AAP     5531

I need to read the data from sec_sic.csv, match each SIC against the SIC Code column of the table on the website, add the new columns Office and Industry, and write a new SIC file with all the data.

I tried this code:

import pandas as pd
import requests
import csv

url = "https://www.sec.gov/info/edgar/siccodes.htm"

r = requests.get(url)
df_list = pd.read_html(r.text) # this parses all the tables in webpages to a list
df = df_list[0]
#df.set_index('SIC Code', inplace=True)
#print(df.head())
#print(df['Office'])
sic_num =0

base_df = pd.read_csv('sec_sic.csv')

with open("sec_sic_to_industry.csv", "w+", newline='',encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["ticker", "SIC","office", "industry"])
print("Num of SIC "+str(len(base_df['SIC'])))

while sic_num < len(base_df['SIC']):  # < rather than <=, so the last index stays in range
    #print(base_df['SIC'][0])

    filt = (base_df['SIC'][sic_num] == df['SIC Code'])#df['SIC Code']

    #print(df.loc[filt, ["Office", "Industry Title"]])

    one = df.loc[filt, "Office"]
    two = df.loc[filt, "Industry Title"]
    one_1 = one.to_string()
    two_1 = two.to_string()


    #print(base_df['SIC'][sic_num])
    one_2 = one_1.split(" ",1)[1]
    two_2 = two_1.split(" ",1)[1]
    SIC = base_df['SIC'][sic_num]
    ticker = base_df['ticker'][sic_num]


    with open("sec_sic_to_industry.csv", "a+",newline='', encoding='utf-8') as csvfile:

        writer = csv.writer(csvfile)
        writer.writerow([ticker,SIC,one_2, two_2])
    sic_num +=1

But I have a problem with the last column, industry: its text is sometimes cut off.

ticker  SIC   office                   industry
ALB     2821  Office of Life Sciences  PLASTIC MATERIALS, SYNTH RESINS & NONVULCAN EL...
ALGN    3842  Office of Life Sciences  ORTHOPEDIC, PROSTHETIC & SURGICAL APPLIANCES &...

It is complete. It's just too long to fit on your screen. Check its value for a particular row like this: `df['industry'].iloc[0]` - Mayank Porwal, May 08 '20 at 18:39
But it happens after I split at this part: two_2 = two_1.split(" ",1)[1] - Mike, May 08 '20 at 18:56
Found a solution here: https://stackoverflow.com/questions/29902714/print-very-long-string-completely-in-pandas-dataframe - Mike, May 08 '20 at 19:04
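
For anyone landing here later: the "..." in the file most likely comes from `to_string()`, which formats values for display and can shorten long strings, rather than from the scraped table itself. A minimal sketch of one way to avoid it is to skip the string formatting entirely and join the two tables with `merge`. The small in-memory frames below stand in for sec_sic.csv and for the table parsed from the SEC page, and the long industry title is a made-up placeholder, not a real SIC title. The column names "SIC Code", "Office" and "Industry Title" are taken from the code in the question.

```python
import pandas as pd

# Placeholder title (hypothetical), longer than 50 characters on purpose.
LONG_TITLE = "EXAMPLE INDUSTRY TITLE THAT IS DELIBERATELY LONGER THAN FIFTY CHARACTERS"

# Stand-in for pd.read_csv('sec_sic.csv'); tickers and codes are from the question.
base_df = pd.DataFrame({"ticker": ["A", "AA", "AAL"], "SIC": [3826, 3334, 4512]})

# Stand-in for the table returned by pd.read_html; offices and titles are made up.
sic_df = pd.DataFrame({
    "SIC Code": [3826, 3334],
    "Office": ["Example Office One", "Example Office Two"],
    "Industry Title": [LONG_TITLE, "SHORT EXAMPLE TITLE"],
})

# A left join keeps every ticker, even when its SIC has no match in the table.
merged = base_df.merge(sic_df, how="left", left_on="SIC", right_on="SIC Code")

# Keep only the wanted columns and rename them to the header used in the question.
result = merged[["ticker", "SIC", "Office", "Industry Title"]].rename(
    columns={"Office": "office", "Industry Title": "industry"}
)

# With no path, to_csv returns the CSV text; pass a file name to write to disk.
csv_text = result.to_csv(index=False)
print(csv_text)
```

With the real data, `result.to_csv("sec_sic_to_industry.csv", index=False)` would write the file in one call, so the row-by-row loop and the `split(" ", 1)` step are no longer needed. Tickers whose SIC is missing from the table end up with empty office and industry cells.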
