
I am trying to do web scraping to automate information collection instead of doing it manually.

For a given stock, a function (get_info) returns some information as a dictionary.

Example of output dictionary

For company A

dict_A = {'enterpriseRevenue': 1.264,
          'profitMargins': -0.00124,
          'enterpriseToEbitda': 28.328,
          'sharesOutstanding': 3907579904,
          'bookValue': 8.326}

For company B

dict_B = {'enterpriseRevenue': 2.789,
          'profitMargins': 2.34,
          'enterpriseToEbitda': 28.328,
          'sharesOutstanding': 2874818942,
          'bookValue': 4.189}

From a list of stocks, I would like to create a data frame containing all the items of the dictionaries returned by the get_info function. Desired algorithm in "natural language":

Create an empty data frame called df with 6 columns (the first column for the stock name, the rest for the dictionary items)

for s in list_of_stocks:
    toto = get_info(s) # get the information for the stock, type(toto)=dict
    add a new row to df whose values correspond to toto
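As a rough sketch, the pseudocode above translates almost literally into pandas. Here get_info is a hypothetical stub that just returns the two example dictionaries (the real function would scrape the data), and list_of_stocks is assumed to be ["A", "B"].

```python
import pandas as pd

# Hypothetical stand-in for the real scraping function.
def get_info(stock):
    examples = {
        "A": {"enterpriseRevenue": 1.264, "profitMargins": -0.00124,
              "enterpriseToEbitda": 28.328, "sharesOutstanding": 3907579904,
              "bookValue": 8.326},
        "B": {"enterpriseRevenue": 2.789, "profitMargins": 2.34,
              "enterpriseToEbitda": 28.328, "sharesOutstanding": 2874818942,
              "bookValue": 4.189},
    }
    return examples[stock]

list_of_stocks = ["A", "B"]
columns = ["Stock", "enterpriseRevenue", "profitMargins",
           "enterpriseToEbitda", "sharesOutstanding", "bookValue"]

# Empty data frame: stock name first, then one column per dictionary item.
df = pd.DataFrame(columns=columns)

for s in list_of_stocks:
    toto = get_info(s)  # dict for one stock
    # Append a row at the next integer label; values follow the column order.
    df.loc[len(df)] = [s] + [toto[c] for c in columns[1:]]

print(df)
```

Looking values up by column name (rather than taking toto.values()) keeps the row aligned even if get_info returns its keys in a different order. Growing the frame one row at a time works, but it copies data on each append, so for long lists it is usually cheaper to collect the rows first and build the frame once.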

Example of desired output

Stock, enterpriseRevenue, profitMargins, enterpriseToEbitda, sharesOutstanding, bookValue
A, 1.264, -0.00124, 28.328, 3907579904, 8.326
B, 2.789, 2.34, 28.328, 2874818942, 4.189

Does anyone have any idea how to build this data frame?

  • You can collect all the info and then build the df? – Dani Mesejo Dec 16 '20 at 21:11
  • Yes, I can store all the dictionaries in a list. I thought about using pd.DataFrame.from_dict(data, orient='index'), but then I don't know how to keep only the values and not the dictionary indices... – CharlesAntoine Dec 16 '20 at 21:45
  • Using a list of dictionaries solves the problem, directly with the above-mentioned function. Examples of using a list of dictionaries are in this post: https://stackoverflow.com/questions/20638006/convert-list-of-dictionaries-to-a-pandas-dataframe – CharlesAntoine Dec 16 '20 at 22:23
  • Did you find a solution? – adir abargil Dec 17 '20 at 18:07
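For the from_dict approach mentioned in the comments, a minimal sketch follows. It assumes the dictionaries are kept in an outer dictionary keyed by stock name (the comments only say they are stored in a list). With orient="index" each stock becomes a row, and rename_axis plus reset_index turns the stock-name index into an ordinary Stock column.

```python
import pandas as pd

# Assumed layout: outer keys are stock names, inner dicts come from get_info.
data = {
    "A": {"enterpriseRevenue": 1.264, "profitMargins": -0.00124,
          "enterpriseToEbitda": 28.328, "sharesOutstanding": 3907579904,
          "bookValue": 8.326},
    "B": {"enterpriseRevenue": 2.789, "profitMargins": 2.34,
          "enterpriseToEbitda": 28.328, "sharesOutstanding": 2874818942,
          "bookValue": 4.189},
}

# orient="index": outer keys become row labels, inner keys become columns.
df = pd.DataFrame.from_dict(data, orient="index")

# Move the row labels into a regular "Stock" column and reset to a 0..n-1 index.
df = df.rename_axis("Stock").reset_index()

print(df.to_csv(index=False))
```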

2 Answers


Did you try:

import pandas as pd
df = pd.DataFrame([get_info(d) for d in list_of_stocks])
adir abargil
  • Thanks! I did not know it was possible to have a for loop (a list comprehension) inside the DataFrame creation call. I updated the get_info function to add the stock name (name = {'stockname': stock}) to the already existing dictionary. – CharlesAntoine Dec 17 '20 at 21:41
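Combining the answer with the change described in the comment gives something like the sketch below. get_info is again a hypothetical stub, and the stock name is merged in at the call site rather than inside get_info; the resulting columns are the same either way.

```python
import pandas as pd

# Hypothetical stub standing in for the real scraping function.
def get_info(stock):
    examples = {
        "A": {"enterpriseRevenue": 1.264, "profitMargins": -0.00124,
              "enterpriseToEbitda": 28.328, "sharesOutstanding": 3907579904,
              "bookValue": 8.326},
        "B": {"enterpriseRevenue": 2.789, "profitMargins": 2.34,
              "enterpriseToEbitda": 28.328, "sharesOutstanding": 2874818942,
              "bookValue": 4.189},
    }
    return examples[stock]

list_of_stocks = ["A", "B"]

# One dict per stock with the name as its first key; pandas turns the
# list of dicts into rows, using the dict keys as column names.
df = pd.DataFrame([{"stockname": s, **get_info(s)} for s in list_of_stocks])

print(df)
```

Building the whole list first and calling pd.DataFrame once avoids enlarging the frame row by row.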

Try this:

# with a data structure like this (stock name -> info dict), the data may be easier to handle
data = {
    "stockA": {
        'enterpriseRevenue': 1.264,
        'profitMargins': -0.00124,
        'enterpriseToEbitda': 28.328,
        'sharesOutstanding': 3907579904,
        'bookValue': 8.326
    },
    "stockB": {
        'enterpriseRevenue': 2.789,
        'profitMargins': 2.34,
        'enterpriseToEbitda': 28.328,
        'sharesOutstanding': 2874818942,
        'bookValue': 4.189
    }
}

# getting the keys within the stockXY dict, which will be the column names
data_keys = data[list(data.keys())[0]].keys()   # raises IndexError when data dictionary is empty
column_captions = ["stock"]+list(data_keys)
print(", ".join(map(str, column_captions)))

for stock, stock_data in data.items():
    message = stock+", "+", ".join(map(str, stock_data.values()))
    print(message)
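To write the same lines to a file instead of printing them, the standard csv module can handle the quoting and separators. A sketch, assuming the data dict from this answer and a hypothetical output file name stocks.txt; note that csv.writer separates fields with "," and no trailing space.

```python
import csv

data = {
    "stockA": {"enterpriseRevenue": 1.264, "profitMargins": -0.00124,
               "enterpriseToEbitda": 28.328, "sharesOutstanding": 3907579904,
               "bookValue": 8.326},
    "stockB": {"enterpriseRevenue": 2.789, "profitMargins": 2.34,
               "enterpriseToEbitda": 28.328, "sharesOutstanding": 2874818942,
               "bookValue": 4.189},
}

# Column names from the first stock's dict (assumes all stocks share the same keys).
fieldnames = ["stock"] + list(next(iter(data.values())).keys())

# newline="" lets the csv module control line endings itself.
with open("stocks.txt", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(fieldnames)
    for stock, stock_data in data.items():
        # Look values up by key so each column lines up with its header.
        writer.writerow([stock] + [stock_data[k] for k in fieldnames[1:]])
```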

It seems like you want to save the data to a text file. If so, you might take a look at JSON: https://docs.python.org/3/library/json.html
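A minimal sketch of the JSON route, assuming the nested data dict from this answer and a hypothetical file name stocks.json. json.dump writes the nested dictionaries as readable text, and json.load reads them back as plain dicts.

```python
import json

data = {
    "stockA": {"enterpriseRevenue": 1.264, "profitMargins": -0.00124,
               "enterpriseToEbitda": 28.328, "sharesOutstanding": 3907579904,
               "bookValue": 8.326},
    "stockB": {"enterpriseRevenue": 2.789, "profitMargins": 2.34,
               "enterpriseToEbitda": 28.328, "sharesOutstanding": 2874818942,
               "bookValue": 4.189},
}

# Save the nested dict as indented, human-readable JSON text.
with open("stocks.json", "w") as f:
    json.dump(data, f, indent=4)

# Load it back; strings, floats and ints round-trip unchanged.
with open("stocks.json") as f:
    loaded = json.load(f)

print(loaded["stockB"]["bookValue"])
```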

rada-dev
  • Thanks! You guessed right about the .txt output format. I'm not familiar with JSON at all, so I will go through the docs to learn about it. – CharlesAntoine Dec 17 '20 at 21:44