Create a data frame from dictionaries, which are iteratively generated

Question

I am trying to do web scraping to automate information collection instead of doing it manually.

For a given stock, a function (get_info) returns some information as a dictionary.

Example of output dictionary

For company A

dict_A = {'enterpriseRevenue': 1.264,
          'profitMargins': -0.00124,
          'enterpriseToEbitda': 28.328,
          'sharesOutstanding': 3907579904,
          'bookValue': 8.326}

For company B

dict_B = {'enterpriseRevenue': 2.789,
          'profitMargins': 2.34,
          'enterpriseToEbitda': 28.328,
          'sharesOutstanding': 2874818942,
          'bookValue': 4.189}

From a list of stocks, I would like to create a data frame containing all the items of the dictionaries returned by the get_info function. Desired algorithm in "natural language":

Create an empty data frame with 6 columns (first column for stock name, rest for dictionary items), called df

for s in list_of_stocks:
    toto = get_info(s) # get the information for the stock, type(toto)=dict
    add a new row to df, whose values correspond to toto

Example of desired output

Stock, enterpriseRevenue, profitMargins, enterpriseToEbitda, sharesOutstanding, bookValue
A, 1.264, -0.00124, 28.328, 3907579904, 8.326
B, 2.789, 2.34, 28.328, 2874818942, 4.189

Does anyone have any idea how to build this data frame?

Yes, I can store all the dictionaries in a list. I thought about using pd.DataFrame.from_dict(data, orient='index'), but then I don't know how to keep only the values and not the dictionary keys. – CharlesAntoine, Dec 16 '20 at 21:45
Using a list of dictionaries solves the problem directly with the function mentioned above. Examples of using a list of dictionaries are in this post: https://stackoverflow.com/questions/20638006/convert-list-of-dictionaries-to-a-pandas-dataframe – CharlesAntoine, Dec 16 '20 at 22:23
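
For reference, here is a minimal sketch of the from_dict(..., orient='index') route mentioned in the comment above. It assumes the dictionaries are collected in an outer dictionary keyed by stock name (the keys "A" and "B" are illustrative, not part of the original question's code). The outer keys become the row labels, which can then be moved into a regular column.

```python
import pandas as pd

# Assumed input: one dictionary per stock, keyed by an illustrative stock name.
data = {
    "A": {"enterpriseRevenue": 1.264, "profitMargins": -0.00124,
          "enterpriseToEbitda": 28.328, "sharesOutstanding": 3907579904,
          "bookValue": 8.326},
    "B": {"enterpriseRevenue": 2.789, "profitMargins": 2.34,
          "enterpriseToEbitda": 28.328, "sharesOutstanding": 2874818942,
          "bookValue": 4.189},
}

# orient="index": each outer key becomes a row label, each inner key a column.
df = pd.DataFrame.from_dict(data, orient="index")

# Turn the row labels into an ordinary first column named "Stock".
df = df.rename_axis("Stock").reset_index()
print(df)
```

The inner dictionary keys end up as column names, so only the values appear in the rows.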

score 2 · Accepted Answer · answered Dec 16 '20 at 22:40

Did you try:

pd.DataFrame([get_info(d) for d in list_of_stocks])
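
A self-contained sketch of that one-liner, in case it helps. The get_info below is a hypothetical stand-in that returns canned values instead of scraping, and the "Stock" key is added in the comprehension so that the name ends up as the first column. Both are assumptions for illustration.

```python
import pandas as pd


def get_info(stock):
    """Hypothetical stand-in for the real scraper: returns canned values."""
    sample = {
        "A": {"enterpriseRevenue": 1.264, "profitMargins": -0.00124,
              "enterpriseToEbitda": 28.328, "sharesOutstanding": 3907579904,
              "bookValue": 8.326},
        "B": {"enterpriseRevenue": 2.789, "profitMargins": 2.34,
              "enterpriseToEbitda": 28.328, "sharesOutstanding": 2874818942,
              "bookValue": 4.189},
    }
    return sample[stock]


list_of_stocks = ["A", "B"]

# One dictionary per stock; "Stock" is listed first so it becomes the first column.
rows = [{"Stock": s, **get_info(s)} for s in list_of_stocks]

# A list of dictionaries maps directly to rows; the keys become the column names.
df = pd.DataFrame(rows)
print(df.to_csv(index=False))
```

Collecting the rows in a list and building the data frame once is usually simpler than appending to an empty data frame inside the loop.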


adir abargil



Thanks! I did not know it was possible to have a for loop inside the data frame creation call. I updated the get_info function to add the stock name (name = {'stockname': stock}) to the already existing dictionary. – CharlesAntoine Dec 17 '20 at 21:41

rada-dev · Answer 2 · 2020-12-17T00:03:49.710

Try to use this.

# with the data structure like this, it might be easier to handle
data = {
    "stockA": {
        'enterpriseRevenue': 1.264,
        'profitMargins': -0.00124,
        'enterpriseToEbitda': 28.328,
        'sharesOutstanding': 3907579904,
        'bookValue': 8.326
    },
    "stockB": {
        'enterpriseRevenue': 2.789,
        'profitMargins': 2.34,
        'enterpriseToEbitda': 28.328,
        'sharesOutstanding': 2874818942,
        'bookValue': 4.189
    }
}

# getting the keys within the stockXY dict, which will be the column names
data_keys = data[list(data.keys())[0]].keys()   # raises IndexError when data dictionary is empty
column_captions = ["stock"]+list(data_keys)
print(", ".join(map(str, column_captions)))

for stock, stock_data in data.items():
    message = stock+", "+", ".join(map(str, stock_data.values()))
    print(message)
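
If the goal is comma-separated text, the same loop can also be sketched with the standard csv module, which takes care of quoting if a value ever contains a comma. This is an alternative to the manual join above, not part of the original answer. It writes to an in-memory buffer here, and an open file object could be passed instead.

```python
import csv
import io

data = {
    "stockA": {"enterpriseRevenue": 1.264, "profitMargins": -0.00124,
               "enterpriseToEbitda": 28.328, "sharesOutstanding": 3907579904,
               "bookValue": 8.326},
    "stockB": {"enterpriseRevenue": 2.789, "profitMargins": 2.34,
               "enterpriseToEbitda": 28.328, "sharesOutstanding": 2874818942,
               "bookValue": 4.189},
}

# Column names: "stock" plus the keys of the first inner dictionary.
# next(iter(...)) raises StopIteration when data is empty.
fieldnames = ["stock"] + list(next(iter(data.values())))

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
for stock, stock_data in data.items():
    # Merge the stock name into the row so it fills the "stock" column.
    writer.writerow({"stock": stock, **stock_data})

csv_text = buffer.getvalue()
print(csv_text)
```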

It seems you want to save the data to a text file. If so, you might take a look at JSON: https://docs.python.org/3/library/json.html
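
A minimal sketch of the JSON route, assuming the nested dictionary layout used above. json.dumps turns the dictionary into text and json.loads reads it back. To write to a file instead, json.dump(data, f) works with an open file object f.

```python
import json

data = {
    "stockA": {"enterpriseRevenue": 1.264, "profitMargins": -0.00124,
               "enterpriseToEbitda": 28.328, "sharesOutstanding": 3907579904,
               "bookValue": 8.326},
    "stockB": {"enterpriseRevenue": 2.789, "profitMargins": 2.34,
               "enterpriseToEbitda": 28.328, "sharesOutstanding": 2874818942,
               "bookValue": 4.189},
}

# Serialize to an indented JSON string (this is what would go into the text file).
text = json.dumps(data, indent=2)
print(text)

# Reading the string back gives an equal nested dictionary.
restored = json.loads(text)
```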

Thanks! You guessed right about the .txt output format. I'm not familiar with JSON at all, so I will go through the docs to learn about it. – CharlesAntoine, Dec 17 '20 at 21:44
