I am trying to read an HTML table from an Outlook email using BeautifulSoup. The table contains two main columns: Ticker and Price. Now I am trying to add a third column named Pkey to the resulting DataFrame.

I am able to add it, and it works fine as long as the email has the full list of tickers (7 in total). But sometimes we don't receive the full list: say we receive prices for only 3 of the 7 tickers. In that case, I need column 3 to hold the Pkeys for those 3 tickers only.

How can I do that?

We have the following code:

import pandas as pd
import win32com.client
from sqlalchemy.engine import create_engine
import re
from datetime import datetime, timedelta
import requests
import sys
from bs4 import BeautifulSoup
from pprint import pprint


EMAIL_ACCOUNT = 'robinhood.gmail.com'
EMAIL_SUBJ_SEARCH_STRING = 'Morgan Stanley Systematic Strategies Daily Levels'


out_app = win32com.client.gencache.EnsureDispatch("Outlook.Application")
out_namespace = out_app.GetNamespace("MAPI")


root_folder = out_namespace.GetDefaultFolder(6)

out_iter_folder = root_folder.Folders['Email_Snapper']

item_count = out_iter_folder.Items.Count

Flag = False
cnt = 1
if item_count > 0:
    for i in range(item_count, 0, -1):
        message = out_iter_folder.Items[i]
        if EMAIL_SUBJ_SEARCH_STRING in message.Subject and cnt <=1:
            cnt=cnt+1
            Body_content = message.HTMLBody
            Body_content = BeautifulSoup(Body_content,"lxml")
            html_tables = Body_content.find_all('table')[0]
            #Body_content = Body_content[:Body_content.find("Disclaimer")].strip()
            df = pd.read_html(str(html_tables),header=0)[0]
            Pkey = [71763307, 76366654, 137292386, 151971418, 151971419, 152547427, 152547246]
            df['Pkey'] = Pkey
            
            print(df) 

Output: the output looks fine as long as we get the full list of tickers from the bank.

But sometimes we only get prices for a handful of tickers rather than the full list. In that case the code raises an error.

The error message I get is:

ValueError: Length of values does not match length of index
Rahul Vaidya
  • I edited your post, but I am not sure what you mean by "`print(df) #ined pkey as per tickers`" (I added the `#` to avoid syntax errors). Please correct it. – Rodalm Jun 19 '22 at 14:44
  • And please share the full traceback of the error so we can easily track where the error comes from. Also, don't post images of the data, as we can't test them. Instead, post a sample of the DataFrame and the expected output directly inside a code block. This allows us to easily reproduce your problem in order to help you. Read: [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) and [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – Rodalm Jun 19 '22 at 14:47
  • Hi Rodalm, the dataframe contains two columns (Ticker, Price) with 7 rows in total. I am trying to add a third column (Pkey). The Pkeys are static per ticker: for example, we will always use Pkey 71763307 for ticker MSUSDSP5. But the dataframe doesn't contain all 7 tickers every time. We may get a dataframe with only 3 tickers, and in that case we need the Pkeys for those 3 tickers only. – Rahul Vaidya Jun 19 '22 at 17:24

1 Answer

Try using pd.Series([755454, 556554, 2545454, 54644, 878798])
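Since the comments say each Pkey is fixed per ticker, another option is to look the Pkey up by ticker rather than assign a fixed-length list. The following is only a minimal sketch under assumptions: the column is assumed to be named `Ticker`, the `Price` values are made up, `add_pkey` and `PKEY_BY_TICKER` are hypothetical names, and apart from the MSUSDSP5 pairing mentioned in the comments, the ticker names in the lookup are placeholders that would need to be replaced with the real ones.

```python
import pandas as pd

# Hypothetical ticker -> Pkey lookup. Only the MSUSDSP5 pairing comes from the
# question's comments; the other ticker names are placeholders.
PKEY_BY_TICKER = {
    "MSUSDSP5": 71763307,
    "TICKER_B": 76366654,
    "TICKER_C": 137292386,
}


def add_pkey(df, lookup=PKEY_BY_TICKER, ticker_col="Ticker"):
    """Return a copy of df with a Pkey column looked up per ticker."""
    out = df.copy()
    # Series.map leaves NaN for tickers missing from the lookup; the nullable
    # Int64 dtype keeps the known keys as integers instead of floats.
    out["Pkey"] = out[ticker_col].map(lookup).astype("Int64")
    return out


# A partial email: only two tickers arrived, in a different order.
partial = pd.DataFrame(
    {"Ticker": ["TICKER_C", "MSUSDSP5"], "Price": [101.5, 99.2]}
)
result = add_pkey(partial)
print(result)
```

Because the lookup is keyed on the ticker, it does not depend on how many rows the email contains or on their order. Note that assigning a longer `pd.Series` to a shorter DataFrame avoids the `ValueError` because pandas aligns on the index, but the values are then paired by row label, not by ticker, so that only gives the right keys if the tickers always arrive in the same order from the top.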

Atul sanwal