1

In Pandas, I am creating a dataframe that merges data from two different Beatbox queries. First, I pull all my Opportunity data, then I pull all my Account data, and then I merge.

However I would like to optimize this process by only pulling data for account['ID'] that exists in the oppty['AccountID'] column, as opposed to pulling the entirety of the Account data before a merge. How do I do this?

Current State:

query_result = svc.query("SELECT ID, AccountID FROM Opportunity")
records = query_result['records']
oppty = pd.DataFrame(records)

query_result = svc.query("SELECT ID, Website FROM Account")
records = query_result['records']
account = pd.DataFrame(records)

mylist = pd.merge(oppty, account, left_on='AccountID', right_on='ID', how='left')
Eric
  • 673
  • 1
  • 6
  • 12

2 Answers2

2

You can use a SOQL semi-join to restrict the Account query to only those accounts with opportunities, e.g.

svc.query("SELECT ID,Website FROM Account where ID in (SELECT accountId FROM Opportunity)")
superfell
  • 18,780
  • 4
  • 59
  • 81
  • Thank you! This works wonderfully. What is the syntax for adding an OR to that? As in (SELECT AccountID FROM Opportunity) OR (SELECT AccountID FROM User). So if the Account ID is in either set of data, it will pull the website for me. – Eric Jun 17 '15 at 19:40
  • http://stackoverflow.com/questions/30902100/beatbox-how-do-i-add-or-function-to-where-clause-when-pulling-data-from-sfdc – Eric Jun 17 '15 at 20:48
-1

Since I don't have access to your svc object this is just a suggestion. But try

query_result = svc.query("""SELECT ID, AccountID FROM Opportunity
                            JOIN Account on Account.ID = Opportunity.AccountID""")

as a single query.

This should extract the data using a left inner join, which omits unmatched rows in both tables. It also does the join in SQL, reducing the amount of database traffic (and therefore also network bandwidth) by having the database server do the work, thereby reducing the computational load on your desktop client system.

holdenweb
  • 33,305
  • 7
  • 57
  • 77
  • the query language for salesforce is SOQL, not SQL, and it doesn't support arbitrary joins like this. – superfell Jun 17 '15 at 01:03
  • OK, that was the missing clue, thanks. I'll delete the answer if you respond to this comment. Otherwise my ignorance of the context (and my willfully ignoring the tags) can stand as a lesson for all. – holdenweb Jun 17 '15 at 07:48