3

Has anybody tried parsing SalesForce report into Pandas DataFrame using Beatbox? There are couple of examples on SO but none of them have provided comprehensive solution or at least what I have perceived it hasn't.

#!/usr/bin/env python3

import beatbox
import pandas as pd


sf = beatbox._tPartnerNS
service = beatbox.Client()
service.serverUrl = 'https://login.salesforce.com/services/Soap/u/38.0'
service.login('my-username', 'my-password')

report_id = '00myreport4G3V'

query = "SELECT Name FROM Report where id = '{}'".format(report_id)
query_result = service.query(query)

This is just selecting the name but ideally I would like to load the content of report into a DataFrame. Any help please?

hynekcer
  • 14,942
  • 6
  • 61
  • 99
r0xette
  • 898
  • 3
  • 11
  • 24

2 Answers2

3

Report data can be retrieved by Salesforce Reports and Dashboards REST API. This works in Salesforce since Summer'15 (ver 34.0).

I wrote an example with package Simple-salesforce, due to REST API. (It is however possible to rewrite it without simple-salesforce and use an api session from Beatbox, write at least about 10 additional lines of code and install only the requests package.)

Universal code

from collections import OrderedDict
from simple_salesforce import Salesforce
import pandas as pd
import json

class SfReportsApi(Salesforce):
    def __init__(self, *args, **kwargs):
        super(SfReportsApi, self).__init__(*args, **kwargs)

    def describe_report(self, report_id):
        return self._call_report(report_id, command='/describe')

    def to_pandas_dataframe(self, report_id, metadata=None):
        """SF report details exported to DataFrame, can be modified by metadata"""
        resp = self._call_report(report_id, metadata=metadata)
        if not resp['allData']:
            print("Detailed data have been truncated to the usual report limit (2000).")
        columns = []
        converters = []
        get_label = lambda x: x['label']
        sf_pandas_map = {
            'boolean': lambda x: x['value'],
            'currency': lambda x: x['value'].get('amount'),
            'date': lambda x: pd.Timestamp(x['value']),
            'datetime': lambda x: pd.Timestamp(x['value']),
            'double': lambda x: x['value'],
            'picklist': get_label,
            'string': get_label,
            'textarea': get_label,
        }
        for col in resp['reportExtendedMetadata']['detailColumnInfo'].values():
            columns.append(col['label'])
            converters.append(sf_pandas_map.get(col['dataType'], get_label))
        data = [[conv(cell) for conv, cell in zip(converters, row['dataCells'])]
                for sect_key, section in resp['factMap'].items()
                    if 'rows' in section
                        for row in section['rows']
                ]
        df = pd.DataFrame(data, columns=columns)
        return df

    def _call_report(self, report_id, metadata=None, command=None):
        url = '{}analytics/reports/{}{}'.format(self.base_url, report_id, command or '')
        data = json.dumps({'reportMetadata': metadata}) if metadata else None
        resp = self._call_salesforce('POST' if metadata else 'GET', url, data=data)
        return resp.json(object_pairs_hook=OrderedDict)

Usage example

report_id = '00O24000004qtI4EAI'
# set Salesforce session_id some way (by login or reused from other app)
sf = SfReportsApi(username='me@example.com', password='password', security_token='token')
# sf = SfReportsApi(instance_url='https://na1.salesforce.com', session_id='')
# get report metadata if useful
metadata = sf.describe_report(report_id)['reportMetadata']
# modify them or write only the modified keys, e.g. change filters or remove subtotals etc.
metadata = {
    'orderBy': ['ACCOUNT.NAME'],
    'reportFilters': [{'value': 'W', 'column': 'ACCOUNT.NAME', 'operator': greaterOrEqual'}]
}
# or you can omit `metadata` parameter and use the report as is without changing anything
df = sf.to_pandas_dataframe(report_id, metadata)

It is possible to dynamically add columns, filters, sorting etc. (docs about report Execute synchronous). The method to_pandas_dataframe is for a normal tabular report with details and optionally can be with one grand-total and not more than one level of subtotals. It could be possible to retrieve data from more complicated reports (see docs about Decode the Fact Map or a cheatsheet), but it is not implemented because it is easier to remove them on the fly by metadata parameter before running.

Only 2000 detailed data rows can be reported. Several requests with filters can be used to see all data.

Fixed code The new code works for any report that contain original rows. It it is not for summary reports without original rows. (The old code worked only for reports with rows and subtotals. Excuse)

hynekcer
  • 14,942
  • 6
  • 61
  • 99
  • Added this code to gist [Parse Salesforce report data in Python](https://gist.github.com/hynekcer/a9d72c959c4569af32141e2586c7b86e). – hynekcer Aug 22 '17 at 09:19
  • I dont believe that the call_report method is working in simple_salesforce? can someone please verify – aero8991 Apr 09 '22 at 00:33
  • 1
    @aero8991 I fixed the code to work with many types of reports now. See the comment at the end of answer – hynekcer Apr 09 '22 at 21:07
2

I am not familiar with BeatBox, but it is very easy to use simple-salesforce to pull csv and then convert it to DataFrame.

#-*-coding:utf-8-*-
import pandas as pd
import numpy as np
from simple_salesforce import Salesforce 
import requests
###login to SF
sf = Salesforce(username='xxxxx', 
                password='xxxx', 
                security_token='', 
                organizationId='xxxxxx')
def readReport(reportid):
    with requests.session() as s:
        d = s.get("https://ap1.salesforce.com/{}?export=1&enc=UTF-8&xf=csv".format(reportid), 
                  headers=sf.headers, 
                  cookies={'sid': sf.session_id})
    import sys
    if sys.version_info[0] < 3:
        from StringIO import StringIO
    else:
        from io import StringIO
    return pd.read_csv(StringIO(d.text), sep=",")

df = readReport('your report id')
Max Wang
  • 21
  • 4