Report data can be retrieved by Salesforce Reports and Dashboards REST API.
This works in Salesforce since Summer'15 (ver 34.0).
I wrote an example with package Simple-salesforce, due to REST API. (It is however possible to rewrite it without simple-salesforce and use an api session from Beatbox, write at least about 10 additional lines of code and install only the requests package.)
Universal code
from collections import OrderedDict
from simple_salesforce import Salesforce
import pandas as pd
import json
class SfReportsApi(Salesforce):
def __init__(self, *args, **kwargs):
super(SfReportsApi, self).__init__(*args, **kwargs)
def describe_report(self, report_id):
return self._call_report(report_id, command='/describe')
def to_pandas_dataframe(self, report_id, metadata=None):
"""SF report details exported to DataFrame, can be modified by metadata"""
resp = self._call_report(report_id, metadata=metadata)
if not resp['allData']:
print("Detailed data have been truncated to the usual report limit (2000).")
columns = []
converters = []
get_label = lambda x: x['label']
sf_pandas_map = {
'boolean': lambda x: x['value'],
'currency': lambda x: x['value'].get('amount'),
'date': lambda x: pd.Timestamp(x['value']),
'datetime': lambda x: pd.Timestamp(x['value']),
'double': lambda x: x['value'],
'picklist': get_label,
'string': get_label,
'textarea': get_label,
}
for col in resp['reportExtendedMetadata']['detailColumnInfo'].values():
columns.append(col['label'])
converters.append(sf_pandas_map.get(col['dataType'], get_label))
data = [[conv(cell) for conv, cell in zip(converters, row['dataCells'])]
for sect_key, section in resp['factMap'].items()
if 'rows' in section
for row in section['rows']
]
df = pd.DataFrame(data, columns=columns)
return df
def _call_report(self, report_id, metadata=None, command=None):
url = '{}analytics/reports/{}{}'.format(self.base_url, report_id, command or '')
data = json.dumps({'reportMetadata': metadata}) if metadata else None
resp = self._call_salesforce('POST' if metadata else 'GET', url, data=data)
return resp.json(object_pairs_hook=OrderedDict)
Usage example
report_id = '00O24000004qtI4EAI'
# set Salesforce session_id some way (by login or reused from other app)
sf = SfReportsApi(username='me@example.com', password='password', security_token='token')
# sf = SfReportsApi(instance_url='https://na1.salesforce.com', session_id='')
# get report metadata if useful
metadata = sf.describe_report(report_id)['reportMetadata']
# modify them or write only the modified keys, e.g. change filters or remove subtotals etc.
metadata = {
'orderBy': ['ACCOUNT.NAME'],
'reportFilters': [{'value': 'W', 'column': 'ACCOUNT.NAME', 'operator': greaterOrEqual'}]
}
# or you can omit `metadata` parameter and use the report as is without changing anything
df = sf.to_pandas_dataframe(report_id, metadata)
It is possible to dynamically add columns, filters, sorting etc. (docs about report Execute synchronous). The method to_pandas_dataframe
is for a normal tabular report with details and optionally can be with one grand-total and not more than one level of subtotals. It could be possible to retrieve data from more complicated reports (see docs about Decode the Fact Map or a cheatsheet), but it is not implemented because it is easier to remove them on the fly by metadata parameter before running.
Only 2000 detailed data rows can be reported. Several requests with filters can be used to see all data.
Fixed code The new code works for any report that contain original rows. It it is not for summary reports without original rows. (The old code worked only for reports with rows and subtotals. Excuse)