Another answer uses the computeStats and getDatasetStats Foundry APIs. There is a third API, getComputedDatasetStats, which returns the stats you need and may even perform better.
According to my tests:
- getDatasetStats is not available until computeStats has been run, and computeStats takes time. getComputedDatasetStats, on the other hand, is available right away.
- getComputedDatasetStats will return sizeInBytes, but only if computeStats has not been run. After I called the computeStats API and the job finished, sizeInBytes became null. getDatasetStats showed null too.
To get the row count, column count and dataset size you may try using something similar to this:
import requests
import json
def getComputedDatasetStats(token, dataset_rid, api_base='https://.....'):
    response = requests.post(
        url=f'{api_base}/foundry-stats/api/computed-stats-v2/get',
        headers={
            'content-type': 'application/json',
            'Authorization': 'Bearer ' + token
        },
        data=json.dumps({
            "datasetRid": dataset_rid,
            "branch": "master"
        })
    )
    return response.json()
token = 'eyJwb.....'
dataset_rid = 'ri.foundry.main.dataset.1d9ef04e-7ec6-456e-8326-1c64b1105431'
result = getComputedDatasetStats(token, dataset_rid)
# full resulting json:
# print(json.dumps(result, indent=4))
# required statistics:
print('size:', result['computedDatasetStats']['sizeInBytes'])
print('rows:', result['computedDatasetStats']['rowCount'])
print('cols:', len(result['computedDatasetStats']['columnStats']))
Example output:
size: 24
rows: 2
cols: 2
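
Because sizeInBytes can come back as null once computeStats has run, you may want to read the fields defensively instead of indexing straight into the response. Below is a minimal sketch, assuming the response has the shape used above (a computedDatasetStats object holding sizeInBytes, rowCount and columnStats). The helper name summarize_stats and the sample dictionaries are made up for illustration and are not real API output. The samples treat columnStats as a mapping, which is also an assumption; len() works the same way if it is a list.

```python
from typing import Any, Dict, Optional


def summarize_stats(result: Dict[str, Any]) -> Dict[str, Optional[int]]:
    """Pull size, row count and column count out of a stats response.

    Assumes the response shape used above. Any field that is missing
    or null is reported as None instead of raising an error.
    """
    stats = result.get('computedDatasetStats') or {}
    column_stats = stats.get('columnStats')
    return {
        'size': stats.get('sizeInBytes'),
        'rows': stats.get('rowCount'),
        'cols': len(column_stats) if column_stats is not None else None,
    }


# Hand-written sample responses (hypothetical, for illustration only).
fresh = {'computedDatasetStats': {'sizeInBytes': 24, 'rowCount': 2,
                                  'columnStats': {'a': {}, 'b': {}}}}
after_compute = {'computedDatasetStats': {'sizeInBytes': None, 'rowCount': 2,
                                          'columnStats': {'a': {}, 'b': {}}}}

print(summarize_stats(fresh))          # {'size': 24, 'rows': 2, 'cols': 2}
print(summarize_stats(after_compute))  # {'size': None, 'rows': 2, 'cols': 2}
```

With a real response you would call summarize_stats(result) and check whether 'size' is None before relying on it.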