2

I don't understand how to import a Smartsheet and convert it to a pandas dataframe. I want to manipulate the data from smartsheets, currently I go to smartsheets export to csv and import csv in python but want to eliminate this step so that it can run on a schedule.

import smartsheet
import pandas as pd

access_token ='#################'

smartsheet = Smartsheet(access_token)
sheet = smartsheet.sheets.get('Sheet 1')
pd.DataFrame(sheet)

4 Answers4

9

Here is a simple method to convert a sheet to a dataframe:

def simple_sheet_to_dataframe(sheet):
    col_names = [col.title for col in sheet.columns]
    rows = []
    for row in sheet.rows:
        cells = []
        for cell in row.cells:
            cells.append(cell.value)
        rows.append(cells)
    data_frame = pd.DataFrame(rows, columns=col_names)
    return data_frame

The only issue with creating a dataframe from smartsheets is that for certain column types cell.value and cell.display_value are different. For example, contact columns will either display the name or the email address depending on which is used.

Here is a snippet of what I use when needing to pull in data from Smartsheet into Pandas. Note, I've included garbage collection as I regularly work with dozens of sheets at or near the 200,000 cell limit.

import smartsheet
import pandas as pd
import gc

configs = {'api_key': 0000000,
           'value_cols': ['Assigned User']}

class SmartsheetConnector:
    def __init__(self, configs):
        self._cfg = configs
        self.ss = smartsheet.Smartsheet(self._cfg['api_key'])
        self.ss.errors_as_exceptions(True)

    def get_sheet_as_dataframe(self, sheet_id):
        sheet = self.ss.Sheets.get_sheet(sheet_id)
        col_map = {col.id: col.title for col in sheet.columns}
        # rows = sheet id, row id, cell values or display values
        data_frame = pd.DataFrame([[sheet.id, row.id] +
                                   [cell.value if col_map[cell.column_id] in self._cfg['value_cols']
                                    else cell.display_value for cell in row.cells]
                                   for row in sheet.rows],
                                  columns=['Sheet ID', 'Row ID'] +
                                          [col.title for col in sheet.columns])
        del sheet, col_map
        gc.collect()  # force garbage collection
        return data_frame

    def get_report_as_dataframe(self, report_id):
        rprt = self.ss.Reports.get_report(report_id, page_size=0)
        page_count = int(rprt.total_row_count/10000) + 1
        col_map = {col.virtual_id: col.title for col in rprt.columns}
        data = []
        for page in range(1, page_count + 1):
            rprt = self.ss.Reports.get_report(report_id, page_size=10000, page=page)
            data += [[row.sheet_id, row.id] +
                     [cell.value if col_map[cell.virtual_column_id] in self._cfg['value_cols']
                      else cell.display_value for cell in row.cells] for row in rprt.rows]
            del rprt
        data_frame = pd.DataFrame(data, columns=['Sheet ID', 'Row ID']+list(col_map.values()))
        del col_map, page_count, data
        gc.collect()
        return data_frame

This adds additional columns for sheet and row IDs so that I can write back to Smartsheet later if needed.

2

Sheets cannot be retrieved by name, as you've shown in your example code. It is entirely possible for you to have multiple sheets with the same name. You must retrieve them with their sheetId number.

For example:

 sheet = smartsheet_client.Sheets.get_sheet(4583173393803140)  # sheet_id

http://smartsheet-platform.github.io/api-docs/#get-sheet

Smartsheet sheets have a lot of properties associated with them. You'll need to go through the rows and columns of your sheet to retrieve the information you're looking for, and construct it in a format your other system can recognize.

The API docs contain a listing of properties and examples. As a minimal example:

for row in sheet.rows:
    for cell in row.cells
        # Do something with cell.object_value here
Software2
  • 2,358
  • 1
  • 18
  • 28
  • Get_sheet is not a function, and I don't have multiple sheets with the same name. However, I do know how to retrieve the sheet by ID, which in my case is just as unique as the sheet name, but I dont know how to go through the columns and rows to retrieve the information. Can you provide an example loop? Thanks – Nottingham_Python Dec 06 '19 at 19:38
  • Python is case sensitive. In your example, you've used `sheets` instead of `Sheets` before calling `get_sheet`. – Software2 Dec 06 '19 at 19:41
  • Example added. The criteria for how you manipulate the object value is dependent on your specific needs. – Software2 Dec 06 '19 at 20:12
  • To get the sheet id open the sheet in smartsheet, then click on file-> properties and on the popup you should see the sheet Id. – Etienne Mermillod Dec 09 '19 at 17:59
1

Get the sheet as a csv: (https://smartsheet-platform.github.io/api-docs/?python#get-sheet-as-excel-pdf-csv)

smartsheet_client.Sheets.get_sheet_as_csv(
  1531988831168388,           # sheet_id
  download_directory_path)

Read the csv into a DataFrame: (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)

pandas.read_csv
  • This is great in one way because I can downlead the smart sheets i need to any folder then go back an load in the latest venison. Side note it does not load in a data frame but instead saves the Smartsheet as a file in the folder you specify. Still great to know, thank you. – JQTs Mar 23 '22 at 18:06
0

You can use this library

Very easy to use and allows Sheets or Reports to be delivered as a Dataframe.

pip install smartsheet-dataframe

Get a report as df

from smartsheet_dataframe import get_as_df, get_report_as_df
    
df = get_report_as_df(token='smartsheet_auth_token',
                      report_id=report_id_int) 

Get a sheet as df

from smartsheet_dataframe import get_as_df, get_sheet_as_df

df = get_sheet_as_df(token='smartsheet_auth_token',
                     sheet_id=sheet_id_int)

  • replace 'smartsheet_auth_token' with your token (numbers and letters)
  • replace sheet_id_int with your sheet/report id (numbers only)
Cam
  • 1,263
  • 13
  • 22