I've been trying to stream data from the API below into BigQuery and have had very little success.
https://dev.socrata.com/foundry/data.cityofchicago.org/8v9j-bter
- Automating a Datalab notebook with a shell script is too finicky.
- Using Airflow to orchestrate it is too finicky as well.
- The code below worked in a Datalab notebook, but I don't know whether the "Context" magic command will work in a regular script.
- Is this even possible in App Engine?
- Can someone provide guidance on the other scripts needed for this to run properly?
- The indentation of the code may be off.
main.py script
#install main packages
!pip install sodapy
import pandas as pd
from sodapy import Socrata
from google.datalab import Context
#put into dataframe
client = Socrata("data.cityofchicago.org", None)
results = client.get("8v9j-bter", limit=2000)
results_df = pd.DataFrame.from_records(results)
#flow into BigQuery
results_df.to_gbq('chicago_traffic.demo_data', Context.default().project_id,
                  chunksize=2000, verbose=True, if_exists='append')
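For comparison, here is a rough sketch of how the same logic might look as a plain script with no Datalab dependency. It has not been run against BigQuery and rests on assumptions that are not in the notebook code: the project ID is read from a GOOGLE_CLOUD_PROJECT environment variable instead of Context.default(), the helper names (resolve_project_id, fetch_records, load_to_bigquery, main) are made up for illustration, and to_gbq is assumed to have the pandas-gbq package installed.

```python
import os

DATASET_ID = "8v9j-bter"
TABLE = "chicago_traffic.demo_data"


def resolve_project_id(environ=None):
    """Read the project ID from the environment (assumption: the variable
    GOOGLE_CLOUD_PROJECT is set), replacing Context.default().project_id."""
    environ = os.environ if environ is None else environ
    project_id = environ.get("GOOGLE_CLOUD_PROJECT")
    if not project_id:
        raise RuntimeError("GOOGLE_CLOUD_PROJECT is not set")
    return project_id


def fetch_records(client, limit=2000):
    """Fetch up to `limit` rows from the Socrata dataset as a list of dicts."""
    return client.get(DATASET_ID, limit=limit)


def load_to_bigquery(records, project_id):
    """Append the records to the BigQuery table via pandas."""
    import pandas as pd  # imported lazily; to_gbq also needs pandas-gbq

    df = pd.DataFrame.from_records(records)
    df.to_gbq(TABLE, project_id, chunksize=2000, if_exists="append")


def main():
    from sodapy import Socrata  # imported lazily so the helpers stay testable

    client = Socrata("data.cityofchicago.org", None)
    load_to_bigquery(fetch_records(client), resolve_project_id())
```

Calling main() would run one fetch-and-append cycle. The `!pip install` line is left out because it is notebook syntax, so the packages would have to come from requirements.txt instead.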
app.yaml script
runtime: python27
api_version: 1
threadsafe: true
handlers:
- url: /.*
  script: main.app
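One thing I suspect is missing: "script: main.app" points at a WSGI callable named app inside main.py, and my main.py does not define one. Below is a minimal sketch of what that might look like using only the standard library. The names make_app and append_traffic_data are hypothetical, and the job body is a placeholder for the fetch-and-load code, so treat this as a guess at the shape rather than a working deployment.

```python
def make_app(job):
    """Wrap a zero-argument job in a plain WSGI callable."""
    def app(environ, start_response):
        try:
            job()
            status, body = "200 OK", b"ok\n"
        except Exception as exc:
            # Report the failure with a non-2xx status so the caller sees it.
            status = "500 Internal Server Error"
            body = str(exc).encode("utf-8") + b"\n"
        headers = [("Content-Type", "text/plain"),
                   ("Content-Length", str(len(body)))]
        start_response(status, headers)
        return [body]
    return app


def append_traffic_data():
    """Placeholder (hypothetical): the Socrata fetch and BigQuery append go here."""
    pass


# The name that "script: main.app" in app.yaml would resolve to.
app = make_app(append_traffic_data)
```

With something like this in place, each request matched by the handler would run the job once and return "ok" or the error message.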
cron.yaml script
cron:
- description: "append traffic data"
  url: /.*
  target: main
  schedule: every 1 mins
  retry_parameters:
    min_backoff_seconds: 2.5
    max_doublings: 5
requirements.txt
pandas==0.22.0
sodapy==1.4.6
datalab==1.1.2
google-api-python-client