I have this working GSQL script:
use graph test_graph
drop job test_job
drop data_source deep_disco_test
create data_source S3 deep_disco_test for graph test_graph
set deep_disco_test = "/home/ubuntu/s3.config"
CREATE LOADING JOB test_job for GRAPH test_graph {
DEFINE FILENAME vertex1= "$deep_disco_test:{\"file.uris\":\"s3://signals-output/db_to_tg/identities/test_identities.parquet/\",\"file.reader.type\": \"parquet\",\"file.regexp\":\".parquet\"}";
DEFINE FILENAME edge1= "$deep_disco_test:{\"file.uris\":\"s3://signals-output/db_to_tg/relations/test_relations.parquet/\",\"file.reader.type\": \"parquet\",\"file.regexp\":\".parquet\"}";
LOAD vertex1 to VERTEX identity VALUES($"dd_id",$"names",$"identity_type") USING JSON_FILE = "true";
LOAD edge1 to EDGE relation VALUES($"src",$"dst",$"edge_type") USING JSON_FILE = "true";
}
run loading job test_job using EOF = "true"
Now I want to run this from Databricks using pyTigerGraph's TigerGraphConnection (conn) and conn.uploadFile().

These are my steps:
- Build a GSQL string that (re)creates the data source and the two loading jobs:
load_job = f'''
use graph {graph}
drop job {test_load_vertices}
drop job {test_load_edges}
drop data_source {data_source}
create data_source S3 {data_source} for graph {graph}
set {data_source} = "/home/ubuntu/s3.config"
CREATE LOADING JOB {test_load_vertices} FOR GRAPH {graph} {{
DEFINE FILENAME MyDataSource;
LOAD MyDataSource to VERTEX identity VALUES($"dd_id",$"names",$"identity_type") USING JSON_FILE = "true";}}
CREATE LOADING JOB {test_load_edges} FOR GRAPH {graph} {{
DEFINE FILENAME MyDataSource;
LOAD MyDataSource to EDGE relation VALUES($"src", $"dst", $"edge_type") USING JSON_FILE = "true";}}
'''
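As a quick sanity check that the doubled braces render as literal GSQL block braces, I inspect the formatted string before sending it (a minimal sketch, using the names from my run):

```python
# Sanity check: inside an f-string, GSQL's block braces must be written
# as {{ }} so Python renders them as literal { }.
graph = "dd_subgraph_test"
test_load_vertices = "test_load_vertices_job"

snippet = f'''CREATE LOADING JOB {test_load_vertices} FOR GRAPH {graph} {{
DEFINE FILENAME MyDataSource;
LOAD MyDataSource to VERTEX identity VALUES($"dd_id") USING JSON_FILE = "true";}}'''

# The rendered text contains single braces and the substituted names:
assert "{{" not in snippet
assert f"CREATE LOADING JOB {test_load_vertices} FOR GRAPH {graph} {{" in snippet
```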
- Submit the jobs:
conn.gsql(load_job)
Which outputs the following:
"Using graph 'dd_subgraph_test'
Successfully dropped jobs on the graph 'dd_subgraph_test': [test_load_vertices_job].
Successfully dropped jobs on the graph 'dd_subgraph_test': [test_load_edges_job].
Successfully dropped data sources: [deep_disco_test].
Successfully created data sources: [deep_disco_test].
Data source 'deep_disco_test' has been updated.
Successfully created loading jobs: [test_load_vertices_job].
Successfully created loading jobs: [test_load_edges_job]."
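For what it's worth, conn.gsql() hands back the shell transcript as a plain string and does not raise on GSQL-level failures, so I scan the text for these confirmations myself (jobs_created is my own hypothetical helper):

```python
# conn.gsql() returns the GSQL shell output as a string; GSQL-level errors
# don't raise Python exceptions, so the transcript has to be inspected.
def jobs_created(output: str, *job_names: str) -> bool:
    """True if every job name appears on a 'Successfully created' line."""
    created_lines = [ln for ln in output.splitlines()
                     if "Successfully created loading jobs" in ln]
    return all(any(name in ln for ln in created_lines) for name in job_names)

transcript = (
    "Successfully created loading jobs: [test_load_vertices_job].\n"
    "Successfully created loading jobs: [test_load_edges_job]."
)
assert jobs_created(transcript, "test_load_vertices_job", "test_load_edges_job")
assert not jobs_created(transcript, "missing_job")
```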
- Test-run one of the loading jobs:
print(conn.uploadFile(
filePath="s3://signals-output/db_to_tg/identities/test_identities.parquet",
fileTag='MyDataSource',
jobName="test_load_vertices_job"))
Which prints None.

Also, nothing happens in GraphStudio.

What am I doing wrong?
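My current suspicion, from reading the pyTigerGraph source (please correct me if this is wrong): uploadFile() is a deprecated alias of runLoadingJobWithFile(), which opens filePath as a local file and POSTs its contents to the REST /ddl endpoint, returning None when the file cannot be read; an s3:// URI is not a readable local path:

```python
import os

# If uploadFile()/runLoadingJobWithFile() read filePath from the local
# filesystem (my reading of the pyTigerGraph source), an s3:// URI has
# nothing local to send, which would explain the None return value.
s3_uri = "s3://signals-output/db_to_tg/identities/test_identities.parquet"
assert not os.path.isfile(s3_uri)
```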