
I have a problem on google-BigQuery:

  1. I wrote a query.
  2. I opened the shell and created a Python script that calls the query and saves the data in a CSV file in a subdirectory of my home, and so far so good.

Now all the files (.py and .csv) are in that directory of my Cloud Shell home:

```
stefano_giostra@[MY_PROJECT]:~/bb_stkovf_data$ ls -l
total 24
-rw-r--r-- 1 stefano_giostra stefano_giostra 2724 Aug 7 01:51 best_answer_on_google-bigquery_90gg_20180807_0151.log
-rw-r--r-- 1 stefano_giostra stefano_giostra  669 Aug 7 01:51 domande_risposte_2014_2015.log
-rw-r--r-- 1 stefano_giostra stefano_giostra  155 Aug 7 01:51 python_developer_by_nation_20180807_0151.log
-rw-r--r-- 1 stefano_giostra stefano_giostra  770 Aug 7 01:51 query.sql
-rw-r--r-- 1 stefano_giostra stefano_giostra 7744 Aug 7 01:50 sg_bb_stkorw_info.py
```
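For context, the script essentially does something like this (a simplified sketch; the real query, output file names and logging are more elaborate, and `output.csv` is a placeholder):

```python
# Simplified sketch of what the script does; the real query and
# output file names differ.
import csv
from google.cloud import bigquery

client = bigquery.Client()  # picks up the project from the Cloud Shell

with open('query.sql') as f:
    sql = f.read()

rows = client.query(sql).result()  # run the query and wait for the result

# Write the result set to a CSV file in the subdirectory of my home
with open('bb_stkovf_data/output.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    writer.writerow([field.name for field in rows.schema])  # header row
    for row in rows:
        writer.writerow(row.values())
```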

  1. I wish to see the CSV in the BigQuery or GCP UI; how can I do that?
  2. I also wish to generate a job that simply calls the script; I don't want to build an HTML application just to schedule a query.

In the documentation I haven't seen any YAML file that calls a .py, only ones that call an .app, and in all the documentation I've read these days I haven't found how to do that.

Can someone help me?

Thanks

Stefano G.
  • Can you clarify where you are saving the CSV file and in which UI you want to see it, please? – Rubén C. Aug 07 '18 at 10:48
  • Hi Ruben, is it clearer now? ;) – Stefano G. Aug 07 '18 at 15:03
  • Looks so clear! But try not to share private stuff like your project/bucket IDs and surname when posting on public forums. I would re-edit and use [PROJECT_ID], [BUCKET_NAME] and [USER_ACCOUNT] instead. – Rubén C. Aug 08 '18 at 10:55

1 Answer


Find the answers below:

  1. To visualize the CSV in BigQuery you first have to load it into a table, as described here. Then you can view it like any regular table (using the preview or querying particular fields).
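For illustration, here is a minimal load sketch using the Python client library; the dataset, table and file names are placeholders (loading through the UI or the `bq load` command-line tool works just as well):

```python
# Sketch: load a local CSV into a BigQuery table so it can be
# previewed and queried in the UI. Names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
table_ref = client.dataset('my_dataset').table('my_table')

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.skip_leading_rows = 1  # skip the CSV header row
job_config.autodetect = True      # infer the schema from the file

with open('bb_stkovf_data/output.csv', 'rb') as f:
    load_job = client.load_table_from_file(f, table_ref, job_config=job_config)

load_job.result()  # wait for the load job to finish
```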

  2. I found a way from JMD, a Technical Solutions Representative at Google (i.e. a trustworthy source), in this other post. The solution uses a cron job on a Compute Engine f1-micro instance, which lets you run the script without modifying it and without having to build an 'HTML' application, which you don't want:

To schedule a script you first have to create a project if you don't have one. I assume you already have a project, so if that's the case, just create the instance that you want to use for scheduling this script.

To create the new instance:

  1. At the Google Cloud Platform Console, click on Products & Services, which is the icon with the four bars in the top left-hand corner.

  2. In the menu, go to the Compute section, hover over Compute Engine, and then click on VM Instances.

  3. Go to the menu bar above the instance section, where you will see a Create Instance button. Click it and fill in the configuration values that you want your new instance to have. The values you select will determine your VM instance's features. You can choose, among other values, the name, zone and machine type for your new instance.

  4. In the Machine type section, click the drop-down menu to select an “f1-micro” instance.

  5. In the Identity and API access section, give access scope to the Storage API so that you can read from and write to your bucket in case you need to do so; the default access scope only allows you to read. Also enable the BigQuery API.

  6. Once you have the instance created and access to the bucket, just create your cron job inside your new instance: in the user account under which the cron job will execute, run crontab -e and edit this file to add the job that will execute your baby.sh script (a sketch of such an entry follows below). The following documentation link should help you with this.
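As an illustrative sketch, assuming the script has been copied onto the instance: the instance itself can also be created from the command line, and the crontab entry could look like the one below (the instance name, zone, schedule and paths are placeholders):

```
# CLI equivalent of steps 1-5 above (name and zone are examples; the
# scope aliases grant Storage read/write and BigQuery access):
gcloud compute instances create cron-runner \
    --machine-type=f1-micro \
    --zone=us-central1-a \
    --scopes=storage-rw,bigquery

# Step 6: inside the instance, run `crontab -e` and add an entry like
# this one, which runs the script daily at 02:00 and appends all
# output to a log file:
0 2 * * * /usr/bin/python3 /home/[USER_ACCOUNT]/bb_stkovf_data/sg_bb_stkorw_info.py >> /home/[USER_ACCOUNT]/bb_stkovf_data/cron.log 2>&1
```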

Please note, if you want to view output from your script you may need to redirect it to your current terminal.

You might need to add more permissions since you are also using Google BigQuery.

Mangu
Rubén C.
  • Hi Rubén, maybe I did not say it clearly enough. I have a Python script that launches some queries on BigQuery and saves the output to a CSV, and that works fine. Now I need to schedule this work to obtain a daily automatic execution. But in the documentation I see this: https://cloud.google.com/appengine/docs/standard/python/config/cron. I haven't found how to save files created by the script to a Google bucket, nor how to schedule this work. The documentation is very scattered. – Stefano G. Aug 10 '18 at 07:57
  • I understand. What is not clear to me is where you have the Python script. I thought about App Engine, but nothing references it, so I assumed that you have it in your Cloud Shell (my solution is for this use case). If it is in App Engine, you need a cron job to schedule tasks. To save the data on BigQuery, add an [extra operation in the cron job](https://cloud.google.com/appengine/docs/flexible/go/scheduling-jobs-with-cron-yaml) or use the same Python script, [waiting for some time](https://stackoverflow.com/questions/510348/how-can-i-make-a-time-delay-in-python) to have the file created. – Rubén C. Aug 10 '18 at 10:34
  • I have the script in the Cloud Shell. The docs talk about a URL, but I have a script and I don't have an application. So what can I do to tell the YAML to execute my script? This is my problem. Thanks – Stefano G. Aug 10 '18 at 15:40
  • @StefanoG. If you don't have an App Engine application, you don't need the YAML file (my guess is that you are speaking of the `cron.yaml` file). You said in your question that you don't want to build an 'HTML' application, but you would need even the smallest, simplest one in App Engine, and to modify your script a bit, to run it there. I would just go with @RubénC's answer. – Mangu Aug 13 '18 at 08:45
  • Hi Rubén, thanks. These days I can't work on this project. When I get back home I will try to build a mini webapp to drive the script. – Stefano G. Aug 18 '18 at 06:37