
I am new to Datalad. I am trying to capture version history and commit details for every person who makes changes to my Datalad dataset.

So far, I have been able to create a sibling of my local dataset on a cloud storage bucket and to export the Datalad dataset to that GCS bucket/Datalad sibling.
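
For context, a sketch of the kind of setup I mean, assuming the bucket is registered through git-annex's S3-compatible interface; the dataset name, bucket name, remote name, and credentials are placeholders:

```
# Create the dataset and register the GCS bucket as a git-annex special
# remote through its S3-compatible (HMAC interoperability) endpoint.
datalad create my-dataset
cd my-dataset

export AWS_ACCESS_KEY_ID="GCS_HMAC_KEY"        # GCS interoperability key (placeholder)
export AWS_SECRET_ACCESS_KEY="GCS_HMAC_SECRET" # GCS interoperability secret (placeholder)
git annex initremote gcs type=S3 host=storage.googleapis.com \
    bucket=my-bucket encryption=none exporttree=yes

# Export the worktree (files and folders, but not .git) to the bucket
git annex export HEAD --to gcs
```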

What I am trying to achieve is the following:

  1. Whenever files change in my Datalad directory, the resulting commit should capture the details of the user who made the change.

Currently, it captures the git config details that I set during the Git installation. Is there a way to dynamically pass these values through Datalad when making a commit?
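
To illustrate what I mean, the author details currently come from my global Git configuration; plain Git supports per-invocation overrides, and I am asking whether Datalad exposes something equivalent (the names and emails below are placeholders):

```
# Today the author/committer details come from the global Git config
# that was set once at installation time:
git config --global user.name
git config --global user.email

# Plain Git allows per-invocation overrides via environment variables;
# since datalad save creates ordinary Git commits, I expect it would
# inherit these, but that is exactly what I am asking about:
GIT_AUTHOR_NAME="Jane Doe" GIT_AUTHOR_EMAIL="jane@example.com" \
GIT_COMMITTER_NAME="Jane Doe" GIT_COMMITTER_EMAIL="jane@example.com" \
    datalad save -m "Update data files"
```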

  2. I don't want my local disk to maintain the history of the files; I want only the metadata and version history, and I want to store those in a GCS bucket.

Currently, I am able to push all the files/folders (except the .git folder, which contains the history) to the GCS sibling using the git-annex export command. Is there a way to push the version history to the GCS bucket and get insight from there, instead of storing everything locally?
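
To make this concrete, this is what reaches the bucket today, plus the kind of thing I have been considering for the history, e.g. copying a Git bundle of the repository into the bucket; the bundle part is only an idea, not something I have working (the bucket name is a placeholder):

```
# What I run today: only the worktree content reaches the bucket
git annex export HEAD --to gcs

# Idea I am considering (not tested): package the Git history into a
# bundle and copy it to the bucket as well, so the full version history
# lives in GCS rather than only on my local disk.
git bundle create dataset-history.bundle --all
gsutil cp dataset-history.bundle gs://my-bucket/dataset-history.bundle
```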

  3. Also, most of the commands I am using are git-annex commands. Is there a Datalad API for the same operations?
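
For example, this is the kind of mapping I am hoping exists; I have seen these command names in the Datalad documentation, but I have not confirmed they cover my case (the sibling name is a placeholder):

```
# git-annex/git commands I use today, and the Datalad commands I am
# hoping map onto them ("gcs" is the sibling name from my setup):
datalad save -m "Record changes"   # instead of git annex add + git commit
datalad push --to gcs              # instead of git annex export/copy --to gcs
datalad siblings                   # list the configured siblings/remotes
```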

Any insights will be helpful.

Arvind Sharma
  • Datalab isn't the most modern tool on Google Cloud. AI Notebooks are the most recommended now. Anyway, before answering your question, I would like to know why you are doing this. Why isn't a simple commit enough? – guillaume blaquiere Nov 09 '20 at 08:20
  • I think you should focus your question a little more and try to be more concrete, or post your questions as multiple separate posts instead of having several questions here. Please have a look at the [how to ask section](https://stackoverflow.com/help/how-to-ask) – Chris32 Nov 19 '20 at 11:40

1 Answer


As I understand it, a Datalad history file is a text file, so for your third question I can say that you can consume a txt file from Cloud Storage without needing to download it locally. You can do this by accessing the file through its storage URL, i.e.: "https://storage.cloud.google.com/{MyBucket}/{MytxtFile}.txt"

From there you will be able to get the text content dynamically, i.e. making a GET request will return the file content.
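
For example, something along these lines should return the object's content directly (the bucket and object names are placeholders matching the URL above, and the token step assumes the object is not publicly readable):

```
# Fetch the object's content directly over HTTPS, without keeping a
# local copy of the file.
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://storage.googleapis.com/MyBucket/MytxtFile.txt"
```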

Now, it would be useful if you shared an example of what exactly you want to achieve, i.e. which commands you are using. As per the Datalad get documentation, it seems to expect a local file, and I'm not sure you could make it work without one (through a curl).
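
For reference, the usual datalad get pattern I have in mind operates on a path inside a local clone of the dataset (the URL and paths below are placeholders):

```
# Typical usage: a lightweight local clone, then fetch only the content
# you need from a configured remote.
datalad clone "https://example.com/my-dataset" my-dataset
cd my-dataset
datalad get path/to/file.txt
```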

A possible middle ground between Cloud Storage and local files could be Cloud Storage FUSE, which lets you mount your Cloud Storage buckets as file systems on Linux or macOS systems; you can then manipulate and access your files locally, and these changes will be reflected in the bucket.
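
For instance, mounting the bucket is a single command (the bucket name and mount point are placeholders):

```
# Mount the bucket as a local directory; changes made under the mount
# point are written back to the bucket.
mkdir -p ~/gcs-mount
gcsfuse my-bucket ~/gcs-mount

# Work on the files as if they were local, then unmount when done (Linux):
fusermount -u ~/gcs-mount
```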

Chris32
  • Sorry, I was not keeping track of my emails. I am trying to get the metadata of a file that is present in the cloud storage bucket. This metadata will be used by Data Catalog. By default, after creating a file set, we get the metadata of the files in Data Catalog, but we cannot get other details such as who updated the file, at what time it was updated, and what changes happened to the files in the cloud storage bucket. For this I want to use Datalad, which will store the history of these files and later push this metadata about the files to Data Catalog – Arvind Sharma Nov 20 '20 at 10:22
  • Please have a look at the [Cloud Storage official documentation](https://cloud.google.com/storage/docs/viewing-editing-metadata), which describes different ways to view and edit your object metadata in Google Cloud Storage. Also please have a look at the following Stack Overflow posts: [1](https://stackoverflow.com/questions/49683255/how-to-get-file-metadata-from-gcs), [2](https://stackoverflow.com/questions/56366142/how-to-access-file-metadata-for-files-in-google-cloud-storage-from-a-python-go). – Nibrass H Dec 02 '20 at 14:53