0

All explained above is in the context of an ETL process. I have a git repository full of sql files. I need to put all those sql files (once pulled) into a sql table with 2 columns: name and query, so that I can access each file later on using a SQL query instead of loading them from the file path. How can I make this? I am free to use the tool I want to, but I just know python and Pentaho.

Maybe the assumption that this method would require less computation time than simply accessing to the pull file located in the hard drive is wrong. In that case let me know.

Javier Lopez Tomas
  • 2,072
  • 3
  • 19
  • 41

1 Answers1

1

You can first define the table you're interested in using something along the lines of (you did not mention the database you are using):

CREATE TABLE queries (
  name  TEXT PRIMARY KEY,
  query TEXT
);

After creating the table, you can use perhaps os.walk to iterate through the files in your repository, and insert both the contents (e.g. file.read()) and the name of the file into the table you created previously.

It sounds like you're trying to solve a different problem though. It seems like you're interested in speeding up some process, because you asked about whether accessing queries using a table would be faster than opening a file on disk. To investigate that (separate!) question further, see this.

I would recommend that you profile the existing process you are trying to speed up using profiling tools. After that, you can see whether IO is your bottleneck. Otherwise, you may do all of this work without any benefit.

As a side note, if you are looking up queries in this way, it may indicate that you need to rearchitect your application. Please consider that possibility as well.

septimir
  • 41
  • 2