
I have a list of integers that I'd like to turn into single-item tuples for a SQL statement. I'm struggling to embed the integers as single-item tuple-like groups within a large string.

The goal is to generate queries in the format below: an iterable of insert statements, each inserting chunk_size (x) values, where chunk_size is specified by the user.

# The list of integers is like the following
# Approximate size n = 1M+
li: list[int] = [0,1,2,3,4,5,6]

Desired result:

# I'd like to have an iterable of statements like this (of a certain chunk_size (x) specified by the user)
queries: list[str] = [
    'insert into test_table (test_col) values (0),   (1),   (2),   (3),  (4)...   (x)',
    ...
    'insert into test_table (test_col) values (x+1), (x+2), (x+3), (x+4), (x+5)... (n)'
]

Current attempt:

import itertools

# The list of integers is like the following
# Approximate size n = 1M+
li = [0,1,2,3,4,5,6,7,8,9,10]

# Established chunked queries
# Yield successive n-sized tuples from the iterable
def grouper(n, iterable):
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk

queries = [f"insert into testing (testing_col) values {i};" for i in grouper(2, li)]
queries[0]  # 'insert into testing (testing_col) values (0, 1);' -- one multi-item tuple, not (0), (1)

The problem is that I need to format the strings so that the SQL engine sees single-item tuples like (0), (1), etc.

Coldchain9
    Wait, so the goal is **not** actually to make a list of tuples of 1 element, but to *write variable contents into a string*, with parentheses surrounding each thing that is written? – Karl Knechtel Oct 08 '22 at 23:54
  • @KarlKnechtel yes. that would suffice. Given that SQL is inserting a large amount of tuples after the values keyword in the query, tuples seemed to be the most straightforward attempt. I am just trying to dynamically generate string SQL insert statements in a particular fashion as shown in the desired result. – Coldchain9 Oct 08 '22 at 23:56
  • But this is a good time to consider an exercise in logic, and in **reading explanations** of solutions found on the Internet. "Current attempt:" I see here some commonly offered code to solve a common problem, of splitting a list into chunks. I see that a "problem" is described where the strings are not "single item tuples". I presume that you tried this code and saw tuples with two items instead. So - see where the code says `grouper(2,li)`? Did you try changing the `2` to a `1`? – Karl Knechtel Oct 08 '22 at 23:56
  • @KarlKnechtel The idea is not to repeat ```insert into``` for every single element. Rather, insert into once, with values of size x in the string – Coldchain9 Oct 08 '22 at 23:58
    Yes, I understood that. But that approach is a red herring anyway. There is a much more important problem here: **do not** use any kind of string formatting to create a SQL query - it is error-prone and can create a **critical security risk** if there is any chance that any of the data could ever be under the user's control. I gave the duplicate that explains how to format queries in general. The simplest solution here is to just run the query in a loop, although I can see why that might not be desirable. – Karl Knechtel Oct 09 '22 at 00:00
  • Oh, I found a duplicate for the specific problem of inserting multiple rows efficiently. The trick is to use the `.executemany` method of the query. – Karl Knechtel Oct 09 '22 at 00:02
  • @KarlKnechtel with the answer provided by Mikael Öhman I was just able to insert 10M rows into SQL in 4 seconds. The overall premise here was to generate these unique insert statements in an iterable with which I use multiprocessing to insert them into the db. – Coldchain9 Oct 09 '22 at 00:04
    @Coldchain9, doing this with bind variables is not only more secure, it's also faster; done correctly, the database doesn't need to parse SQL statements over and over: you generate a prepared statement just once, and then provide a batch of data to use with it. That's one of the main advantages of `executemany`. – Charles Duffy Oct 09 '22 at 00:07
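The `executemany` approach suggested in the comments can be sketched as follows; this is a minimal illustration using the stdlib `sqlite3` module and a hypothetical in-memory table (the actual driver and table depend on your database):

```python
import sqlite3

li = list(range(7))  # stand-in for the large list of integers

conn = sqlite3.connect(":memory:")
conn.execute("create table test_table (test_col integer)")

# One parameterized statement; the data is passed as a sequence of
# 1-tuples, so the driver handles quoting and no string formatting
# (and no SQL-injection risk) is involved.
conn.executemany("insert into test_table (test_col) values (?)",
                 [(x,) for x in li])
conn.commit()

count = conn.execute("select count(*) from test_table").fetchone()[0]
```

The driver can reuse the prepared statement across the whole batch, which is where the speedup over per-row `execute` calls comes from.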

1 Answer


The tuple is a red herring from what I can gather. What you actually want is a string that happens to have parentheses around each term.

def process_chunk(chunk):
    # Wrap each value in its own parentheses: (0), (1), (2), ...
    return 'insert into test_table (test_col) values ' + ', '.join(f'({x})' for x in chunk) + ';'

def process_all(data, n):
    # Split data into chunks of n and build one statement per chunk
    return [process_chunk(data[i:i+n]) for i in range(0, len(data), n)]
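For illustration, a minimal run of the two functions above on a small list (chunk size 3):

```python
def process_chunk(chunk):
    # Wrap each value in its own parentheses: (0), (1), (2), ...
    return 'insert into test_table (test_col) values ' + ', '.join(f'({x})' for x in chunk) + ';'

def process_all(data, n):
    # Split data into chunks of n and build one statement per chunk
    return [process_chunk(data[i:i+n]) for i in range(0, len(data), n)]

queries = process_all([0, 1, 2, 3, 4, 5, 6], 3)
# queries[0] -> 'insert into test_table (test_col) values (0), (1), (2);'
# queries[2] -> 'insert into test_table (test_col) values (6);'
```

The slicing in `process_all` handles the ragged final chunk automatically, so the last statement simply carries fewer values.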
Mikael Öhman
  • You are correct that it is a red herring, but I am struggling at getting single parentheses ```()``` around each integer for each insert statement of proper chunk size. Your result is close but results in something similar to what I have in my now edited "attempt". E.g. each statement has brackets within the ```values``` statement which is not what SQL will accept. – Coldchain9 Oct 08 '22 at 23:53
  • 1
    Fixed a typo in my code. Should work now. Basically, you sometimes just have to write the string formatting yourself, term by term. There is no guarantee that whatever type you have will have a direct string conversion that happens to be exactly what you need. – Mikael Öhman Oct 08 '22 at 23:59
  • Thanks for the answer. This worked splendidly and assisted in speeding up my query from using a Pandas dataframe insert (~2 minutes for 10m) rows to 4 seconds. Thank you! – Coldchain9 Oct 09 '22 at 00:11