
I would like to make a table with a specific order of rows so that I can access them individually and as fast as possible. To achieve this I created an index column and made it a PRIMARY KEY, NOT NULL and UNIQUE. I am not sure the last two make it any faster, but feel free to correct me. This is where the problems start. Whenever I delete a record from this table, the indexing is destroyed, which is a big issue because my code relies on the fact that the next row has an index that is larger by one. To combat this, I tried the AUTOINCREMENT keyword, but it did not work as planned. Here is the code which shows what I mean:

import sqlite3

con = sqlite3.connect('test_db.db')

con.execute('''
    CREATE TABLE IF NOT EXISTS epic_table (
        "index"        INTEGER NOT NULL UNIQUE PRIMARY KEY AUTOINCREMENT,
        city           TEXT NOT NULL,
        avg_salary     FLOAT
    );
''')

con.execute('INSERT INTO epic_table VALUES (?, ?, ?)', (0, 'Chicago', 12.0))
con.execute('INSERT INTO epic_table VALUES (?, ?, ?)', (1, 'New York', 9.11))
con.execute('INSERT INTO epic_table VALUES (?, ?, ?)', (2, 'Detroit', 0.19))

print(con.execute('SELECT * FROM epic_table').fetchall(), '\n')

con.execute('DELETE FROM epic_table WHERE "index" = 1')

print(con.execute('SELECT * FROM epic_table').fetchall())

As an output I get:

[(0, 'Chicago', 12.0), (1, 'New York', 9.11), (2, 'Detroit', 0.19)] 

[(0, 'Chicago', 12.0), (2, 'Detroit', 0.19)]

As you can see, Detroit should have had the index 1, but it hasn't updated.

I could not find any way of fixing this, so I am asking here. If you know how to approach this problem in a different way, I am open to suggestions; after all, the only thing that matters is the result, not the means.

user9102437
  • Why do you need to "rely on the fact that the next row has an index that is larger by one"? What does the index actually *represent* in your program? Is it some kind of ranking, or what exactly? You need to explain this much more clearly before a better database structure can be suggested. – ekhumoro Aug 21 '22 at 13:05
  • @ekhumoro It is kind of ranking. In my program I have to use the rows in a specific order. Since the database is large, I would like to avoid, for example, storing a primary key values in a list (in the order that I want) and looking for the next element each time. Instead, I have a "counter" to which I add 1 every time I need the next row. – user9102437 Aug 21 '22 at 13:23
  • Yet another https://xyproblem.info/ – David דודו Markovitz Aug 21 '22 at 13:24
  • @DavidדודוMarkovitz I have clearly defined all the requirements, and the "Attempt" is something that is always encouraged by the members of this website. – user9102437 Aug 21 '22 at 13:29
  • You have clearly described your attempted solution, which you fail to implement. Not a word on the actual scenario. A classic xyproblem. – David דודו Markovitz Aug 21 '22 at 13:38
  • @user9102437 For a ranking, the *exact* values aren't important - all that matters is that the overall order is preserved. You should not make that column the primary key. However, you *should* create an index for it, so you can efficiently sort the rows by rank. If the column has an index, finding the next largest value will be very fast. – ekhumoro Aug 21 '22 at 13:49
  • @ekhumoro Index sounds like a great solution. Could you give an example of how this would work? I have never used them. – user9102437 Aug 21 '22 at 14:08
  • @DavidדודוMarkovitz Look, even if you couldn't make out what I wanted, your comment does not add anything to the discussion. If you have any question in particular, please, feel free to ask them. – user9102437 Aug 21 '22 at 14:15
  • @user9102437 It's still not clear what you're trying to achieve. What exactly does "kind of ranking" mean? A real ranking implies that the positions can change (e.g. as in a league table), which is why it seems inappropriate for the column to be used as a primary key. But really, without more details, it's very difficult to give more concrete advice. – ekhumoro Aug 21 '22 at 15:45
  • @ekhumoro Perhaps it was wrong of me to call it that, yeah. To make this as clear as possible, what I am trying to achieve here is an array or list (as suggested by many people here). Basically, I would like to have an index for each element and be able to access it, while also being sure that if the length is 10, for example, I will have indexes 0 through 9 – user9102437 Aug 21 '22 at 17:07
  • @user9102437 Sorry, but that explains nothing at all. Why do you think you need this "list"? How is it actually used within the program? What *specific* problem are you trying to solve? – ekhumoro Aug 21 '22 at 17:24
  • @ekhumoro The scenario here is having a bunch of rows with login data, and having to check it once in a while. You don't want to do that often, so they have a cooldown. Having them arranged in order allows me to plan the checking interval so that, by the time the program finishes with the last row, the cooldown for the first has already expired. I could check the time constantly, but I am afraid that would be slow. – user9102437 Aug 21 '22 at 21:16
  • @user9102437 I don't see why that requires a numbered list. A [timestamp](https://stackoverflow.com/q/14461851/984421) would seem more appropriate. Then you can simply select all rows where `timestamp + cooldown < now`. Your fears about things being "slow" seem based on wrong assumptions and/or premature optimisation. If the column has a UNIQUE constraint, sqlite will automatically create an index for it. A query like the one suggested above will only take a few milliseconds. For a timeout mechanism, see e.g. [threading.Timer](https://docs.python.org/3/library/threading.html#timer-objects). – ekhumoro Aug 22 '22 at 13:59
  • @ekhumoro I see! This is the kind of the answer I was looking for. So, you are saying that a query looking like `where timestamp + cooldown < now` is essentially O(1) because of indexing? This seems like a great solution! – user9102437 Aug 23 '22 at 12:31
  • @user9102437 If a column has an index, binary search can be used, which is obviously much faster than a full table scan. See [SQLite: Query Planning](https://www.sqlite.org/queryplanner.html). I suggest you install a utility like [sqlitebrowser](https://sqlitebrowser.org/) so you can test queries on your database and get accurate timings. – ekhumoro Aug 23 '22 at 13:10
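A minimal sketch of the timestamp-plus-cooldown approach suggested in the comments above. The table and column names (`accounts`, `last_checked`) are illustrative, not from the question:

```python
import sqlite3
import time

con = sqlite3.connect(':memory:')

con.execute('''
    CREATE TABLE accounts (
        id           INTEGER PRIMARY KEY,
        login        TEXT NOT NULL UNIQUE,
        last_checked REAL NOT NULL DEFAULT 0
    )
''')
# An explicit index on last_checked lets the range query below
# use binary search instead of a full table scan.
con.execute('CREATE INDEX idx_last_checked ON accounts (last_checked)')

con.executemany('INSERT INTO accounts (login) VALUES (?)',
                [('alice',), ('bob',), ('carol',)])

COOLDOWN = 3600  # seconds; illustrative value

# Rewrite "last_checked + cooldown < now" as "last_checked < now - cooldown"
# so the indexed column stands alone and the index stays usable.
cutoff = time.time() - COOLDOWN
due = con.execute(
    'SELECT id, login FROM accounts '
    'WHERE last_checked < ? ORDER BY last_checked',
    (cutoff,),
).fetchall()

for row_id, login in due:
    # ... perform the actual login check here ...
    con.execute('UPDATE accounts SET last_checked = ? WHERE id = ?',
                (time.time(), row_id))
```

Note that the comparison is kept as `last_checked < ?` rather than `last_checked + ? < ?`: applying arithmetic to the column would prevent SQLite from using the index on it.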

2 Answers


You are treating a relational table as an ARRAY of rows. This is wrong: a relational table is a SET of rows. The primary means of retrieving individual rows from such a table is the primary key, but you should never treat an integer primary key as the relative position of a row within the table. Even if your primary key is an integer and covers a contiguous range, interpreting it as an array index is still meaningless.

There are special databases for analytical applications (OLAP) which store data column-wise, possibly in contiguous areas. For those databases, interpreting a "contiguous" integer PK as an array index may make at least some sense. But SQLite is not such a database.

You need to learn more about RDBMSs and rethink how you access your data. Judging by your question, you should not be concerned about performance yet; there is quite a bit more to learn first.

PChemGuy

When a field is defined as a primary key, that implies that it is the unique identifier of the row, and this is definitely the expected behavior of a primary key. Primary keys are expected to not change, so they can identify one (and only one) row forever.

Using AUTOINCREMENT simply lets you create new rows without explicitly defining the "index" column (which, as a primary key, is required for new rows). The way AUTOINCREMENT works is by keeping track of a counter, which it uses to auto-ID the next row.
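For example, a sketch using the table from the question (note the column list in the INSERT, so the "index" value is never supplied explicitly):

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('''
    CREATE TABLE epic_table (
        "index"    INTEGER PRIMARY KEY AUTOINCREMENT,
        city       TEXT NOT NULL,
        avg_salary REAL
    )
''')

# No value is given for "index": AUTOINCREMENT assigns the next
# counter value (starting at 1) automatically.
con.execute('INSERT INTO epic_table (city, avg_salary) VALUES (?, ?)',
            ('Chicago', 12.0))
con.execute('INSERT INTO epic_table (city, avg_salary) VALUES (?, ?)',
            ('New York', 9.11))

rows = con.execute('SELECT * FROM epic_table').fetchall()
print(rows)  # [(1, 'Chicago', 12.0), (2, 'New York', 9.11)]
```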

I would like to make a table where I would have a specific order of rows to make it easier to access them individually and as fast as possible.

The current system (using a primary key) can easily satisfy this purpose. Even with missing numbers throughout the table, you retain the order of rows as initially specified. And you can still access them individually using their primary key, as intended.

To access the "next" row in a table, you can use a limit + offset combo. For example,

SELECT * FROM epic_table
ORDER BY "index" ASC
LIMIT 1 OFFSET 2;

This will allow you to increment a counter and access the n-th element in your table, instead of having to mess with primary keys.
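A sketch of how that counter loop might look from Python, reusing the table from the question. Gaps left by deletions no longer matter, because OFFSET counts rows in the sorted result, not key values:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE epic_table '
            '("index" INTEGER PRIMARY KEY, city TEXT, avg_salary REAL)')
con.executemany('INSERT INTO epic_table VALUES (?, ?, ?)',
                [(0, 'Chicago', 12.0), (1, 'New York', 9.11),
                 (2, 'Detroit', 0.19)])
con.execute('DELETE FROM epic_table WHERE "index" = 1')

counter = 0
while True:
    # The counter is a position in the sorted result, not a key,
    # so the gap at "index" = 1 is skipped transparently.
    row = con.execute(
        'SELECT * FROM epic_table ORDER BY "index" LIMIT 1 OFFSET ?',
        (counter,),
    ).fetchone()
    if row is None:
        break
    print(row)
    counter += 1
```

This prints `(0, 'Chicago', 12.0)` followed by `(2, 'Detroit', 0.19)`.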

Krishnan Shankar
  • That is not what `AUTOINCREMENT` is for. Its purpose is to alter the algorithm for assigning integer primary keys so that IDs are never reused. The default behaviour (which is more efficient) is to either reuse IDs or choose a value one more than the largest. – ekhumoro Aug 21 '22 at 13:24
  • What I understood is that I don't have to use either `UNIQUE` or `NOT NULL`. Is that correct? Also, while I will not have a problem with a specific order, I will have an issue with the difference between consecutive rows being exactly 1, which is one of my stated requirements. – user9102437 Aug 21 '22 at 13:27
  • @user9102437 May I ask why exactly you want the difference in IDs between consecutive rows to be 1? Generally, the primary key is not used for displaying information. It is solely a unique identifier, so you should reconsider why you want IDs to be one number apart instead of trying to change primary key values. – Krishnan Shankar Aug 21 '22 at 13:31
  • @KrishnanShankar It is true that my application is rather uncommon, but in my program I have to use the rows in a specific order. Since the database is large, I would like to avoid, for example, storing a primary key values in a list (in the order that I want) and looking for the next element each time. Instead, I have a "counter" to which I add 1 every time I need the next row. – user9102437 Aug 21 '22 at 13:34
  • @user9102437 I've edited the answer with how I would go about accessing the n-th row in the table. Hope it helps! – Krishnan Shankar Aug 21 '22 at 13:44
  • @KrishnanShankar Thank you! This is what I used to use, except for the `ORDER` part, because I was afraid that it would be slow to sort every time I look for a row. Does the `primary key` attribute fix this? – user9102437 Aug 21 '22 at 14:11
  • "LIMIT 1 OFFSET XYZ" is a bad idea. The engine must scan the index to do OFFSET XYZ. https://use-the-index-luke.com/sql/partial-results/fetch-next-page – PChemGuy Aug 21 '22 at 14:41
  • @user9102437 `primary key` doesn't fix this... it *will* be slow. And like @PChemGuy mentioned, there are drawbacks. However, the alternative is to have another column for something like "position," which would need to be updated in bulk whenever an entry is added/deleted (similar to array management in Java). Therefore, I felt this is the best possible way to accomplish the OP's criteria. – Krishnan Shankar Aug 21 '22 at 15:13
  • Some other possible ways of accomplishing this: 1) Save the query to memory (as a variable) using something like https://gitlab.com/sgrignard/sqlite_numpy, which allows you to "convert" a table snapshot into a more desirable format for your use-case. 2) Use a different database entirely (even something as simple as a CSV file could fulfill your use-case better, although something more production-ready might be preferable). – Krishnan Shankar Aug 21 '22 at 15:17
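A sketch of the keyset-pagination alternative mentioned in the comment linking use-the-index-luke.com: instead of an OFFSET (which forces the engine to walk past all skipped rows), remember the last key seen and seek past it, so each step is a single index lookup:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE epic_table '
            '("index" INTEGER PRIMARY KEY, city TEXT, avg_salary REAL)')
con.executemany('INSERT INTO epic_table VALUES (?, ?, ?)',
                [(0, 'Chicago', 12.0), (1, 'New York', 9.11),
                 (2, 'Detroit', 0.19)])
con.execute('DELETE FROM epic_table WHERE "index" = 1')

last_seen = -1  # start before the first possible key
visited = []
while True:
    # Seek to the next key greater than the last one seen:
    # a single B-tree lookup, independent of how far into
    # the table we are.
    row = con.execute(
        'SELECT * FROM epic_table WHERE "index" > ? '
        'ORDER BY "index" LIMIT 1',
        (last_seen,),
    ).fetchone()
    if row is None:
        break
    visited.append(row)
    last_seen = row[0]
```

Deleted keys are skipped naturally, and unlike `LIMIT 1 OFFSET n`, the cost per step does not grow with the position in the table.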