0

I have almost no knowledge of sql at all, and I'm trying to load a db file using pandas' read_sql method. In the accepted answer How to open and convert sqlite database to pandas dataframe, it says to use

import sqlite3
import pandas as pd
# Create your connection.
cnx = sqlite3.connect('file.db')

df = pd.read_sql_query("SELECT * FROM table_name", cnx)

but how do I find out what the table_name is for the table in my db file?

David
  • 619
  • 2
  • 8
  • 15
  • Don't you know the table you want to load? You can use `sqlitebrowser` to explore your database. – Corralien Apr 20 '23 at 18:57
  • @Corralien I only know the filename. I don't really know what's inside the table – David Apr 20 '23 at 19:01
  • 1
    So use [`sqlitebrowser`](https://sqlitebrowser.org/dl/) to browse the database. There are many other tools (more complicated) like DBeaver or HeidiSQL but `sqlitebrowser` is a good start. I also like [`sqlitestudio`](https://sqlitestudio.pl/) – Corralien Apr 20 '23 at 19:03
  • @Corralien ah so this is for work and I'm on a work computer that doesn't have sqlitebrowser, and the db file is located on a remote server that I don't have download privileges to. I access it using VS Code, do you know if there's an easy way to view it from VSCode? – David Apr 20 '23 at 19:06
  • https://learn.microsoft.com/en-us/sql/tools/visual-studio-code/sql-server-develop-use-vscode?view=sql-server-ver16 seems there might be an extension that I can trry – David Apr 20 '23 at 19:06

2 Answers2

0

You can query the tables contained in the database:

SELECT name FROM sqlite_schema WHERE type='table' ORDER BY name

If the data you need isn't stored in a single table, you can write a query to join the tables you need together

Hedscan
  • 175
  • 1
  • 6
0

To explore quickly with Pandas your database, you can use this code:

tables = pd.read_sql_query("SELECT * FROM sqlite_schema WHERE type='table'", cnx)
for tbl in tables['name']:
    print(f"{'*'*10} {tbl} {'*'*10}")
    columns = pd.read_sql_query(f"PRAGMA table_info({tbl})", cnx)
    print(columns, end='\n\n')
    data = pd.read_sql_query(f"SELECT * FROM {tbl} LIMIT 5", cnx)
    print(data, end='\n\n')

Sample output:

********** UserConstraint **********
   cid  name          type  notnull dflt_value  pk
0    0    ID       INTEGER        1       None   1
1    1  Name  VARCHAR(255)        0       None   0

   ID     Name
0   0     <NA>
1   1  INSTCAP
2   2  LUMPINV
3   3      INV
4   4     COST

********** Cap_New **********
   cid            name      type  notnull dflt_value  pk
0    0              ID   INTEGER        1       None   1
1    1        Scenario  SMALLINT        0       None   0
2    2         Process  SMALLINT        0       None   0
3    3          Period  SMALLINT        0       None   0
4    4          Region  SMALLINT        0       None   0
5    5         Vintage  SMALLINT        0       None   0
6    6  UserConstraint  SMALLINT        0       None   0
7    7              PV     FLOAT        0       None   0

   ID  Scenario  Process  Period  Region  Vintage  UserConstraint            PV
0   1         1      211       1       1        1               1    296.071141
1   2         1      212       1       1        1               1     11.770909
2   3         1      343       1       1        1               1  11851.674497
3   4         1      949       2       1        2               1     43.000000
4   5         1      691       2       1        2               1      5.488316

Note: don't forget to print tables.

Corralien
  • 109,409
  • 8
  • 28
  • 52