0

Within my sqlite3 database there are hundreds of tables, but what I would like to do is create a dataframe which appends only tables from the database that match the names contained in separate list that I have made

The list is called 'col_list' and currently contains only 3 elements (3 names)

col_list = df['ref_name'].tolist()

My attempt so far has lead me to the following, which is very cumbersome. :

conn = sqlite3.connect('all_data.db')
query = "SELECT * FROM " + col_list[0] + ";"
df = pd.read_sql_query(query, conn)

conn = sqlite3.connect('all_data.db')
query = "SELECT * FROM " + col_list[1] + ";"
df1 = pd.read_sql_query(query, conn)
df2 = df.append(df1)

conn = sqlite3.connect('all_data.db')
query = "SELECT * FROM " + col_list[2] + ";"
df3 = pd.read_sql_query(query, conn)
df4 = df2.append(df3)

df4 = df4.sort_values(by = 'date')
df4 = df4.reset_index(drop=True)

The number of elements in the 'col_list' can vary, which based on my current code structure means rewriting the code each time that this happens. Ultimately I would like to be able to have this all work as a 'for' loop and therefore look to you guys for help.

Thank you for taking the time to read this.

windwalker
  • 359
  • 4
  • 14
  • *there are hundreds of tables* ... sounds like a database design question. If tables have same structure designated by a number or date suffix, consider a *single* table for a normalized, relational model. Then, no pandas appending needed. – Parfait Oct 24 '20 at 15:39
  • @Parfait Thanks, but I don't want to append all the tables from the database, only those that are contained in the list that I generate from a different data source – windwalker Oct 24 '20 at 17:12
  • 1
    My suggestion goes beyond your immediate question for best practice using relational databases. Today you use SQLite but tomorrow an enterprise, server RDBMS like Postgres. Databases should not be dump files of data pieces but follow a [normalized structure](https://en.m.wikipedia.org/wiki/Database_normalization). Just saving you complex querying and data wrangling down the road. Good luck and happy coding! – Parfait Oct 25 '20 at 01:36
  • @Parfait. I am very new to coding and databases, so very much appreciate all the help that I get here. You guys make this such a wonderful community. that Wikipedia link is very insightful – windwalker Oct 25 '20 at 08:13

1 Answers1

1

If I understood your question correctly, you want to do something like this?

df_all = None
conn = sqlite3.connect('all_data.db')
for col in col_list:
    query = "SELECT * FROM " + col + ";"
    df = pd.read_sql_query(query, conn)
    if df_all is not None:
        # See also @Parfait's comment below
        # about performance cost of append()
        df_all = df_all.append(df)
    else:
        df_all = df
conn.close()
df_all = df_all.sort_values(by = 'date')
df_all = df_all.reset_index(drop=True)
jurez
  • 4,436
  • 2
  • 12
  • 20
  • thank you for helping. Yes something like your loop is what I am looking. However, I get an error message ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). which is being caused by.. if df_all: – windwalker Oct 24 '20 at 17:33
  • I have just checked again and the edit you made only produces dataframe that contains one of the tables. nothing is being appended – windwalker Oct 24 '20 at 17:47
  • I cannot run the code on your computer. Put a `print(df)` before `df_all.append(df)` to see what is going on. – jurez Oct 24 '20 at 20:37
  • Hiya, I managed to resolve with small amendment: if df_all is not None: df_all = df_all.append(df) – windwalker Oct 24 '20 at 21:01
  • Ah, I missed that. Thanks! – jurez Oct 24 '20 at 21:05
  • 1
    [Never call DataFrame.append or pd.concat inside a for-loop. It leads to quadratic copying.](https://stackoverflow.com/a/36489724/1422451). Use a list or dictionary instead and then `concat` **once** outside the loop. – Parfait Oct 25 '20 at 01:24
  • @Parfait Great point, thanks! I added a remark for all future readers. – jurez Oct 25 '20 at 06:03
  • @Parfait point noted. I am getting to grips with this now – windwalker Oct 25 '20 at 08:15