Suppose I am working with a large database showing details on books from an entire library. I want to get various statistics of the collection. For example, in this code, I am defining a method to get the top 10 most represented authors.
def most_owned_authors():
db = 'database.db'
conn = sqlite3.connect(db)
cursor = conn.cursor()
cursor.execute('''SELECT AUTHOR AS author FROM MAINTABLE WHERE OWNEDCHECKBOX = TRUE;''')
authors_df = pd.DataFrame(cursor.fetchall())
authors = []
author_dict = {}
for x in authors_df.iloc:
authors.append(x.to_string(index=False))
for x in authors:
amount = authors.count(x)
author_dict[x] = amount
author_dict = dict(sorted(author_dict.items(), key=lambda item: item[1], reverse=True))
top_10_owned_authors = {}
for x, k in enumerate(author_dict):
if x == 10: break
top_10_owned_authors[k] = author_dict[k]
Is there an easier way, using SQLite3 and Pandas to generate statistics from an SQL Inquiry? Or do I have to manually create the logic in a way such as above? Can SQL SUM/COUNT across multiple instances of the same entry, and then just pull indices from the DataFrame? An additional example of the same idea would be to use SQL and Pandas to generate a DataFrame of books marked as "read" and the "year" they were read.