From https://www.w3schools.com/python/python_mysql_getstarted.asp
Installation
python -m pip install mysql-connector-python
Overview
Create a connection:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword"
)
Create a cursor to interact with the connection you made, then create a Database:
mycursor = mydb.cursor()
mycursor.execute("CREATE DATABASE mydatabase")
mydb.close()
After the database has been created, you can start connecting with it like so:
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
Note that you don't have to actually close your connection and reopen it to connect to that database, yet I don't see the docs mentioning anything about interacting with that specific database after you've created it, so I'm going to close it after I create the database...
Create the table with the proper datatypes and constraints:
mycursor.execute("CREATE TABLE customers (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255), address VARCHAR(255))")
Then insert into it and commit the entries:
sql = "INSERT INTO customers (name, address) VALUES (%s, %s)"
val = ("John", "Highway 21")
mycursor.execute(sql, val)
mydb.commit()
Close the connection:
mydb.close()
Your Specific Case
This is the data you have provided:
data = {
"authorId": "1773022",
"url": "https://www.semanticscholar.org/author/1773022",
"papers": [
{
"paperId": "1253d2704580a74e776ae211602cfde71532c057",
"title": "Nonlinear Schrodinger Kernel for hardware acceleration of machine learning"
},
{
"paperId": "71f49f1e3ccb2e92d606db9b3db66c669a163bb6",
"title": "Task-Driven Learning of Spatial Combinations of Visual Features"
},
{
"paperId": "bb35ae8a50de54c9ca29fbdf1ea2fbbb4e8c4662",
"title": "Statistical Learning of Visual Feature Hierarchies"
}
]
}
Granted that I don't know all the details, but based on the data given, I'd assume that you would want a table for:
- Authors - id (auto increment pk), authorId (varchar), url (varchar),
- Papers - id (auto increment pk), authorId (varchar fk), paperId, title (varchar)
Modify this as you please:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
# create the tables if they don't exist
authors = False
papers = False
mycursor.execute("SHOW TABLES")
for e in mycursor.fetchall():
if "authors" == e[0]:
authors = True
if "papers" == e[0]:
papers = True
if not authors:
mycursor.execute("CREATE TABLE authors (id INT AUTO_INCREMENT PRIMARY KEY, authorId VARCHAR(255), url VARCHAR(255))")
if not papers:
mycursor.execute("CREATE TABLE papers (id INT AUTO_INCREMENT PRIMARY KEY, authorId VARCHAR(255), paperId VARCHAR(255), title VARCHAR(255))")
# insert into them - recall that "data" is what gets returned by your api. I am assuming 1 entry.
a = data["authorId"]
u = data["url"]
ps = data["papers"]
# I am going to check if the authorId already exists.
mycursor.execute("SELECT * FROM authors WHERE authorId = '" + a + "'")
as = mycursor.fetchall()
if len(as) == 0:
sql = "INSERT INTO authors (authorId, url) VALUES (%s, %s)"
val = (a, u)
mycursor.execute(sql, val)
mydb.commit()
# for each paper in papers
for p in ps:
# Consider this: Suppose you had multiple authors for 1 paper. Provided that they are entered into the authors table (which they should be), you can add an entry to the papers table for each author.
# check to see if the paper is already in the database with the specific author, paperId, and title
mycursor.execute("SELECT * FROM papers WHERE authorId = '" + a + "' AND paperId = '" + p["paperId"] + "' AND title = '" + p["title"] + "'")
pc = mycursor.fetchall()
# if they are no entries, insert it
if len(pc) == 0:
sql = "INSERT INTO papers (authorId, paperId, title) VALUES (%s, %s, %s)"
val = (a, p["paperId"], p["title"])
mycursor.execute(sql, val)
mydb.commit()
mydb.close()
Side note:
I have not tested this. I will be amazed myself if this works first try. If you come across any errors, I can do my best to trouble shoot.
Pandas Dataframe .to_sql
Method
Just in case this thread blows up with a bunch of answers, this answer on this thread uses a nice method from the pandas dataframe library and instead of plagiarizing it, I'd rather give credit where credit is due. I'd also like to include the docs and this detailed answer. It's probably a lot easier and is going to be a lot closer to an ORM (as I've mentioned below here) than my crude attempt to make one.
My Crude Attempt at an ORM
Once again, I have not tested this. It may not work, but the idea is there.
Please check out this article: What are some good Python ORM solutions?
Save this as Models.py
import mysql.connector
import os
# --------------------------------------------------
# General Model
# --------------------------------------------------
# table columns is a dictionary that looks like:
# table_columns = {"column name": "constaints"}
# ex: table_columns = {"authorId": "VARCHAR(255)", "next col": "VARCHAR(255)"}, etc.
# data is the same as table_columns, but contains the data to be added for 1 row
class Model:
def __init__(self, table_columns={}, table_name=None, data=None, host=os.environ["mysqlhost"], user=os.environ["mysqluser"], password=os.environ["mysqlpassword"], database=os.environ["database"]):
self.db = None
self.cursor = None
self.table_name = table_name
self.table_columns = table_columns
self.data = data
if host is not None and user is not None and password is not None and database is not None:
try:
self.db = mysql.connector.connect(
host=host,
user=user,
password=password,
database=database
)
except Exception as e:
print(e)
#print("One or more credentials were incorrect! Could not connect to you database!")
if self.db is not None:
self.cursor = self.db.cursor()
if table_name is not None:
self.cursor.execute("SHOW TABLES;")
for e in self.cursor.fetchall():
if e[0] != self.table_name:
pk = [[e, self.table_columns[e]] for e in self.table_columns if "primary key" in e.lower()]
if len(pk) == 1:
pk = " ".join(pk) + ", "
del self.table_columns[pk[0]]
else:
pk = ""
try:
table_string = 'CREATE TABLE "' + self.table_name + '"(' + pk + ", ".join([" ".join(['"' + "_".join(c.split()) + '"', self.table_columns[c].upper()]) for c in self.table_columns]) + ');'
self.cursor.execute(table_string)
print("Created table with name: " + self.table_name)
except Exception as e:
self.db.rollback()
print(e)
def insert(self):
if self.data is not None:
pkname = ""
try:
self.cursor.execute('SHOW KEYS FROM "(%s)" WHERE Key_name = (%s);', (self.table_name, 'PRIMARY'))
pkname = self.cursor.fetchall()[0]
if pkname in self.table_columns and pkname not in self.data:
del self.table_columns[pkname]
elif pkname not in self.table_columns and pkname in self.data:
del self.table_columns[pkname]
except Exception as e:
print("Could not get primary key name!")
print(e)
try:
self.cursor.execute('SHOW COLUMNS FROM "' + self.table_name + '";')
self.table_columns = {e: "" for e in self.cursor.fetchall()}
except Exception as e:
self.db.rollback()
print("Could not find table with name " + self.table_name)
print(e)
flag = True
for e in self.data:
if e not in self.table_columns:
flag = False
if flag:
if len(self.data) == len(self.table_columns):
col = ["'" + e + "'" if e[0] != "'" and e[-1] !="'" else e for e in self.data]
data = [self.data[e] for e in self.data]
sql = "INSERT INTO %s (%s) VALUES (%s)"
val = ('"' + self.table_name + '"', ", ".join(col), ", ".join(data))
try:
self.cursor.execute(sql, val)
self.save()
except Exception as e:
print("Could not insert into " + self.table_name)
print(e)
else:
print("Found unexpected data. Try an insert or update query.")
def save(self):
committed = False
try:
self.db.commit()
committed = True
except Exception as e:
self.db.rollback()
print(e)
if committed:
self.db.close()
self.db = None
self.cursor = None
Usage:
import os
import Model as m
os.environ["mysqlhost"] = "host"
os.environ["mysqluser"] = "mysqluser"
os.environ["mysqlpassword"] = "password"
os.environ["database"] = "database"
# Presumably, this is exactly what your API returns.
data = {
"authorId": "118985833",
"url": "semanticscholar.org/author/118985833",
"papers": [
{
"paperId": "0b698cd2269d9b52c75dbdf1994dbc9a98fb16c8",
"title": "The development of self-care workshop to prevent vicarious traumatization in training therapists working with sexually exploited children"
}
]
}
data_for_author = {"authorId": data["authorId"], "url": data["url"]}
# table_name=None
# table_columns={}
# data=None
model = m.Model(table_name="authors", data=data_for_author)
model.insert()
This may not even work - but I did it to give you some hope. There are easier ways to insert data into your database. You need to look into ORMs - the django (a webserver library for python) library has one native to it, that makes it super easy to manage your database.