
I have the following notebook, where I'm trying to insert the data of a DataFrame into my phpMyAdmin MySQL database. To replicate, run the following:

First I create the database with the schema:

CREATE SCHEMA IF NOT EXISTS `proyecto` DEFAULT CHARACTER SET utf8 ;
USE `proyecto`;
CREATE TABLE IF NOT EXISTS `pueblos`(
    `Pueblo` VARCHAR(60) NOT NULL,
    `Comunidad` VARCHAR(60) NOT NULL,
    `Provincia` VARCHAR(60) NOT NULL,
    `Latitud` float NOT NULL,
    `Longitud` float NOT NULL,
    `Altitud` float NOT NULL,
    `Habitantes` int NOT NULL,
    `Hombres` int NOT NULL,
    `Mujeres` int NOT NULL,
  PRIMARY KEY (`Pueblo`))
ENGINE = InnoDB;

and in Python I import the libraries:

import numpy as np
import pandas as pd
# installed beforehand in a separate notebook cell: !pip install mysqlclient
import MySQLdb

Then I gather the data and transform it like so:

df = pd.read_excel('https://www.businessintelligence.info/resources/assets/listado-longitud-latitud-municipios-espana.xls')
df = df.drop(index=0)        # drop the title row above the headers
new_header = df.iloc[0]      # the next row holds the real column names
df = df[1:]                  # keep only the data rows
df.columns = new_header      # promote that row to be the header

So far we have the data and the DB schema, and so far so good. Now I try to insert some data to make sure the connection works, so I run:

db = MySQLdb.connect("localhost", "root", "", "proyecto")
insertrec = db.cursor()
a = "se"
b = "ha"
c = "insertado"
sqlquery = "INSERT INTO Pueblos (Pueblo, Comunidad, Provincia, Latitud, Longitud, Altitud, Habitantes, Hombres, Mujeres) VALUES('" + a + "', '" + b + "', '" + c + "', 7, 8, 9, 10, 11, 12)"

insertrec.execute(sqlquery)
db.commit()
print("Success!!")
db.close()

and I can see that I'm able to insert data into my database. Great! The issue comes when I try to replicate the same thing and insert the data of my DataFrame like this:

for index, row in df.iterrows():
    Pueblo = row['Población']
    Comunidad = row['Comunidad']
    Provincia = row['Provincia']
    Latitud = row['Latitud']
    Longitud = row['Longitud']
    Altitud = row['Altitud']
    Habitantes = row['Habitantes']
    Hombres = row['Hombres']
    Mujeres = row['Mujeres']

    sqlquery = "INSERT INTO Pueblos (Pueblo, Comunidad,Provincia,Latitud,Longitud,Altitud,Habitantes,Hombres,Mujeres) VALUES(row['Población'], row['Comunidad'],row['Provincia'], row['Latitud'],row['Longitud'],row['Altitud'],row['Habitantes'],row['Hombres'],row['Mujeres'])"
    insertrec.execute(sqlquery)
    db.commit()

db.close()


This operation fails.
What am I doing wrong? I believe I'm simply doing the same as the simple insertion, but I can't understand why it doesn't work.

EDIT: currently attempting to implement @buran's suggestion to use df.to_sql, but it still fails. The code attempted is:

df.to_sql("pueblos",db,if_exists='append',index=False)

EDIT 2: the thread Questions about pandas.to_sql points out that passing a raw MySQLdb connection to df.to_sql is no longer supported (it only works for SQLite now), so we are currently creating an engine and attempting their solution. The first change was to add the column index with type int to the DB schema, since df.to_sql also writes the index. I also made a user ana with password ana with the same privileges as root for the engine URL. From there I'm attempting to implement their solution like so:

from sqlalchemy import create_engine
engine = create_engine("mysql://ana:ana@localhost/proyecto")
con = engine.connect()
df.to_sql(name='pueblos', con=con, if_exists='append')
con.close()

Currently this yields the error: OperationalError: (MySQLdb._exceptions.OperationalError) (1054, NULL)

  • Instead of iterating over rows, use [`DataFrame.to_sql`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html). And in any case, use a parametrized query; don't create the INSERT statement using string methods. – buran Mar 22 '21 at 20:05
  • The particular problem in your query is the missing `VALUES` - `INSERT INTO tablename VALUES...` (assuming you add values for all columns in the table). – buran Mar 22 '21 at 20:08
  • @buran, thanks for the idea. I tried DataFrame.to_sql but it also fails, with the error DatabaseError: Execution failed on sql: SELECT name FROM sqlite_master WHERE type='table' AND name=?; (2006, '') unable to rollback. Do you have any idea how to fix this, or what does this indicate I'm doing wrong? – Ana Uribe Mar 22 '21 at 21:08
  • It's hard to tell, but it looks like there is a problem with the table name you use. Can you edit your question to include the current code you run? – buran Mar 22 '21 at 21:19
  • @buran added the edit. Could you elaborate on what you mean by the table name possibly being the issue? – Ana Uribe Mar 22 '21 at 21:28
  • My guess was based on the fact that `SELECT name FROM sqlite_master WHERE type='table' AND name=?; (2006, '')` looks like a system query selecting some table, and the param value for the table name is 2006... Actually the problem there is the second value in the tuple. Also, in your code you sometimes use `pueblos` and sometimes `Pueblos`. So too many things related to the table name don't look right. – buran Mar 22 '21 at 21:47
  • Look at https://stackoverflow.com/questions/43136121/questions-about-pandas-to-sql - you need to define an engine. You pass a MySQLdb connection, but that only works for sqlite3 now. So it was more my fault for suggesting it with MySQL. – buran Mar 22 '21 at 21:54
  • @buran added an edit with my attempt at defining an engine – Ana Uribe Mar 22 '21 at 22:59

1 Answer


To solve the insertion we had to create an engine, like this:

from sqlalchemy import create_engine
engine = create_engine("mysql://user:password@localhost/database_name")
con = engine.connect()
df.to_sql(name='table you are inserting into',con=con,if_exists='append')
con.close()

Make sure that the table name is not an existing table: the above method does not work if the table has already been created by the schema. Instead, to_sql creates a new table and defines its own schema based on the DataFrame.
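
If you want some control over the column types of the auto-created table, to_sql accepts a dtype mapping. Below is a minimal sketch, not part of the accepted fix, assuming the DataFrame still carries the Spanish column headers from the spreadsheet and the ana/ana credentials from the question (adjust both as needed):

from sqlalchemy import create_engine, types

# Sketch only: let to_sql create (or replace) the table, but pin the column
# types explicitly instead of relying on the inferred ones.
engine = create_engine("mysql://ana:ana@localhost/proyecto")
df.to_sql(
    name='pueblos',
    con=engine,
    if_exists='replace',   # drops and recreates the table on every run
    index=False,
    dtype={
        'Población': types.VARCHAR(60),
        'Comunidad': types.VARCHAR(60),
        'Provincia': types.VARCHAR(60),
        'Latitud': types.Float(),
        'Longitud': types.Float(),
        'Altitud': types.Float(),
        'Habitantes': types.Integer(),
        'Hombres': types.Integer(),
        'Mujeres': types.Integer(),
    },
)

Because if_exists='replace' drops any existing pueblos table, only run it against a table you are willing to overwrite.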

Full credit to @buran for his suggestions, which led to the answer.
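
For completeness, buran's other suggestion, a parametrized INSERT instead of string concatenation, also works if you prefer to keep the pueblos table pre-created by the schema. A minimal sketch, assuming the DataFrame columns listed below and the same local root connection as in the question:

import MySQLdb

db = MySQLdb.connect("localhost", "root", "", "proyecto")
cursor = db.cursor()

# %s placeholders let the driver escape and quote every value.
sql = ("INSERT INTO pueblos "
       "(Pueblo, Comunidad, Provincia, Latitud, Longitud, Altitud, Habitantes, Hombres, Mujeres) "
       "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)")

columns = ['Población', 'Comunidad', 'Provincia', 'Latitud', 'Longitud',
           'Altitud', 'Habitantes', 'Hombres', 'Mujeres']
# astype(object) boxes the NumPy scalars as plain Python int/float/str
# so the driver can escape them.
rows = [tuple(r) for r in df[columns].astype(object).values]

cursor.executemany(sql, rows)
db.commit()
db.close()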
