1

I have a pandas dataframe containing different datatypes, like the one below:

import pandas as pd

df = pd.DataFrame([[1, 'a', (4,5,6)],[1, 'b', (7,8,9)]])

print(df)
#    0  1          2
# 0  1  a  (4, 5, 6)
# 1  1  b  (7, 8, 9)

print(df.dtypes)
# 0     int64
# 1    object
# 2    object
# dtype: object

I want to save df to a MySql database using sqlalchemy and possibly pandas' to_sql() function.

The problem is that trying df.to_sql(table_name, conn) raises an exception because mysql connector doesn't nkow how to convert the tuple to a mysql type.

As a workaround, I turned the tuple in a string by using apply(), adding quotes around the tuples. This solution works and the tuple is written in the db as a string.

This is not ideal because when I read the table from the db I will have to convert strings containing a tuple back to tuples, which in my opinion is not the most elegant solution.

My question is: is there a better way to accomplish this?

Looking around I can't seem to find any simple solution to this problem

By the way, what I'm trying to do is to store sklearn model parametrizations in a db. The tuple problem arises when using MLPRegressor which wants a tuple for hidden_layer_sizes argument

gionni
  • 1,284
  • 1
  • 12
  • 32
  • *how to convert the tuple to a mysql type*...and what MySQL type do you intend? Usually RDBMS's are designed to store primitive types in bit form (boolean/string/numeric/date). It is not recommended to save complex application layer (i.e., python) types (i.e., tuple) in databases. Save underlying raw data, then rebuild in code as needed. Otherwise look into blob as listed [here](https://dev.mysql.com/doc/refman/5.7/en/data-types.html) but I doubt this type works with pandas. – Parfait Sep 02 '17 at 14:20
  • Yeah, I tried blob but had no luck. I guess the workaround is the actual solution then. Thanks for the help – gionni Sep 02 '17 at 14:24
  • Will tuples always be same length of three? If so save in different columns of pandas dataframe and then when imported from MySQL, bind them into tuples. – Parfait Sep 02 '17 at 14:47
  • alas, they won't. I guess I'll stick to the string conversion – gionni Sep 02 '17 at 14:50
  • Alternatively, save tuples in different rows under one *values* column with repeated indicators (long format and preferable with data storage than the expensive wide format). And then just `groupby` back into tuples. – Parfait Sep 02 '17 at 14:53
  • I might just try that, thanks a lot :-) – gionni Sep 02 '17 at 14:57

2 Answers2

2

Consider storing tuple data in long format which can be saved appropriately in MySQL. Once read back into pandas, you can then groupby(tuple) to return to needed format for application work but again not to be saved in this format. Below is demonstration of different length tuples:

import pandas as pd

data_list = [[1, 'a', (4,5,6)],[1, 'b', (7,8,9,11,14)]]

dfs = [pd.DataFrame({'ID':i[0], 'LETTER':i[1], 'VALUES':i[2]}) for i in data_list]

df = pd.concat(dfs).reset_index(True)
print(df)
#    ID LETTER  VALUES
# 0   1      a       4
# 1   1      a       5
# 2   1      a       6
# 3   1      b       7
# 4   1      b       8
# 5   1      b       9
# 6   1      b      11
# 7   1      b      14


gdf = df.groupby(['ID', 'LETTER'])['VALUES'].apply(tuple).reset_index()
print(gdf)
#    ID LETTER             VALUES
# 0   1      a          (4, 5, 6)
# 1   1      b  (7, 8, 9, 11, 14)
Parfait
  • 104,375
  • 17
  • 94
  • 125
0

I think this should work(worked for me)

from sqlalchemy import create_engine

engine = create_engine('mysql://user:passwd@127.0.0.1/db_name?charset=utf8')
df.to_sql('your_data',engine, flavor = 'mysql', if_exists= 'append')

Also check writing to MySQL with pandas using SQLAlchemy

MishaVacic
  • 1,812
  • 8
  • 25
  • 29
  • 1
    I tried, but it gives an error, furthermore I just read that flavor is [deprecated](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html) – gionni Sep 02 '17 at 14:02
  • Execute works, but that's because it sees the tuple as a string I guess. I can try with mysql connector instead of sqlalchemy, but I don't thinkit'll work either – gionni Sep 02 '17 at 14:22