
I have a pandas DataFrame of [1127618 rows x 64 columns] in total and tried to save it into MySQL with the command given below.

engine = create_engine('mysql+mysqlconnector://user:password@127.0.0.1/joohun_test', echo=False)
df.to_sql(name='tst_dr3_201801', con=engine, if_exists='replace', index=False)

When I execute this command, it takes forever and the task never seems to finish. If I reduce the DataFrame down to [10000 rows × 64 columns], it can finally be saved into the MySQL database, but then I run into a different problem with the final form of the data saved in MySQL. As you can see in the columns "smaster_uuttype" and "user", whitespace has been added between characters.

MariaDB [joohun_test]> select serialno, uuttype, smaster_uuttype,failingtestname,cpptimeid, user, year, month from tst_dr3_sample limit 10;
+-------------+--------------------+--------------------------------------+-----------------+-----------+------------------+------+-------+
| serialno    | uuttype            | smaster_uuttype                      | failingtestname | cpptimeid | user             | year | month |
+-------------+--------------------+--------------------------------------+-----------------+-----------+------------------+------+-------+
| ABCDEFGH | ABCD-ABC-2500ABCD= | D E F G - H I J - 2 5 0 0 A B C D =  |                 | NULL      | d u n g l e      | 2018 |     1 |

However, looking at the same row of the pandas DataFrame, there are no spaces between characters.

      serialno             uuttype     smaster_uuttype failingtestname cpptimeid    user  year  month
    0 ABCDEFGH  ABCD-ABC-2500ABCD=  DEFG-HIJ-2500ABCD=                      None  dungle  2018      1

There are two things I'd like to know:

  1. Is there a way to save the data into MySQL with the strings stored the way pandas stores them, without spaces between characters?
  2. Is there a way to save a large DataFrame into MySQL without dissecting it into small frames?
  • It seems there is an encoding issue, which caused the extra spaces. – Sphinx Feb 17 '18 at 00:32
  • For question 2: if your DataFrame is billions of bytes, I don't think it can be done quickly. The chunksize parameter of to_sql may help when you hit a timeout issue, but it won't help with efficiency. – Sphinx Feb 17 '18 at 00:39
  • Thanks for the info. chunksize should help, as I'm not looking for an efficient solution at the moment. By the way, the extra-space issue is still a problem. I originally stored the data in sqlite3, and the original data in the SQLite database shows only the first character of certain columns. When I load this data into a pandas DataFrame, it shows up as normal strings without any whitespace between characters, but the problem shows up again when it is written back to MySQL. I tried to remove the whitespace with an "UPDATE ... REPLACE(...)" statement, but it doesn't help. – Joohun Lee Feb 19 '18 at 19:56
  • Have you checked the charset? You have to [insert the data into MySQL with the correct charset](https://stackoverflow.com/questions/8365660/python-mysql-unicode-and-encoding), the same one as SQLite. – Sphinx Feb 20 '18 at 06:16
  • Please provide `SHOW CREATE TABLE`. And `SELECT HEX(smaster_uuttype) ...` for the column with the spurious spaces. – Rick James Mar 01 '18 at 20:17
  • Thanks for all the comments. I fixed this issue: I found that unwanted characters had been added in certain columns, and I had to remove them in order to avoid the spaces between characters in the new database (see the sketch after these comments). – Joohun Lee Mar 03 '18 at 07:54
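
Putting the comments together: a minimal sketch, assuming the "unwanted characters" are embedded NUL (\x00) bytes left over from UTF-16-encoded text (verify with Rick James's HEX() query first). The ?charset=utf8mb4 URL option, the chunk size of 10000, and method='multi' are illustrative choices, not the asker's exact fix.

from sqlalchemy import create_engine

# df is the DataFrame from the question.

# Declare the charset explicitly so SQLAlchemy and MySQL agree on encoding.
engine = create_engine(
    'mysql+mysqlconnector://user:password@127.0.0.1/joohun_test?charset=utf8mb4',
    echo=False,
)

# Strip embedded NUL bytes (a typical leftover of UTF-16 data read as UTF-8)
# from every string column before writing.
for col in df.select_dtypes(include='object').columns:
    df[col] = df[col].str.replace('\x00', '', regex=False)

# Write in chunks so the 1.1M-row frame is not sent as one huge INSERT.
df.to_sql(
    name='tst_dr3_201801',
    con=engine,
    if_exists='replace',
    index=False,
    chunksize=10000,   # inserts 10000 rows per round trip
    method='multi',    # multi-row INSERT statements (requires pandas 0.24+)
)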

1 Answer


Using df.to_sql() can have issues. You could instead use the connector's native way of doing it; here is the example from the MySQL docs (a sketch of that pattern follows below): https://dev.mysql.com/doc/connector-python/en/connector-python-example-cursor-transaction.html

– sushmit
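
A minimal sketch of the native route the answer points to, following the cursor/transaction pattern from the linked MySQL documentation. The three columns shown are illustrative; a real INSERT for this table would list all 64 columns.

import mysql.connector

# df is the DataFrame from the question.
cnx = mysql.connector.connect(
    user='user', password='password',
    host='127.0.0.1', database='joohun_test',
)
cursor = cnx.cursor()

insert_stmt = (
    "INSERT INTO tst_dr3_201801 (serialno, uuttype, smaster_uuttype) "
    "VALUES (%s, %s, %s)"
)

# Convert the relevant columns to plain tuples and insert in batches so
# each executemany() call stays a manageable size.
rows = list(df[['serialno', 'uuttype', 'smaster_uuttype']]
            .itertuples(index=False, name=None))
batch = 10000
for i in range(0, len(rows), batch):
    cursor.executemany(insert_stmt, rows[i:i + batch])
    cnx.commit()   # commit each batch, as in the linked transaction example

cursor.close()
cnx.close()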