7

I'm using Python and its MySQLdb module to import some measurement data into a Mysql database. The amount of data that we have is quite high (currently about ~250 MB of csv files and plenty of more to come).

Currently I use cursor.execute(...) to import some metadata. This isn't problematic as there are only a few entries for these.

The problem is that when I try to use cursor.executemany() to import larger quantities of the actual measurement data, MySQLdb raises a

TypeError: not all arguments converted during string formatting

My current code is

def __insert_values(self, values):
    cursor = self.connection.cursor()
    cursor.executemany("""
        insert into values (ensg, value, sampleid)
        values (%s, %s, %s)""", values)
    cursor.close()

where values is a list of tuples containing three strings each. Any ideas what could be wrong with this?

Edit:

The values are generated by

yield (prefix + row['id'], row['value'], sample_id)

and then read into a list one thousand at a time where row is and iterator coming from csv.DictReader.

Jon Seigel
  • 12,251
  • 8
  • 58
  • 92
lhahne
  • 5,909
  • 9
  • 33
  • 40
  • 1
    Have you verified that values contains the right data? As an aside, you should look at using LOAD LOCAL DATA INFILE. It can be a lot faster. – brian-brazil Jun 10 '09 at 10:36
  • The problem is that the data needs to be preprocessed (which is currently done in the same python script) and it isn't very viable to create redundant copies of such large datasets. The values should be ok as I checked that with a debugger. – lhahne Jun 10 '09 at 10:44

2 Answers2

8

In retrospective this was a really stupid but hard to spot mistake. Values is a keyword in sql so the table name values needs quotes around it.

def __insert_values(self, values):
    cursor = self.connection.cursor()
    cursor.executemany("""
        insert into `values` (ensg, value, sampleid)
        values (%s, %s, %s)""", values)
    cursor.close()
lhahne
  • 5,909
  • 9
  • 33
  • 40
3

The message you get indicates that inside the executemany() method, one of the conversions failed. Check your values list for a tuple longer than 3.

For a quick verification:

max(map(len, values))

If the result is higher than 3, locate your bad tuple with a filter:

[t for t in values if len(t) != 3]

or, if you need the index:

[(i,t) for i,t in enumerate(values) if len(t) != 3]
gimel
  • 83,368
  • 10
  • 76
  • 104