
My web host has upgraded its servers. The newer MySQL 5.7.27 they installed has `LOAD DATA LOCAL INFILE` disabled by default, so I get Error 1148 when I try to execute the command. Unfortunately I can't start or stop the MySQL instance, as that is under the web host's control. What are some workarounds or alternative methods that would let me import data with the least effort? All the data I want to import is currently in TSV (tab-separated values) format, but I could switch to CSV or something else if required. I also have MySQL Workbench installed, if that helps.

The problem is basically the same as this one, except that I cannot access and reconfigure the server, which is what the selected answer to that question requires.

posfan12
  • Why don't you _talk_ to your host and ask them to change the setting? You know, have a _conversation_? – Lightness Races in Orbit Dec 28 '19 at 23:23
  • I'll be sure to do that. Last time I asked them to enable stored procedures they declined to do so, calling it a security risk. `LOAD DATA LOCAL INFILE` is also a security risk which is why it is disabled by default. – posfan12 Dec 28 '19 at 23:26
  • They were happy enough to have it enabled before. Now, for the same money, it is disabled. You are the customer. Have a conversation with them! They may not even realise the default changed in the new version. – Lightness Races in Orbit Dec 28 '19 at 23:26
  • This is off-topic. – posfan12 Dec 28 '19 at 23:27
  • I'm voting to close this question as off-topic because it’s not about programming or code, it’s about your relationship with your web hosting company. If the software feature is now disabled, there is no workaround - you must communicate with your host and get it sorted out. Your request for loading code is far too broad to be on-topic. You can use google to help you there. – Bohemian Dec 28 '19 at 23:30
  • Right, a workaround (should it be needed) likely will require actual SQL code. And plugging security holes has nothing to do with "relationship with your web hosting company". – posfan12 Dec 28 '19 at 23:33
  • Quote: "They were happy enough to have it enabled before." It was never *enabled* before. It was simply never *disabled* until now. – posfan12 Dec 28 '19 at 23:40
  • Not disabled means enabled. – Lightness Races in Orbit Dec 28 '19 at 23:46
  • @LightnessRacesinOrbit if it’s not off topic, it is definitely too broad. It’s either a “wave a magic wand” question, or it’s a “here’s my spec, do my work for me” question. There are hundreds if not thousands of sites that show simple code to load data. If OP has problems with OP’s code attempt at loading, then OP can ask about that. – Bohemian Dec 29 '19 at 00:00
  • How is it more off-topic than this? https://stackoverflow.com/questions/18437689/error-1148-the-used-command-is-not-allowed-with-this-mysql-version The top answers describe command-line switches and config files that have little to do with code. (Unless you consider INI file syntax to be "code".) – posfan12 Dec 29 '19 at 00:06
  • @Bohemian That particular closure reason would be easier to argue. Still, this is quite answerable, as I've shown. – Lightness Races in Orbit Dec 29 '19 at 00:25

1 Answer


I'd write a Python script that reads your TSV input, generates multi-row INSERT statements in a loop (each handling perhaps 100-200* rows), and executes them.

Run it on the same server. Do it in a transaction so you don't make a mess on your first few tries if there are errors.

There you have it: TSV import.

* Or, well, whatever you want. Doing them one at a time will be slow (there is a small overhead to executing each SQL statement), but you probably can't dump them all into a single INSERT either, unless the amount of information is small. Check your server's settings/limits and pick a reasonable batch size for your use case; for under ~2000 rows of reasonably "short" row data, 100-200 rows per statement is usually appropriate. (See the sketch after the script below for one way to derive a batch size automatically.)


A sketch in Python (it assumes the `mysql-connector-python` driver, and `tbl`/`col1`/`col2`/`col3` are placeholders for your real table):

import csv
import mysql.connector  # assumed driver; any DB-API connector works the same way

BATCH_SIZE = 100  # tune to your data volume and server limits

def flush_buffer(cursor, buffer):
    # One parameterized multi-row INSERT per batch: the driver quotes and
    # escapes every value, whatever the column type.
    placeholders = ", ".join(["(%s, %s, %s)"] * len(buffer))
    sql = "INSERT INTO tbl (col1, col2, col3) VALUES " + placeholders
    cursor.execute(sql, [value for row in buffer for value in row])

def handle_input(cursor, path):
    buffer = []
    with open(path, newline="") as tsv_file:
        for row in csv.reader(tsv_file, delimiter="\t"):
            buffer.append(row)
            if len(buffer) >= BATCH_SIZE:
                flush_buffer(cursor, buffer)
                buffer = []
    if buffer:  # flush the final partial batch so no rows are forgotten
        flush_buffer(cursor, buffer)

conn = mysql.connector.connect(user="me", password="pw", database="db")
try:
    cursor = conn.cursor()
    handle_input(cursor, "data.tsv")
    conn.commit()  # one transaction: either every row lands, or none do
except Exception:
    conn.rollback()
    raise
finally:
    conn.close()
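
If you'd rather not hard-code the batch size, here is a rough sketch of deriving one from the server's `max_allowed_packet`; the average-row estimate and the 2x safety margin are my assumptions, so measure your own rows and adjust:

cursor.execute("SHOW VARIABLES LIKE 'max_allowed_packet'")
packet_bytes = int(cursor.fetchone()[1])
est_row_bytes = 512  # assumed average encoded row size; measure your own data
BATCH_SIZE = max(1, packet_bytes // (est_row_bytes * 2))  # keep a 2x safety margin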
Lightness Races in Orbit
  • The limit of 100-200 rows is somewhat inconvenient. Can Workbench provide some leverage? (I am not familiar with all its features.) – posfan12 Dec 28 '19 at 23:38
  • I picked that number fairly arbitrarily. The number you'll want depends on the volume of data, the amount of information in each row, the limits/settings of your server.... Why is it inconvenient? – Lightness Races in Orbit Dec 28 '19 at 23:39
  • It is an additional variable I may forget about later, resulting in a huge headache. I routinely add data in 100 row x 20 column batches already. Most columns are integers or NULL, but there are VARCHAR and TEXT columns (for graphical data) as well. – posfan12 Dec 29 '19 at 00:55
  • What sort of "huge headache" do you anticipate? If there's an error related to too-long queries, change the batch size, run it again. Takes literally seconds. – Lightness Races in Orbit Dec 29 '19 at 01:00
  • You can always generate the batch size dynamically, using a formula based on the input size. It just takes a few minutes' thought when writing your code!! – Lightness Races in Orbit Dec 29 '19 at 01:01
  • Mostly I am worried about accidentally omitting rows. If there are 3 batches but I forget the 3rd batch for instance. – posfan12 Dec 29 '19 at 01:05
  • @posfan12 I don't understand. How would your script "forget" a batch? – Lightness Races in Orbit Dec 29 '19 at 01:05
  • Ah, okay. I thought you were writing the generated code to a SQL file, not executing it. – posfan12 Dec 29 '19 at 01:08
  • @posfan12 Well, you could do that too. Same effect. Either you execute the resulting SQL statements now or you execute them later. Whatevs. Still no way to "forget" anything. – Lightness Races in Orbit Dec 29 '19 at 01:09
  • I've learned that if there's a way to make a mistake, I can guarantee I'll make it at some point. – posfan12 Dec 29 '19 at 01:09
  • You'd have to try _really_ hard to make a program like the one I've shown above "forget" rows/batches. Like, you'd actually have to go out of your way to write code to make it do that. This is pretty basic scripting; I don't say that to offend you, but to point out how simple and commonplace this task is for your computer. – Lightness Races in Orbit Dec 29 '19 at 01:10
  • For one table it's not a big deal. Multiple tables with different data types in each column starts to get annoying. And each time the schema changes I have to update the program(s). But if it's the only solution then I will have to do it. (I can likely figure out how to program it.) – posfan12 Dec 29 '19 at 04:48
  • Since TSV has no types, and no tables, your script shall simply recreate what `LOAD DATA INFILE` does. You don't have to update `LOAD DATA INFILE` to teach it about your data, so you shouldn't have to update your script, either. I concede you may have to have your script examine the schema if there are unusual string conversions that need to be performed. All depends on your data. – Lightness Races in Orbit Dec 29 '19 at 14:34
  • AFAIK when using `INSERT INTO ... VALUES` all strings must be in quotes, and if the strings contain quotes they need to be escaped. Thus I will need a separate script for each table. I did learn that Workbench has a data import command which may still work even if `LOAD DATA` is disabled. I will have to check it out and see. – posfan12 Dec 30 '19 at 07:35
  • @posfan12 _"if the strings contain quotes they need to be escaped"_ / _"Thus I will need a separate script for each table"_ I cannot work out how the first statement leads to the latter. Why do you need a separate script for each table just because you need to do string escaping when building queries? Anyway, ideally you'd use prepared statements instead. – Lightness Races in Orbit Dec 30 '19 at 17:32
  • I said "all strings must be in quotes". And since each table has a different number of string parameters, I have to build the `INSERT INTO` string differently each time. For instance, `row[col1]` might be a string and require quotes. But in a different table it might be `row[col3]` or `row[col7]`. – posfan12 Dec 30 '19 at 17:50
  • As I said before, _"I concede you may have to have your script examine the schema if there are unusual string conversions that need to be performed"_. That doesn't mean you need a whole new script for each table!! You don't have to hardcode everything! Besides, you should be able to just send everything as strings. Again, recall that `LOAD DATA INFILE` _already does all of this_. If it can do it, so can you/your script! (A schema-driven sketch follows below.) – Lightness Races in Orbit Dec 30 '19 at 18:10
  • In your answer `str = "INSERT INTO tbl (col1, col2, col3) VALUES"` etc. etc. etc. `str += "(" + row[col1] + ", " + row[col2] + ", " + row[col3] + "),"` you are building a SQL query, and all SQL queries require strings to be quoted. It has nothing to do with the TSV input. – posfan12 Dec 30 '19 at 18:27
  • So? Add quotes then! It's pseudo-code. You're not supposed to copy-paste it. You're supposed to find a statement-building approach that suits your purposes. (And, as I said, prepared statements would be far superior to quoted strings anyway.) If you have any further questions about creating scripts to automate tasks I suggest you ask them as new questions. Good luck – Lightness Races in Orbit Dec 30 '19 at 18:47
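
To illustrate the schema-driven, prepared-statement approach from the comments above, one possible sketch (the `db` and `tbl` names are placeholders): read the column list from `information_schema`, then build the parameterized statement from it, so a single script serves every table:

cursor.execute(
    "SELECT column_name FROM information_schema.columns"
    " WHERE table_schema = %s AND table_name = %s"
    " ORDER BY ordinal_position",
    ("db", "tbl"))  # placeholder database/table names
columns = [name for (name,) in cursor.fetchall()]
row_template = "(" + ", ".join(["%s"] * len(columns)) + ")"
sql = "INSERT INTO tbl (" + ", ".join(columns) + ") VALUES " + row_template
# cursor.execute(sql, row) then quotes and escapes every value for you,
# no matter which columns happen to be strings.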