0

I want to insert data from a large CSV file to MonetDB. I can't use MonetDB "mclient" because this procedure must run inside a Pentaho Server application within a Docker container. MonetDB is inside a Docker container too.

Here's my very simple transformation:

enter image description here

When I test the transformation, I always get the following error message:

2021/03/20 22:37:37 - MonetDB bulk loader.0 - Error loading data: 42000!COPY INTO: record separator contains '\r\n' but in the input stream, '\r\n' is being normalized into '\n'

Does anyone have any idea what is happening?

Thank you!

Llorieb
  • 25
  • 5

1 Answers1

0

This is related to line endings. Pentaho issues a COPY INTO statement,

COPY INTO <table> FROM <file>
USING DELIMITERS ',', '\r\n'

Here, \r\n means DOS/Windows line endings. Since the Oct2020 release, MonetDB always normalizes the line endings from DOS/Windows to Unix style \n when loading data. Before, it used to sometimes normalize and sometimes not. However, normalizing to \n means that looking for \r\n would yield one giant line containing the whole file, hence the error message.

I will submit a patch to MonetDB to automatically replace the USING '\r\n' with '\n'. This will fix it in the longer term.

In the short term I have no good solution to offer. I have no experience with Pentaho, but looking at the source code, it seems Pentaho is uses the system property lines.separator, which is \r\n on Windows.

This means, if you have access to a Mac or Linux machine to run Pentaho on, that will work as line.separator is \n there. Otherwise, maybe you can ask the Pentaho people if the JVM can be started with something like java -Dline.separator="\n" as a workaround, see also this Stack Overflow question.

Otherwise, we'll have to use a patched version of either Pentaho, the MonetDB JDBC driver or MonetDB. I could send you a patched version of the JDBC driver that automagically replaces the '\r\n' with '\n' before sending the query to the server but you have to figure out yourself how to get Pentaho to use this JDBC driver instead of the default one.

  • Thank you for the response. I tried to use java -Dline.separator="\n" in Pentaho's JVM startup, but it doesn't work. Pentaho doesn't even started up. I think the best way to solve this is in the long term, sending this patch to MonetDB. If it is not a problem for you, I'd appreciate your effort. Meanwhile, I think I can figure out a workaround. If I do a bulk insert from Pentaho in Linux, is it going to work? I think file must be in Linux style: lines ended with "\n". – Llorieb Mar 27 '21 at 14:11
  • I expect but cannot promise we'll soon release an update to the Oct2020 branch in which this issue is fixed. If you do it on Linux you can use either DOS or Unix line endings, it doesn't matter which. – Joeri van Ruth Mar 29 '21 at 07:48
  • Version Oct2020-SP4 has been released which should fix this issue – Joeri van Ruth Apr 14 '21 at 13:54
  • Excellent news! I really appreciate you support and prompt response. Thank you very much! I'll let you know soon the results. – Llorieb Apr 16 '21 at 15:27