Generating splits for a textual index column allowed only in case of "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" property passed as a param

Question

I've command to import sql from sqlserver to hive as below

sqoop import --connect 'jdbc:sqlserver://10.0.2.11:1433;database=SP2010' --username pbddms -P --table daily_language --hive-import --hive-database test_hive --hive-table daily_language --hive-overwrite --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N'

But it result

19/02/22 09:10:24 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.5.0-292
19/02/22 09:10:24 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/02/22 09:10:24 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
19/02/22 09:10:24 INFO manager.SqlManager: Using default fetchSize of 1000
19/02/22 09:10:24 INFO tool.CodeGenTool: Beginning code generation
19/02/22 09:10:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [daily_language] AS t WHERE 1=0
19/02/22 09:10:25 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.6.5.0-292/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/ddab816638bd5e65108647177ab703b0/daily_language.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/02/22 09:10:27 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/ddab816638bd5e65108647177ab703b0/daily_language.jar
19/02/22 09:10:27 INFO mapreduce.ImportJobBase: Beginning import of daily_language
19/02/22 09:10:29 INFO client.RMProxy: Connecting to ResourceManager at mghdop01.dcdms/10.0.37.157:8050
19/02/22 09:10:29 INFO client.AHSProxy: Connecting to Application History server at mghdop01.dcdms/10.0.37.157:10200
19/02/22 09:10:31 INFO db.DBInputFormat: Using read commited transaction isolation
19/02/22 09:10:31 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN([kdbahasa]), MAX([kdbahasa]) FROM [daily_language]
19/02/22 09:10:31 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/root/.staging/job_1547085556146_0680
19/02/22 09:10:31 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Generating splits for a textual index column allowed only in case of "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" property passed as a parameter
        at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:204)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
        at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:200)
        at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:173)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:270)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
        at org.apache.sqoop.manager.SQLServerManager.importTable(SQLServerManager.java:163)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:507)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
Caused by: Generating splits for a textual index column allowed only in case of "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" property passed as a parameter
        at org.apache.sqoop.mapreduce.db.TextSplitter.split(TextSplitter.java:67)
        at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:201)
        ... 23 more

Why there is

ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Generating splits for a textual index column allowed only in case of "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" property passed as a parameter

Although I don't give split-by in the sqoop import above. First, how I can solve the case above?

Then I try to add "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" in sqoop import above, but it give me another error below;

19/02/22 09:20:43 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.5.0-292
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: Dorg.apache.sqoop.splitter.allow_text_splitter=true
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --username
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: pbddms
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: -P
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --table
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: daily_language
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-import
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-database
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: test_hive
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-table
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: daily_language
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-overwrite
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-drop-import-delims
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --null-string
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: \\N
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --null-non-string
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: \\N\

Second case, how I can solve the case above?

score 2 · Answer 1 · answered Jul 05 '21 at 15:51

2

In my case I had to put this part in a double quotes:

sqoop import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true"

answered Jul 05 '21 at 15:51

andykrk

21
2

score 0 · Answer 2 · answered Feb 22 '19 at 07:47

0

I takes kdbahasa column as a split-column. Add -m 1 parameter to specify the number of mappers. 1 - means it will run on single mappers without splits:

sqoop import --connect 'jdbc:sqlserver://10.0.2.11:1433;database=SP2010' --username pbddms -P --table daily_language --hive-import -m 1 --hive-database test_hive --hive-table daily_language --hive-overwrite --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N'

Also read this about split_column if you want to split: https://stackoverflow.com/a/37389134/2700344

answered Feb 22 '19 at 07:47

leftjoin

36,950
8
57
116

when use -m 1, should it followed by split-column? what about increase num map task (i.e. 2 or more), because from http://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html#_free_form_query_imports , it just said define the m by 1 – m hanif f Feb 22 '19 at 10:17
sorry I've delete the comment before (do you get the column?) – m hanif f Feb 22 '19 at 10:17
No, with -m 1 you do not need to specify split column. It will be no splits, will query all the dataset on single mapper – leftjoin Feb 22 '19 at 10:18
but if I don't use -m 1, I must use split-by and define the --columns? – m hanif f Feb 22 '19 at 11:05
Yes. if you want to fetch in parallel, then specify -m and --split-by Without -m and no --split-by specified sqoop tries to pick split-column automatically and determine the number of splits – leftjoin Feb 22 '19 at 11:08
In my case, I feel safe to import tables with VARCHAR key columns sequentially. Also I never use `--split-by`, i just give `--num-mappers` and let `sqoop` figure out the rest. Until `sqoop 1.4.6`, this strategy worked well with `MySQL` since by using `--autoreset-to-one-mapper` flag, sqoop would fall-back to single-part import for tables with string key columns. But `sqoop 1.4.7` onwards, in such cases the job fails with this very exception. This is a nuisance and I'm not aware of any workarounds (as a result I have to stick with `sqoop 1.4.6`) – y2k-shubham Jun 26 '19 at 20:07

score 0 · Answer 3 · answered Jun 29 '21 at 19:10

Faced similar issue found this is how we need to pass that sqoop property:

sqoop import -D org.apache.sqoop.splitter.allow_text_splitter=true --connect 'jdbc:sqlserver://10.0.2.11:1433;database=SP2010' --username pbddms -P --table daily_language --hive-import -m 1 --hive-database test_hive --hive-table daily_language --hive-overwrite --hive-drop-import-delims --null-string '\N' --null-non-string '\N'

Divit Sharma · Answer 4 · 2023-08-01T16:56:40.797

0

You need to specify -D org.apache.sqoop.splitter.allow_text_splitter=true

sqoop import \
-D org.apache.sqoop.splitter.allow_text_splitter=true \
--connect jdbc:mysql://localhost/dbname \
--target-dir /home/salaries2 \
-m 2 \
--split-by gender \
--query "SELECT id, salary, gender FROM salaries WHERE salary < 90000 AND \$CONDITIONS"

edited Aug 01 '23 at 16:56

answered Aug 01 '23 at 16:55

Divit Sharma

1
2

Generating splits for a textual index column allowed only in case of "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" property passed as a param

4 Answers4