There have been a few earlier posts on Stack Overflow about this (here and here), but none of them got a feasible solution.

In my situation the table has a billion rows and no integer column as its key, which means that if I use Sqoop to import it into Hive, I cannot use multiple mappers.

Because the table is so large, adding a new integer column to it is not realistic.

Any thoughts are appreciated. Thank you in advance.

Choix

1 Answer

By default, Sqoop looks for an integer column to split on. If you want to split on a string column, you need to enable the property -Dorg.apache.sqoop.splitter.allow_text_splitter=true in your Sqoop command, name a suitable string column in the --split-by clause, and then use -m to set the number of mappers.
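As a rough sketch of how those three pieces fit together, the script below assembles such a command. The JDBC URL, user name, table, split column, Hive table, and mapper count are all placeholders I made up for illustration, not values from the question. It only prints the command unless RUN=1 is set, so it can be read and adjusted without a Sqoop installation.

```shell
#!/usr/bin/env bash
# Sketch only: every value below is a hypothetical placeholder.
JDBC_URL="jdbc:mysql://dbhost:3306/sales"   # assumed source database
DB_USER="etl_user"                          # assumed user name
SRC_TABLE="orders"                          # assumed source table
SPLIT_COL="order_uuid"                      # assumed string column with evenly spread values
HIVE_TABLE="staging.orders"                 # assumed Hive target table
NUM_MAPPERS=8                               # assumed degree of parallelism

sqoop_cmd=(
  sqoop import
  # Generic -D options have to come right after the tool name.
  -Dorg.apache.sqoop.splitter.allow_text_splitter=true
  --connect "$JDBC_URL"
  --username "$DB_USER"
  -P                                        # prompt for the password
  --table "$SRC_TABLE"
  --split-by "$SPLIT_COL"
  -m "$NUM_MAPPERS"
  --hive-import
  --hive-table "$HIVE_TABLE"
)

# Dry run by default: show the command that would be executed.
printf '%s ' "${sqoop_cmd[@]}"
echo

# Set RUN=1 to actually launch the import.
if [[ "${RUN:-0}" == "1" ]]; then
  "${sqoop_cmd[@]}"
fi
```

A string column whose values are spread fairly evenly (a UUID or hash, say) is likely to give more balanced mappers than one where most values share a prefix.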

Sandeep Singh
  • Thank you. There is a discussion [here](https://community.hortonworks.com/questions/26961/sqoop-split-by-on-a-string-varchar-column.html?childToView=203173#answer-203173) where someone else reports that it causes duplicates in the result set. – Choix Jul 11 '18 at 15:38
  • I am not sure. If this is a bug, duplicates should show up even with a small table. You can try it on one to check. – Sandeep Singh Jul 12 '18 at 05:57