0

I am importing a table with Sqoop using 4 mappers. The problem is that execution time differs hugely between the mappers: some finish in less than 10 minutes, while others take more than an hour. Can you explain why, and how can I optimize my import?

Zied Hermi

2 Answers

0

An uneven distribution of the data among the mappers is the likely reason for this difference.

Check which column is the table's primary key and what its minimum and maximum values are, because the data is distributed across the mappers based on that range. Also check whether the last two mappers import more data than the others.
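To illustrate why the key range matters, here is a minimal Python sketch, not Sqoop's actual splitter code. It assumes an integer key that is cut into equal-width ranges between its min and max, and a made-up table whose ids are heavily clustered at the top of the range; the function names and the data are hypothetical.

```python
from collections import Counter


def uniform_splits(lo, hi, num_mappers):
    """Cut the key range [lo, hi] into num_mappers equal-width, half-open ranges."""
    width = (hi - lo + 1) / num_mappers
    bounds = [lo + round(i * width) for i in range(num_mappers)] + [hi + 1]
    return [(bounds[i], bounds[i + 1]) for i in range(num_mappers)]


def rows_per_split(keys, splits):
    """Count how many keys fall into each [start, end) range."""
    counts = Counter()
    for key in keys:
        for i, (start, end) in enumerate(splits):
            if start <= key < end:
                counts[i] += 1
                break
    return [counts[i] for i in range(len(splits))]


# Hypothetical skewed table: 100 old rows with ids 1..100,
# then 100,000 rows with ids 900,001..1,000,000.
keys = list(range(1, 101)) + list(range(900_001, 1_000_001))

splits = uniform_splits(min(keys), max(keys), num_mappers=4)
counts = rows_per_split(keys, splits)

for (start, end), n in zip(splits, counts):
    print(f"ids [{start}, {end}): {n} rows")
```

In this made-up case every range has the same width, yet one mapper receives almost all the rows and two receive none. Comparing the real min/max of the split column with the per-range row counts should show whether the same thing is happening in your import.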

0

Try the --split-limit parameter to optimize your import. If a split would be larger than the size specified in this parameter, the splits are resized to fit within the limit, and the number of splits changes accordingly. This affects the actual number of mappers and leads to more balanced mappers.
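The resizing rule can be sketched in Python as follows. This is only a model of the behaviour described in this answer and its comments, not Sqoop's implementation; the function name `effective_splits` and the example numbers are assumptions made for illustration.

```python
import math


def effective_splits(lo, hi, num_mappers, split_limit=0):
    """Return (split_size, number_of_splits) for an integer key range [lo, hi].

    Assumed rule: start from span / num_mappers; if that exceeds a positive
    split_limit, shrink the split size to the limit, which raises the number
    of splits. A split_limit of 0 or less is ignored.
    """
    span = hi - lo
    split_size = max(1, math.ceil(span / num_mappers))
    if split_limit > 0 and split_size > split_limit:
        split_size = split_limit
    return split_size, max(1, math.ceil(span / split_size))


# Hypothetical key range 0..1,000,000 with 4 requested mappers.
print(effective_splits(0, 1_000_000, 4))                       # no limit
print(effective_splits(0, 1_000_000, 4, split_limit=100_000))  # limit smaller than split size
print(effective_splits(0, 1_000_000, 4, split_limit=500_000))  # limit larger than split size
```

Under these assumptions, a limit of 100,000 turns 4 splits of 250,000 into 10 splits of 100,000, while a limit of 500,000 changes nothing because the computed split size is already below it.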

Iskuskov Alexander
  • Does the `--split-limit` parameter have to take the same value as `--num-mappers`? – Zied Hermi May 02 '18 at 15:46
  • If the size of a split calculated from the provided `--num-mappers` parameter exceeds the `--split-limit` parameter, then the actual number of mappers will be increased. If the value specified in `--split-limit` is 0 or negative, the parameter is ignored altogether and the split size is calculated according to the number of mappers. – Iskuskov Alexander May 02 '18 at 15:52
  • And is `--split-limit` supported only with Integer and Date columns? – Zied Hermi May 02 '18 at 16:00
  • Yes, as mentioned in docs: `This only applies to Integer and Date columns. For date or timestamp fields it is calculated in seconds.` – Iskuskov Alexander May 02 '18 at 16:27
  • See answer https://stackoverflow.com/a/37389134/7109598. Maybe solution #3 could help you – Iskuskov Alexander May 04 '18 at 08:26