According to sqoop.apache.org, Sqoop 2 is not feature complete and should not be used for production systems. Fair enough; some people may want to try out Sqoop 2's new features in their test environments.

Cloudera has a feature comparison between Sqoop 1 and Sqoop 2 (https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cdh_ig_sqoop_vs_sqoop2.html), but according to the page there is nothing that Sqoop 2 provides that Sqoop 1 does not also provide.

So why would anyone use Sqoop 2 in its current form? Does it provide any advantages over Sqoop 1? If not, why is it available for use? Thanks in advance!

Andrew C.

3 Answers

Just a quick note:

According to Cloudera (as of Nov 2017):

Note: Sqoop 2 is being deprecated. Cloudera recommends using Sqoop 1.

Mehdi LAMRANI

Apache Sqoop uses a client model, where the user needs to install Sqoop along with the connectors/drivers on the client. Sqoop2 uses a service-based model, where the connectors/drivers are installed on the Sqoop2 server. All configuration also needs to be done on the Sqoop2 server.

From an MR perspective, another difference is that Sqoop submits a map-only job, while Sqoop2 submits a MapReduce job in which the Mappers transport the data from the source and the Reducers transform the data according to the source specified. This provides a clean abstraction. In Sqoop, both the transportation and the transformations are handled by the Mappers alone.

Another major difference in Sqoop2 is from a security perspective. The administrator sets up the connections to the sources and targets, while the operator uses those already established connections, so the operator does not need to know the connection details. Operators are given access only to the connectors they require.
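To make the administrator/operator split concrete, here is a toy sketch in Python. It is not the Sqoop2 API: the `Link` and `ToyServer` classes, their method names, and the example host and credentials are all hypothetical, chosen only to illustrate the idea that connection details live on the server and the operator refers to them by name.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """Connection details defined by an administrator (hypothetical model)."""
    name: str
    jdbc_url: str
    username: str
    password: str = field(repr=False)  # keep the secret out of repr()


class ToyServer:
    """Stand-in for a server that stores connections centrally; not the Sqoop2 API."""

    def __init__(self):
        self._links = {}   # link name -> Link
        self._grants = {}  # operator name -> set of link names they may use

    def admin_create_link(self, link):
        self._links[link.name] = link

    def admin_grant(self, operator, link_name):
        self._grants.setdefault(operator, set()).add(link_name)

    def operator_submit_job(self, operator, link_name, table):
        # The operator passes only the link name and never handles credentials.
        if link_name not in self._grants.get(operator, set()):
            raise PermissionError(f"{operator} may not use link {link_name!r}")
        link = self._links[link_name]
        # A real server would open the connection here; this sketch only reports.
        return f"job submitted: import {table} via link {link.name}"


server = ToyServer()
# Made-up connection details, for illustration only.
server.admin_create_link(
    Link("sales_db", "jdbc:mysql://db.example.com/sales", "etl", "s3cret")
)
server.admin_grant("alice", "sales_db")
print(server.operator_submit_job("alice", "sales_db", "orders"))
```

In the client model, by contrast, whoever runs the import has to supply the JDBC URL and credentials themselves, which is the point the paragraph above is making.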

Aditya Agarwal
  • Thanks for the answer which includes the Map and MapReduce difference. That is a good point that the abstraction is cleaner. – Andrew C. Dec 30 '16 at 20:53

Some of the features expected in the Sqoop2 stable release:

  1. An easy-to-use GUI in addition to the existing command line.
  2. Security fixes, such as no longer sharing passwords openly.
  3. Easier debugging with better logging.
  4. Support for connectors that don't follow the JDBC model.

Currently there are no stable releases of Sqoop 2 available, but you can build the latest source to test the product and, if interested, contribute to the project.


Refer:

Sqoop2 proposal

Features and releases

Ani Menon
  • Thanks for the answer and the sources, I've accepted your answer as the closest to what I'm looking for because of the list of features. However, I guess those are eventual features -- could you elaborate as to which features exist on Sqoop 2 currently? Thanks! – Andrew C. Dec 30 '16 at 20:55
  • HBase connector (KiteConnector) support, an update to the execution engine (MR), and Kerberos support. We will know the complete list only once a stable release is out. [Sqoop Roadmap](https://cwiki.apache.org/confluence/display/SQOOP/Sqoop+2+Roadmap) – Ani Menon Dec 31 '16 at 05:45