
One of the advantages of migrating to Sqoop2 is that we no longer share database credentials with clients.

Currently, when we execute Sqoop commands, they look like this:

sqoop import --connect ... --username ... --table ...

After we upgrade to Sqoop2, we still execute the same kind of command, except that the connection string points to the Sqoop2 server rather than to the actual RDBMS involved in the transfer, and the credentials are those of the Sqoop2 server.

So we are still sharing credentials, now those of the Sqoop2 server, with all the clients. Doesn't this violate the basic principle for which Sqoop2 was created?

Bhavuk Chawla

  • In Sqoop, anyone who has access to the Hadoop cluster can learn the database credentials, because they have to be hard-coded.
  • In Sqoop2, the database credentials are known only to the admins who manage the cluster. Developers do not need to know the password.
  • In Sqoop, the client submits jobs directly to the cluster; there is no server. This means you need the JDBC driver jar files on the Sqoop client. Once users have the database credentials and the jar files inside the same firewall, they can easily bypass Sqoop and breach security.
  • In Sqoop2, the client does not submit jobs directly; it points to the server, and the server submits the jobs. So the Sqoop2 server, the database, and the Hadoop cluster can all sit behind the firewall, with only the Sqoop2 server ports opened, and only to Sqoop2 clients. Hence users cannot breach security by logging into the database outside Sqoop (even if they know the database credentials and have the JDBC jars).
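
To make the trust boundary described in these points concrete, here is a toy Python model. It is a sketch under stated assumptions: `ToySqoopServer`, `Link`, `admin_create_link`, `admin_create_job` and `start_job` are hypothetical names, not the Sqoop2 client, shell, or REST API. An admin registers the database connection (including the password) on the server; a developer only authenticates to the server and starts a predefined job by name. The Sqoop2 server credentials therefore let a developer run jobs, but never hand back the database password.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    """Database connection details, stored only on the server (admin-managed)."""
    jdbc_url: str
    username: str
    password: str


@dataclass
class ToySqoopServer:
    """Hypothetical stand-in for a Sqoop2-style server; NOT the real Sqoop2 API."""
    allowed_users: set = field(default_factory=set)
    links: dict = field(default_factory=dict)  # link name -> Link
    jobs: dict = field(default_factory=dict)   # job name -> (link name, table)

    def admin_create_link(self, name: str, link: Link) -> None:
        # Only admins call this; the database password lives here.
        self.links[name] = link

    def admin_create_job(self, job_name: str, link_name: str, table: str) -> None:
        self.jobs[job_name] = (link_name, table)

    def start_job(self, user: str, job_name: str) -> str:
        # The caller authenticates against the server, not the database.
        if user not in self.allowed_users:
            raise PermissionError(f"{user} may not run jobs")
        link_name, table = self.jobs[job_name]
        link = self.links[link_name]  # credentials are read here, server-side only
        # A real server would now open a JDBC connection using `link` and submit
        # the transfer to the Hadoop cluster; the client only gets a status back.
        return f"submitted {job_name}: import table {table!r} via link {link_name!r}"


if __name__ == "__main__":
    server = ToySqoopServer(allowed_users={"dev1"})
    server.admin_create_link(
        "orders_db", Link("jdbc:mysql://db.internal/shop", "etl", "s3cret")
    )
    server.admin_create_job("import_orders", "orders_db", "orders")
    print(server.start_job("dev1", "import_orders"))
```

The design choice being illustrated: a developer's server credentials are scoped to "run these jobs", whereas the database credentials in Sqoop would grant direct access to the database from anywhere inside the firewall.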

On top of the additional security, there is another major difference:

  • Sqoop cannot be integrated with web interfaces such as Hue, because it follows a client-only architecture.
  • Sqoop2 uses a client-server architecture. The server runs as a web application, so tools like Hue can actually be used to develop Sqoop-based scripts.
Durga Viswanath Gadiraju