
I need to hash certain columns (like email) while copying MySQL tables to HDFS using Sqoop.

  • Is there a built-in option in sqoop?
  • If not, how can this be achieved?

EDIT-1

Currently I can only think of a rather crude way to achieve this: passing a free-form SQL query (instead of a table name) to Sqoop, like the following:

SELECT
  `name`,
  SHA1(`email`) AS `email`,
  `dob`
FROM
  `my_db`.`users`
  • Not sure if this would work at all [will update once I've tried]
  • Even if it works, it would (most probably) require generating a SQL query specific to the underlying DB (MySQL, PostgreSQL, etc.), since hashing functions differ across databases
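For reference, a free-form query import along the lines of the question could be sketched as below. Note that Sqoop requires the literal `$CONDITIONS` token in the `WHERE` clause and either `--split-by` or a single mapper (`-m 1`); the host, credentials, and target directory here are placeholders:

```
sqoop import \
  --connect jdbc:mysql://db-host/my_db \
  --username sqoop_user -P \
  --query 'SELECT `name`, SHA1(`email`) AS `email`, `dob` FROM `users` WHERE $CONDITIONS' \
  --split-by name \
  --target-dir /data/users \
  -m 4
```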

1 Answer


Is there a built-in option in sqoop?

No


If not, how can this be achieved?

  • Approach-1: use a free-form SQL query, as already described in the question
  • Approach-2: another straightforward way would be to perform a 2-step import
    • do a sqoop import into a Hive temp table
    • create a new Hive table from this temp table and perform the hashing in the process (a good approach would be Hive CTAS)
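The CTAS step of Approach-2 could look like the following sketch, assuming Hive 1.3+ (where the built-in `sha1()` UDF is available) and a temp table named `users_tmp` populated by the earlier sqoop import:

```
-- users_tmp: raw table loaded by `sqoop import --hive-import`
CREATE TABLE users AS
SELECT
  name,
  sha1(email) AS email,  -- hash the sensitive column during the copy
  dob
FROM users_tmp;

-- the temp table can then be dropped
DROP TABLE users_tmp;
```

On older Hive versions without `sha1()`, a Reflect UDF or custom UDF would be needed instead.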