
Please see my problem below and help me resolve it.

I have a Sqoop statement which brings data from a Mainframe to Hadoop (Hive) using the Attunity JDBC driver.

    sqoop import \
      --connect "jdbc:attconnect://X.X.1.1:8883;DefTdpName=PROD_E;OneTdpMode=1" \
      --driver "com.attunity.jdbc.NvDriver" \
      --query "SELECT * FROM how_record WHERE \$CONDITIONS" \
      --fields-terminated-by "\t" \
      --hive-import \
      --hive-overwrite \
      --delete-target-dir \
      --null-string '\\N' \
      --null-non-string '\\N' \
      --hive-delims-replacement '' \
      --target-dir "path/how_record" \
      --fetch-size 50000 \
      --hive-table "how_record" \
      -m 1

When I run the above Sqoop statement for the first time, I keep getting a socket timeout error.

    ERROR manager.SqlManager: Error executing statement:
    java.sql.SQLException: JBC0088E: JBC0002E: Socket timeout detected: Read timed out
        at com.attunity.jdbc.NvSQLWarning.createSQLException(Unknown Source)
        at com.attunity.comm.XmsgComm.xMsgThrowTimeoutEx(Unknown Source)
        at com.attunity.comm.XmsgComm.treatInterruptedIOException(Unknown Source)
        at com.attunity.comm.XmsgComm.xMsgPerformCall(Unknown Source)
        at com.attunity.comm.Daemon.GetServer(Unknown Source)
        at com.attunity.navapi.SysTdp.getRemoteSysTdp(Unknown Source)
        at com.attunity.jdbc.NvConnectionBase.open(Unknown Source)
        at com.attunity.jdbc.NvConnectPropertiesBase.openConnection(Unknown Source)
        at com.attunity.jdbc.NvDriverCoreBase.connect(Unknown Source)
        at com.attunity.jdbc.NvDriverCoreBase.connect(Unknown Source)
        at java.sql.DriverManager.getConnection(DriverManager.java:664)
        at java.sql.DriverManager.getConnection(DriverManager.java:270)

But when I run it a second time (immediately), the error goes away and Sqoop completes successfully.

I face this issue only on the first run of Sqoop.
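Since an immediate second run always succeeds, one stopgap I am considering (an assumption on my part, not a root-cause fix) is wrapping the import in a small retry loop on the edge node. The function name, attempt count, and delay below are all illustrative:

```shell
#!/bin/sh
# Workaround sketch: retry a command up to N times if it fails.
# RETRY_DELAY (seconds between attempts) is an illustrative knob.
retry() {
  max=$1; shift                 # first arg: max attempts; rest: command to run
  n=1
  until "$@"; do
    if [ "$n" -ge "$max" ]; then
      echo "command failed after $n attempts" >&2
      return 1
    fi
    echo "attempt $n failed; retrying..." >&2
    n=$((n + 1))
    sleep "${RETRY_DELAY:-10}"
  done
}

# Usage (replace with the full sqoop import statement from above):
# retry 3 sqoop import --connect "jdbc:attconnect://..." ... -m 1
```

This masks the symptom rather than explaining the first-run timeout, but it keeps the production job from failing.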

I went through a few blogs, and the error may be related to the connection pool in the JDBC driver.

My background is mostly in data; I don't have much experience on the Java side.

Can someone please help me find the actual issue and a resolution?

Thanks in advance!!

Tharunkumar Reddy

2 Answers


Refer to 24.1. Sqoop Supported Databases

Refer to Attunity Driver Configuration

Try specifying the appropriate driver using the --driver option in the sqoop import command.

Download nvjdbc2.jar and place it in the appropriate location for your cluster, as specified in the Sqoop documentation.

Class name to be used: com.attunity.jdbc.NvDriver

While JDBC is a compatibility layer that allows a program to access many different databases through a common API, slight differences in the SQL language spoken by each database may mean that Sqoop can’t use every database out of the box, or that some databases may be used in an inefficient manner.

Even if Sqoop supports a database internally, you may still need to install the database vendor’s JDBC driver in your $SQOOP_HOME/lib path on your client. Sqoop can load classes from any jars in $SQOOP_HOME/lib on the client and will use them as part of any MapReduce jobs it runs; unlike older versions, you no longer need to install JDBC jars in the Hadoop library path on your servers.
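As a sketch of that installation step (the function name and paths are illustrative; your $SQOOP_HOME and download location will differ), run this on the client/edge node:

```shell
# Sketch: copy the downloaded Attunity driver jar into the Sqoop client
# lib directory, then list it to confirm Sqoop will be able to load it.
install_attunity_driver() {
  jar_src=$1                          # e.g. where you downloaded nvjdbc2.jar
  cp "$jar_src" "$SQOOP_HOME/lib/" || return 1
  ls "$SQOOP_HOME/lib"/nvjdbc*.jar    # verify the jar is in place
}

# install_attunity_driver /tmp/nvjdbc2.jar
```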

yammanuruarun
  • Thanks for the response. Do you recommend to install third party JDBC driver in all data nodes of our cluster? – Tharunkumar Reddy Dec 30 '19 at 06:52
  • As per the Sqoop documentation, you need to install the database vendor's JDBC driver in your $SQOOP_HOME/lib path on your client, which will most probably be your edge node (gateway). In fact, it depends on your requirement and on which machine you run the sqoop import command. So if you installed Sqoop on all the data nodes and trigger sqoop import from any of them, then you need to place the jar in that data node's Sqoop lib location. – yammanuruarun Dec 30 '19 at 10:37
  • The JDBC jar was placed in the $SQOOP_HOME/lib path on the edge node a long time ago, and we have been facing this issue for the last month. Again, it fails only on the first run. I suspect something is wrong on the JDBC/source server side. Whatever you mentioned in your answer, we are already following. Any other recommendations? – Tharunkumar Reddy Dec 30 '19 at 11:37
  • Below links are not exactly related to sqoop connection but they are related to socket read timeout issues and should help you debug and analyze more. https://stackoverflow.com/questions/3069382/what-is-the-difference-between-connection-and-read-timeout-for-sockets https://stackoverflow.com/questions/17079273/java-net-sockettimeoutexception-read-timed-out-under-tomcat https://www.ibm.com/support/pages/sockettimeoutexception-read-timed-out-seen-during-syncnode-request – yammanuruarun Dec 31 '19 at 12:53

Have you looked at the load on your database server? I am guessing that is where the bottleneck lies. This configuration is useful only if you can scale your database server, or have multiple servers, each with a different set of tables.

  • I don't think it is a load issue. That is the very first Sqoop query which runs on the database (Mainframe) server as part of our production jobs. At that time the load is minimal, and every day Sqoop brings in fewer than 1.5k records. Can you suggest anything else to check? – Tharunkumar Reddy Dec 26 '19 at 18:13