I am working with a two-node, fully distributed Hadoop cluster. I am trying to get the TaskTracker running on the slave node, but it is not able to connect to my 9000/9001 ports. The config files are below, so if anyone spots something, please holler!

Error message from the TaskTracker log (started using start-all on the master):

2012-12-19 09:33:03,161 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-12-19 09:33:03,316 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-12-19 09:33:03,320 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-12-19 09:33:03,320 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: TaskTracker metrics system started
2012-12-19 09:33:03,888 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-12-19 09:33:04,502 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2012-12-19 09:33:04,755 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2012-12-19 09:33:04,799 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-12-19 09:33:04,807 INFO org.apache.hadoop.mapred.TaskTracker: Starting tasktracker with owner as hadoop
2012-12-19 09:33:04,813 INFO org.apache.hadoop.mapred.TaskTracker: Good mapred local directories are: /tmp/hadoop-hadoop/mapred/local
2012-12-19 09:33:04,826 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2012-12-19 09:33:04,856 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2012-12-19 09:33:04,857 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source TaskTrackerMetrics registered.
2012-12-19 09:33:04,920 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2012-12-19 09:33:04,923 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort38644 registered.
2012-12-19 09:33:04,926 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort38644 registered.
2012-12-19 09:33:04,929 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2012-12-19 09:33:04,931 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 38644: starting
2012-12-19 09:33:04,931 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 38644: starting
2012-12-19 09:33:04,932 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 38644: starting
2012-12-19 09:33:04,932 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 38644: starting
2012-12-19 09:33:04,933 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 38644: starting
2012-12-19 09:33:04,935 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost/127.0.0.1:38644
2012-12-19 09:33:04,935 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_10.77.26.116:localhost/127.0.0.1:38644
2012-12-19 09:33:05,980 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 0 time(s).
2012-12-19 09:33:06,982 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 1 time(s).
2012-12-19 09:33:07,985 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 2 time(s).
2012-12-19 09:33:08,987 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 3 time(s).
2012-12-19 09:33:09,989 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 4 time(s).
2012-12-19 09:33:10,991 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 5 time(s).
2012-12-19 09:33:11,994 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 6 time(s).
2012-12-19 09:33:12,996 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 7 time(s).
2012-12-19 09:33:13,998 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 8 time(s).
2012-12-19 09:33:15,001 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 9 time(s).
2012-12-19 09:33:15,004 INFO org.apache.hadoop.ipc.RPC: Server at ipdiscovermaster.cloudapp.net/168.63.72.148:9001 not available yet, Zzzzz...
2012-12-19 09:33:17,009 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 0 time(s).
2012-12-19 09:33:18,011 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 1 time(s).
2012-12-19 09:33:19,013 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 2 time(s).
2012-12-19 09:33:20,015 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 3 time(s).
2012-12-19 09:33:21,018 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 4 time(s).
2012-12-19 09:33:22,020 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 5 time(s).
2012-12-19 09:33:23,022 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 6 time(s).
2012-12-19 09:33:24,026 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 7 time(s).
2012-12-19 09:33:25,033 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 8 time(s).
2012-12-19 09:33:26,036 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 9 time(s).
2012-12-19 09:33:26,039 INFO org.apache.hadoop.ipc.RPC: Server at ipdiscovermaster.cloudapp.net/168.63.72.148:9001 not available yet, Zzzzz...
2012-12-19 09:33:28,044 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 0 time(s).
2012-12-19 09:33:29,045 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 1 time(s).
2012-12-19 09:33:30,048 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 2 time(s).
2012-12-19 09:33:31,051 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 3 time(s).
2012-12-19 09:33:32,055 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 4 time(s).
2012-12-19 09:33:33,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 5 time(s).
2012-12-19 09:33:34,060 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 6 time(s).
2012-12-19 09:33:35,063 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 7 time(s).
2012-12-19 09:33:36,071 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 8 time(s).
2012-12-19 09:33:37,073 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 9 time(s).
2012-12-19 09:33:37,083 INFO org.apache.hadoop.ipc.RPC: Server at ipdiscovermaster.cloudapp.net/168.63.72.148:9001 not available yet, Zzzzz...
2012-12-19 09:33:39,086 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 0 time(s).
2012-12-19 09:33:40,094 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 1 time(s).
2012-12-19 09:33:41,097 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 2 time(s).
2012-12-19 09:33:42,101 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 3 time(s).
2012-12-19 09:33:43,104 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 4 time(s).
2012-12-19 09:33:44,107 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 5 time(s).
2012-12-19 09:33:45,113 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 6 time(s).
2012-12-19 09:33:46,118 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 7 time(s).
2012-12-19 09:33:47,122 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 8 time(s).
2012-12-19 09:33:48,131 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 9 time(s).
2012-12-19 09:33:48,134 INFO org.apache.hadoop.ipc.RPC: Server at ipdiscovermaster.cloudapp.net/168.63.72.148:9001 not available yet, Zzzzz...
2012-12-19 09:33:50,137 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 0 time(s).
2012-12-19 09:33:51,140 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 1 time(s).
2012-12-19 09:33:52,143 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 2 time(s).
2012-12-19 09:33:53,145 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 3 time(s).
2012-12-19 09:33:54,148 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 4 time(s).
2012-12-19 09:33:55,151 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 5 time(s).
2012-12-19 09:33:56,154 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 6 time(s).
2012-12-19 09:33:57,158 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 7 time(s).
2012-12-19 09:33:58,161 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 8 time(s).
2012-12-19 09:33:59,167 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 9 time(s).
2012-12-19 09:33:59,169 INFO org.apache.hadoop.ipc.RPC: Server at ipdiscovermaster.cloudapp.net/168.63.72.148:9001 not available yet, Zzzzz...
2012-12-19 09:34:01,173 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 0 time(s).
2012-12-19 09:34:02,175 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 1 time(s).
2012-12-19 09:34:03,178 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 2 time(s).
2012-12-19 09:34:04,181 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 3 time(s).
2012-12-19 09:34:05,183 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 4 time(s).
2012-12-19 09:34:06,189 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 5 time(s).
2012-12-19 09:34:07,191 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 6 time(s).
2012-12-19 09:34:08,193 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 7 time(s).
2012-12-19 09:34:09,195 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 8 time(s).
2012-12-19 09:34:10,196 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 9 time(s).
2012-12-19 09:34:10,199 INFO org.apache.hadoop.ipc.RPC: Server at ipdiscovermaster.cloudapp.net/168.63.72.148:9001 not available yet, Zzzzz...
2012-12-19 09:34:12,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ipdiscovermaster.cloudapp.net/168.63.72.148:9001. Already tried 0 time(s).

MASTER hosts file

#127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
#10.77.42.2 ipdiscovermaster.cloudapp.net
ipdiscoverreg1.cloudapp.net
#10.76.174.108 ipdiscoverreg1.cloudapp.net
ipdiscovermaster.cloudapp.net

MASTER core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://ipdiscovermaster.cloudapp.net:9000</value>
</property>
</configuration>

MASTER mapred-site.xml

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>ipdiscovermaster.cloudapp.net:9001</value>
</property>
</configuration>

MASTER masters file

ipdiscovermaster.cloudapp.net

MASTER slaves file

ipdiscovermaster.cloudapp.net
ipdiscoverreg1.cloudapp.net

SLAVE hosts file

#127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
#10.77.42.2 ipdiscovermaster.cloudapp.net
ipdiscoverreg1.cloudapp.net
ipdiscovermaster.cloudapp.net
#10.76.174.108 ipdiscoverreg1.cloudapp.net

SLAVE core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://ipdiscovermaster.cloudapp.net:9000</value>
</property>
</configuration>

SLAVE mapred-site.xml

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>ipdiscovermaster.cloudapp.net:9001</value>
</property>
</configuration>

SLAVE masters file

ipdiscovermaster.cloudapp.net
user1900491
  • Maybe a firewall? I suggest using CDH with Cloudera Manager; it'll save you a lot of time: http://www.cloudera.com/content/cloudera/en/products/cloudera-manager.html – wlk Dec 19 '12 at 10:49

3 Answers

Check the following possibilities.

I am assuming you have checked the log on the DataNode (192.168.135.111 slave01), which is the best way to get the exact error.

If you have formatted the NameNode:

 i) delete the temp data folder
 ii) recreate it
 iii) give all permissions on the temp folder
 iv) format the NameNode
 v) start the Hadoop cluster
Mr.Pramod Anarase

Add the IP and hostname of the slave to the /etc/hosts file of the master machine, and vice versa. Also, add the dfs.data.dir and dfs.name.dir properties to your hdfs-site.xml file. These values default to a location under /tmp, which gets emptied at restart; as a result, you may lose data and run into problems when the machine restarts. Make sure you have proper name resolution, as this is really important for Hadoop to function properly.
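
One way to check name resolution and port reachability from the slave is a small script like the one below. This is only a sketch using Python's standard library; the helper names (resolve, is_loopback, can_connect, report) are made up for illustration and are not part of Hadoop.

```python
import ipaddress
import socket


def resolve(hostname):
    """Return the sorted IPv4 addresses a hostname resolves to ([] on failure)."""
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def is_loopback(address):
    """True if the address is in the loopback range (127.0.0.0/8 or ::1)."""
    return ipaddress.ip_address(address).is_loopback


def can_connect(host, port, timeout=3.0):
    """Try a plain TCP connection; True if something accepts it within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(hostname, ports):
    """Build human-readable lines describing resolution and port reachability."""
    lines = []
    addresses = resolve(hostname)
    if not addresses:
        lines.append(f"{hostname}: does not resolve")
    for address in addresses:
        note = " (loopback, other nodes cannot reach this)" if is_loopback(address) else ""
        lines.append(f"{hostname} -> {address}{note}")
    for port in ports:
        state = "open" if can_connect(hostname, port) else "unreachable"
        lines.append(f"{hostname}:{port} {state}")
    return lines
```

For the setup in the question, you could run print("\n".join(report("ipdiscovermaster.cloudapp.net", [9000, 9001]))) on the slave. If the name resolves to a public or loopback address instead of the master's internal IP, or the ports show as unreachable, that points at /etc/hosts or a firewall rather than at the Hadoop config itself.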

Tariq
  • I have put the IPs into both hosts files and added the parameters you spoke of, but the error still remains. Any ideas? – user1900491 Dec 19 '12 at 11:33

I had a similar problem: the logs just showed "Retrying connect to server XXX". Here is what I did to solve it. Simply modify the /etc/hosts files on the master and slave nodes, particularly each node's own hostname and its corresponding IP. Don't bind the hostname to 127.0.0.1:

original hosts file in master:

127.0.0.1  master

192.168.135.111 slave01

original hosts file in slave:

192.168.135.110 master

127.0.0.1 slave01

Resolved hosts file in master:

192.168.135.110 master

192.168.135.111 slave01

Resolved hosts file in slave:

192.168.135.110 master

192.168.135.111 slave01
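
To spot this kind of entry without eyeballing the file, something like the following could be used. It is a rough sketch, not an official tool: the function name loopback_bound_names is made up, and skipping any name containing "localhost" or "loopback" is just a heuristic for the usual default aliases.

```python
import ipaddress


def loopback_bound_names(hosts_text):
    """Return hostnames mapped to a loopback address, ignoring localhost-style aliases."""
    flagged = []
    for raw_line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            address = ipaddress.ip_address(fields[0])
        except ValueError:
            # First field is not an IP address; skip the malformed line.
            continue
        if address.is_loopback:
            for name in fields[1:]:
                # Heuristic: the default localhost/loopback aliases are expected here.
                if "localhost" not in name and "loopback" not in name:
                    flagged.append(name)
    return flagged
```

Given the original master file above, loopback_bound_names("127.0.0.1  master\n192.168.135.111 slave01\n") flags master, while the corrected file yields an empty list. On a real node you would pass in the contents of /etc/hosts.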
sakura