
We have a COSMOS account on cosmos.lab.fi-ware.org and can load files locally onto the cluster.

However, we are having trouble loading files remotely. The instructions we followed on the guide site show the following:

However, using the WebHDFS/HttpFS RESTful API will allow you to upload files existing outside the global instance of Cosmos in FI-LAB. The following example uses HttpFS instead of WebHDFS (the TCP/14000 port instead of TCP/50070), and curl is used as the HTTP client (your applications should implement their own HTTP client):

[remote-vm]$ curl -i -X PUT "http://cosmos.lab.fi-ware.org:14000/webhdfs/v1/user/$COSMOS_USER/input_data?op=MKDIRS&user.name=$COSMOS_USER"
[remote-vm]$ curl -i -X PUT ..etc
[remote-vm]$ curl -i -X PUT -T etc..

As you can see, the data uploading is a two-step operation, as stated in the WebHDFS specification: the first invocation of the API talks directly with the Head Node, specifying the new file creation and its name; the Head Node then sends a temporary redirection response, specifying the Data Node (among all the existing ones in the cluster) where the data has to be stored, which is the endpoint of the second step. Nevertheless, the HttpFS gateway implements the same API, but its internal behaviour changes, making the redirection point to the Head Node itself.
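For reference, a complete upload sequence along these lines is presumably what the guide intends (the input_data directory and the unstructured_data.txt file name are only examples, and $COSMOS_USER must hold your Cosmos username). Note that the whole URL has to be enclosed in quotes in every call; otherwise the shell splits the command at the & characters:

[remote-vm]$ curl -i -X PUT "http://cosmos.lab.fi-ware.org:14000/webhdfs/v1/user/$COSMOS_USER/input_data?op=MKDIRS&user.name=$COSMOS_USER"
[remote-vm]$ curl -i -X PUT "http://cosmos.lab.fi-ware.org:14000/webhdfs/v1/user/$COSMOS_USER/input_data/unstructured_data.txt?op=CREATE&user.name=$COSMOS_USER"
[remote-vm]$ curl -i -X PUT -T unstructured_data.txt --header "Content-Type: application/octet-stream" "http://cosmos.lab.fi-ware.org:14000/webhdfs/v1/user/$COSMOS_USER/input_data/unstructured_data.txt?op=CREATE&user.name=$COSMOS_USER&data=true"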

However, when we run these commands we get server errors back. One example is:

~ kari$ -bash: user.name=kdempsey: command not found
HTTP/1.1 100 Continue

HTTP/1.1 401 Unauthorized
Server: Apache-Coyote/1.1
Set-Cookie: hadoop.auth=""; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/
Content-Type: text/html;charset=utf-8
Content-Length: 1275
Date: Fri, 05 Jun 2015 12:58:20 GMT

Apache Tomcat/6.0.32 - Error report

HTTP Status 401 - org.apache.hadoop.security.authentication.client.AuthenticationException: Anonymous requests are disallowed

type Status report

message org.apache.hadoop.security.authentication.client.AuthenticationException: Anonymous requests are disallowed

description This request requires HTTP authentication (org.apache.hadoop.security.authentication.client.AuthenticationException: Anonymous requests are disallowed).

Apache Tomcat/6.0.32

Another was a 500 server error. Could you please provide the commands for remotely loading a file into the COSMOS shared resource?

Ultimately, we want to take data from our InfluxDB instance and load it into COSMOS; we would like to do it via REST calls if possible (otherwise Python).
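A rough sketch of what that could look like from a script, once the upload itself works: the InfluxDB host, database and query below are placeholders, and the /query endpoint is the InfluxDB 0.9-style HTTP API (older versions expose a different URL), so adjust both ends to your setup.

# Export a measurement from InfluxDB as JSON (hypothetical host/db/query),
# then push the resulting file into HDFS through the HttpFS gateway.
[remote-vm]$ curl -G "http://influxdb.example.org:8086/query" \
      --data-urlencode "db=mydb" \
      --data-urlencode "q=SELECT * FROM measurements" -o influx_export.json
[remote-vm]$ curl -i -X PUT -L -T influx_export.json \
      --header "Content-Type: application/octet-stream" \
      "http://cosmos.lab.fi-ware.org:14000/webhdfs/v1/user/$COSMOS_USER/input_data/influx_export.json?op=CREATE&user.name=$COSMOS_USER&data=true"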

Many thanks, Kari

  • Please, can you edit the question and put the complete command you are running? I can only see `~ kari$ -bash: user.name=kdempsey: command not found`, which on the one hand is missing most of the URL and on the other hand suggests me you had a carriage return after the first part of the URL. – frb Jun 09 '15 at 13:41
  • [remote-vm]$ curl -i -X PUT "http://cosmos.lab.fi-ware.org:14000/webhdfs/v1/user/$COSMOS_USER/input_data/unstructured_data.txt?op=CREATE&user.name=$COSMOS_USER" [remote-vm]$ curl -i -X PUT -T unstructured_data.txt --header "content-type: application/octet-stream" http://cosmos.lab.fi-ware.org:14000/webhdfs/v1/user/$COSMOS_USER/input_data/unstructured_data.txt?op=CREATE&user.name=$COSMOS_USER&data=true ---- it would not let me put more than 2 links, thanks for taking a look – karijd Jun 10 '15 at 14:24
  • I obviously edited the $COSMOS_USER to be my user – karijd Jun 10 '15 at 14:25

1 Answer


As the root user, I've tested your account and it works perfectly:

$ curl -i -X PUT "http://cosmos.lab.fi-ware.org:14000/webhdfs/v1/user/kdempsey/frbtest_deleteme?op=MKDIRS&user.name=kdempsey"
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: hadoop.auth="u=kdempsey&p=kdempsey&t=simple&e=1434045807412&s=iFdK86PWTbJykXymYLS9qZcIE2g="; Version=1; Path=/
Content-Type: application/json
Transfer-Encoding: chunked
Date: Thu, 11 Jun 2015 08:03:27 GMT

{"boolean":true}
$ curl -i -X GET "http://cosmos.lab.fi-ware.org:14000/webhdfs/v1/user/kdempsey/?op=LISTSTATUS&user.name=kdempsey"
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: hadoop.auth="u=kdempsey&p=kdempsey&t=simple&e=1434045881826&s=GkLYQ/BqnBNPFBNL3ZPwkxcwbx8="; Version=1; Path=/
Content-Type: application/json
Transfer-Encoding: chunked
Date: Thu, 11 Jun 2015 08:04:41 GMT

{"FileStatuses":{"FileStatus":[{"pathSuffix":"frbtest_deleteme","type":"DIRECTORY","length":0,"owner":"kdempsey","group":"kdempsey","permission":"755","accessTime":0,"modificationTime":1434009807428,"blockSize":0,"replication":0},{"pathSuffix":"input","type":"DIRECTORY","length":0,"owner":"kdempsey","group":"kdempsey","permission":"755","accessTime":0,"modificationTime":1433508554303,"blockSize":0,"replication":0},{"pathSuffix":"input_data","type":"DIRECTORY","length":0,"owner":"kdempsey","group":"kdempsey","permission":"755","accessTime":0,"modificationTime":1433508958231,"blockSize":0,"replication":0}]}}

As you can see, I've created a frbtest_deleteme folder, and then I've listed your HDFS userspace (/user/kdempsey) in order to get the list of subdirectories; among them, you'll find frbtest_deleteme.

  • Thanks frb, what would be the command for loading a single file, please? Also, is there a way to make this more secure? At the moment it seems anyone could call the command you posted, is that correct? If not, what is the login command, please? Perhaps it was this that was wrong. – karijd Jun 16 '15 at 12:02
  • Hi frb, it works if I am not using a command which gives a redirect, but with a command that does redirect (e.g. CREATE) it does not. It does not seem to return a different location for the Data Node; could you please advise on the command to get the file load working? Thanks, – karijd Jun 22 '15 at 20:19
  • For others trying to do this, this worked to avoid the redirect: `curl -X PUT -L -b test.txt "http://cosmos.lab.fi-ware.org:14000/webhdfs/v1/user/kdempsey/frbtest_deleteme/test3.txt?op=CREATE&user.name=kdempsey&data=true" --header "Content-Type:application/octet-stream" --header "Transfer-Encoding:chunked" -T "test.txt"` – karijd Jun 22 '15 at 21:01
  • Regarding the redirection, we have deployed an HttpFS server listening on port 14000 instead of the default HDFS service listening on 50070. Such an HttpFS server works as a gateway, allowing us to have a single IP address instead of one IP address per node (which would be costly). That's the reason you always see the same node in the redirections. – frb Jun 23 '15 at 07:29