
I would like to connect my Mac to my EC2 instances to carry out parallel processing on AWS via the parallel package, using makePSOCKcluster or makeSOCKcluster.

At the moment my attempt leaves R 'hanging', so I have adapted makePSOCKcluster and some of its sub-routines to show more of their output, and added a -v option to the ssh call. I think I have dealt with the password-less ssh login, but I am getting stuck at the socketConnection part, which I think is causing the problem.

I have tried associating Elastic IPs and using those as the IP addresses, to no avail. I have also tried adjusting the security groups to include the default port that makePSOCKcluster uses, again to no avail. In the latter case I did not pass the port argument and relied on the default port of 10187, which produced:

Error in socketConnection("localhost", port = port, server = TRUE, blocking = TRUE,  : 
  cannot open the connection
In addition: Warning message:
In socketConnection("localhost", port = port, server = TRUE, blocking = TRUE,  :
  port 10187 cannot be opened
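
The warning reads as if the listening socket on the local machine could not be created. A minimal sketch for checking whether a given port can be opened for listening from the current R session, assuming R 4.0.0 or later (where serverSocket() is available). The helper name can_listen is hypothetical and not part of the parallel package:

```r
# Hypothetical helper: TRUE if this R session can open a listening socket on `port`.
# Assumes R >= 4.0.0, where serverSocket() exists.
can_listen <- function(port) {
    srv <- tryCatch(serverSocket(port),
                    error = function(e) NULL,
                    warning = function(w) NULL)
    if (is.null(srv)) return(FALSE)
    close(srv)  # release the port again straight away
    TRUE
}

port_ok <- can_listen(10187)  # 10187 is the default port mentioned above
print(port_ok)
```

If this returns FALSE, one possible explanation is that another process (for example an earlier, still-hanging R session) is holding the port. That is only a guess and would need to be confirmed separately.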

Looking at a few HPC mailing list questions, this seemed to be an issue related to running on Windows, but given that I am connecting from a Mac I don't think I fall under this category.

The hosts object just holds the public DNS names that are provided after launching the EC2 instances.

Below is my current attempt. I have adapted makePSOCKcluster and newPSOCKnode into makePSOCKcluster1 and newPSOCKnode1; they should be roughly the same as the originals.

I have set the rscript parameter to the path that would be expected on an Ubuntu instance, and I have specified ubuntu as the username to log in with on the Ubuntu EC2 instance.

makePSOCKcluster1 <- function (names, ...) {
    if (is.numeric(names)) 
        names <- rep("localhost", names[1])
    options <- parallel:::addClusterOptions(parallel:::defaultClusterOptions, list(...))
    cl <- vector("list", length(names))
    for (i in seq_along(cl)) cl[[i]] <- newPSOCKnode1(names[[i]], 
        options = options, rank = i)
    class(cl) <- c("SOCKcluster", "cluster")
    cl
}



newPSOCKnode1 <- function (machine = "localhost", ..., options = parallel:::defaultClusterOptions, 
                           rank) 
{
    options <- options
    if (is.list(machine)) {
        options <- options
        machine <- machine$host
    }
    outfile <- parallel:::getClusterOption("outfile", options)
    master <- if (machine == "localhost") 
        "localhost"
    else parallel:::getClusterOption("master", options)
    port <- parallel:::getClusterOption("port", options)
    manual <- parallel:::getClusterOption("manual", options)
    timeout <- parallel:::getClusterOption("timeout", options)
    methods <- parallel:::getClusterOption("methods", options)
    useXDR <- parallel:::getClusterOption("useXDR", options)
    env <- paste("MASTER=", master, " PORT=", port, " OUT=", 
                 outfile, " TIMEOUT=", timeout, " METHODS=", methods, 
                 " XDR=", useXDR, sep = "")
    arg <- "parallel:::.slaveRSOCK()"
    rscript <- if (parallel:::getClusterOption("homogeneous", options)) {
        shQuote(parallel:::getClusterOption("rscript", options))
    }
    else "Rscript"
    cmd <- paste(rscript, "-e", shQuote(arg), env)
    renice <- parallel:::getClusterOption("renice", options)
    if (!is.na(renice) && renice) 
        cmd <- sprintf("nice +%d %s", as.integer(renice), cmd)
    if (manual) {
        cat("Manually start worker on", machine, "with\n    ", 
            cmd, "\n")
        flush.console()
    }
    else {
        if (machine != "localhost") {
            rshcmd <- parallel:::getClusterOption("rshcmd", options)
            user <- parallel:::getClusterOption("user", options)
            cmd <- shQuote(cmd)
            cmd <- paste(rshcmd, "-v -l", user, machine, cmd)
            print(cmd)
        }
        if (.Platform$OS.type == "windows") {
            system(cmd, wait = FALSE, input = "")
        }
        else system(cmd, wait = FALSE)
    }
    print("ssh done!!! about to start socketConnection....")
    con <- socketConnection("localhost", port = port, server = TRUE, 
                            blocking = TRUE, open = "a+b", timeout = timeout)
    print("socketConnection complete!!!")
    structure(list(con = con, host = machine, rank = rank), class = if (useXDR) 
        "SOCKnode"
              else "SOCK0node")
}



 hosts <- c("ec2-xxx-xx-xxx-xxxx.zone.compute.amazonaws.com","ec2-xx-xxx-xxx-xxx.zone.compute.amazonaws.com")
 # the code to try and connect to the actual EC2 instance...
 cl1 <- makePSOCKcluster1(hosts, user="ubuntu", rscript="/usr/lib/R/bin/Rscript", port=8787)



[1] "ssh -v -l ubuntu ec2-xxxxxxxxxxx.zone.compute.amazonaws.com \"'/usr/lib/R/bin/Rscript' -e 'parallel:::.slaveRSOCK()' MASTER=local.machine.name PORT=8787 OUT=/dev/null TIMEOUT=2592000 METHODS=TRUE XDR=TRUE\""
[1] "ssh done!!! about to start socketConnection...."
OpenSSH_5.2p1, OpenSSL 0.9.8r 8 Feb 2011
debug1: Reading configuration data /etc/ssh_config
debug1: Connecting to ec2-xxxxxxxxxxx.zone.compute.amazonaws.com [xx.xxx.xx.x.x] port 22.
debug1: Connection established.
debug1: identity file /Users/username/.ssh/identity type -1
debug1: identity file /Users/username/.ssh/id_rsa type 1
debug1: identity file /Users/username/.ssh/id_dsa type 2
debug1: Remote protocol version 2.0, remote software version OpenSSH_5.9p1 Debian-5ubuntu1
debug1: match: OpenSSH_5.9p1 Debian-5ubuntu1 pat OpenSSH*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.2
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-ctr hmac-md5 none
debug1: kex: client->server aes128-ctr hmac-md5 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
Warning: Permanently added 'ec2-xx-xx-xxx-xxx-xx.ap-southeast-1.compute.amazonaws.com,xx.xxx.xxx.xx.x' (RSA) to the list of known hosts.
debug1: ssh_rsa_verify: signature correct
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey
debug1: Next authentication method: publickey
debug1: Offering public key: /Users/username/.ssh/id_rsa
debug1: Server accepts key: pkalg ssh-rsa blen 277
debug1: Authentication succeeded (publickey).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: Sending command: '/usr/lib/R/bin/Rscript' -e 'parallel:::.slaveRSOCK()' MASTER=local.machine.name PORT=8787 OUT=/dev/null TIMEOUT=2592000 METHODS=TRUE XDR=TRUE
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: channel 0: free: client-session, nchannels 1
debug1: fd 0 clearing O_NONBLOCK
debug1: fd 1 clearing O_NONBLOCK
debug1: fd 2 clearing O_NONBLOCK
Transferred: sent 2352, received 2400 bytes, in 20.0 seconds
Bytes per second: sent 117.5, received 119.9
debug1: Exit status 1
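
The log ends with exit status 1, so the remote Rscript command itself seems to have failed. To rerun that command by hand in a terminal and see its error output, the same string that newPSOCKnode1 prints can be assembled separately. This is a minimal sketch: build_worker_cmd is a hypothetical helper, the host name is a placeholder, and the worker entry point parallel:::.slaveRSOCK() is copied from the code above (it may have a different name in newer R versions):

```r
# Hypothetical helper: assembles the same ssh command that newPSOCKnode1 prints,
# so it can be pasted into a terminal and run by hand.
build_worker_cmd <- function(host, user, master, port,
                             rscript = "Rscript", rshcmd = "ssh",
                             outfile = "/dev/null", timeout = 2592000L) {
    # Environment-style arguments that the worker reads on start-up
    env <- paste0("MASTER=", master, " PORT=", port, " OUT=", outfile,
                  " TIMEOUT=", timeout, " METHODS=TRUE XDR=TRUE")
    # Command executed on the remote machine
    worker <- paste(shQuote(rscript), "-e",
                    shQuote("parallel:::.slaveRSOCK()"), env)
    # Wrap it in the ssh call, with -v for verbose output as above
    paste(rshcmd, "-v -l", user, host, shQuote(worker))
}

cmd <- build_worker_cmd(host = "ec2-host.example.com",  # placeholder host name
                        user = "ubuntu",
                        master = "local.machine.name",
                        port = 8787,
                        rscript = "/usr/lib/R/bin/Rscript")
cat(cmd, "\n")
```

Running the printed command directly in a terminal should show whatever the worker writes before it exits. The MASTER value is the host name the worker tries to connect back to, so it has to be reachable from the EC2 instance.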

I am using a Mac running OS X 10.6.8 to connect to an Ubuntu instance. If anyone knows of better ways to connect to EC2 instances for parallel processing, that would also be extremely useful.

The end goal is to use foreach to carry out this processing once the cluster has been registered.

As a side question, what are the speed and processing pros and cons of running a process in parallel this way versus using MPI or some other method?

Thanks in advance!

EDIT: I have managed to get makePSOCKcluster to work when starting from a separate EC2 instance. parLapply works, and I can even register the cluster with registerDoParallel(cl1), where cl1 is the cluster object, but for some reason foreach ... %dopar% does not work and gives one of these errors:

Error in serialize(data, node$con) : error writing to connection

or

Error in unserialize(node$con) : error reading from connection

The connections look fine in showConnections(), which gives the following output:

> showConnections()
  description                                        class      mode  text     isopen   can read can write
3 "<-ip-xx-xxxx-x-xxx.zone.compute.internal:10187"   "sockconn" "a+b" "binary" "opened" "yes"    "yes"
4 "<-ip-yy-yyyy-y-yyyy.zone.compute.internal:10187"  "sockconn" "a+b" "binary" "opened" "yes"    "yes"
5 "<-ip-zz-zzzz-z-zzzz.zone.compute.internal:10187"  "sockconn" "a+b" "binary" "opened" "yes"    "yes"
>

where x, y and z represent the different IP addresses. The foreach examples come directly from the examples in the foreach help files. Furthermore, some of the clusterCall/clusterExport/clusterEvalQ functions from the parallel package do not work either, giving a similar error message.

I would still like to be able to connect from a Mac, and I would also still like to use foreach to carry out the parallel processing. I hope the extra information helps.
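
As a baseline for comparison, the same kind of calls can be run against a purely local cluster, which takes the EC2 networking out of the picture. A minimal sketch using only the parallel package, with two local workers (the worker count and the toy function are arbitrary choices for illustration):

```r
library(parallel)

# Two workers on localhost, so no ssh or security groups are involved
cl <- makePSOCKcluster(2)

# The same kinds of calls that fail against the EC2 cluster
squares <- parLapply(cl, 1:4, function(x) x^2)
pids    <- clusterCall(cl, Sys.getpid)   # one process id per worker
sums    <- clusterEvalQ(cl, 1 + 1)       # evaluated once on each worker

stopCluster(cl)

print(unlist(squares))
print(length(pids))
```

If these calls succeed locally but fail on the EC2 cluster, that would point towards the connections between the instances rather than the R code itself.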

h.l.m