
I'm running YARN on an Oracle BDA X7-2 with:

  • Cloudera Enterprise 5.14.3
  • Java 1.8.0_171
  • PGX 2.7.1

I'm trying to run PGX on YARN by following this manual: https://docs.oracle.com/cd/E56133_01/2.5.0/tutorials/yarn.html

I managed to run the installation script and filled in the config file it provides as follows:

{
  "pgx_yarn_jar_hdfs_path": "hdfs:/user/pgx/pgx-yarn-2.7.1.jar",
  "pgx_war_hdfs_path": "hdfs:/user/pgx/pgx-webapp-2.7.1.war",
  "pgx_conf_hdfs_path": "hdfs:/user/pgx/pgx.conf",
  "pgx_log4j_conf_hdfs_path": "hdfs:/user/pgx/log4j2.xml",
  "pgx_dist_log4j_conf_hdfs_path": "hdfs:/user/pgx/dist_log4j.xml",
  "pgx_cluster_host_hdfs_path": "hdfs:/user/pgx/cluster-host.tgz",
  "zookeeper_connect_string": "bda1node05,bda1node06,bda1node07",
  "standard_library_path": "/usr/lib64/gcc/4.8.2",
  "min_heap_size": "512m",
  "max_heap_size": "12g",
  "container_cores": 9,
  "container_memory": 0,
  "container_priority": 0,
  "num_machines": 1
}

YARN shows a pgx-service application in the RUNNING state with no errors in stderr, and the log says the service is running at:

http://bda1node06:7007

The Linux Java process is running with the following command:

/usr/java/default/bin/java -Xms512m -Xmx12g oracle.pgx.yarn.PgxService bda1node06 /u11/hadoop/yarn/nm/usercache/root/appcache/application_1539869144089_2070/container_e22_1539869144089_2070_01_000002/pgx-server.war 7007 bda1node05,bda1node06,bda1node07 /pgx-8eef44e2-1657-403a-8193-0102f5266680

However, when I run the PGX client to test the connection:

$PGX_HOME/bin/pgx --base_url http://bda1node06:7007

I get:

java.util.concurrent.ExecutionException: java.lang.IllegalStateException: cannot connect to server; requested http://bda1node06:7007/version?extendedInfo=true and expected status 200, got 404 instead; response body = ""
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
    at oracle.pgx.api.PgxFuture.get(PgxFuture.java:99)
    at oracle.pgx.api.ServerInstance.createSession(ServerInstance.java:559)
    at oracle.pgx.shell.Console.initSession(Console.java:280)
    at oracle.pgx.shell.Console.<init>(Console.java:153)
    at oracle.pgx.shell.Console.main(Console.java:296)
Caused by: java.lang.IllegalStateException: cannot connect to server; requested http://bda1node06:7007/version?extendedInfo=true and expected status 200, got 404 instead; response body = ""
    at oracle.pgx.api.ClientApiProvider.lambda$versionCheck$2(ClientApiProvider.java:189)
    at oracle.pgx.client.RemoteUtils.lambda$asyncRequest$5(RemoteUtils.java:278)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

I have no idea how to debug this, or how to check whether the connection URL needs an extra path.

How can I proceed with debugging?
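One way to narrow down a 404 is to request the endpoint the client itself uses (`/version?extendedInfo=true`, taken from the error message) directly, with and without a `/pgx` base path, and compare the status codes. Below is a minimal JDK-only sketch. The default host and port come from this question, and treating `/pgx` as the only alternative base path is an assumption. A 404 on every candidate would suggest that something is listening on the port but nothing is mapped under those paths, while `-1` (no HTTP response) would point at networking instead. The same check can be done with `curl -i`.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public final class PgxVersionProbe {

    // Builds the version URL for each candidate base path: "" (root) and "/pgx"
    // (the base path suggested in the answer to this question). Other base paths are not covered.
    static List<String> candidateUrls(String host, int port) {
        List<String> urls = new ArrayList<>();
        for (String base : new String[] {"", "/pgx"}) {
            urls.add("http://" + host + ":" + port + base + "/version?extendedInfo=true");
        }
        return urls;
    }

    // Returns the HTTP status code, or -1 if no HTTP response was received at all.
    static int status(String url) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(5_000);
            conn.setReadTimeout(5_000);
            return conn.getResponseCode();
        } catch (IOException | IllegalArgumentException e) {
            return -1;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Defaults are the host/port from the question; pass others as arguments.
        String host = args.length > 0 ? args[0] : "bda1node06";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 7007;
        for (String url : candidateUrls(host, port)) {
            System.out.println(status(url) + "  " + url);
        }
    }
}
```

Run it from a BDA node first, then from the client machine. If the results differ between the two, that points at the network path rather than at PGX.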

Thanks in advance!

Samamba
  • is there any useful output when running `yarn logs -applicationId ` ? – Korbi Nov 16 '18 at 20:33
  • Another thing you can try is playing with the container_cores and container_memory settings in yarn.conf. Try setting them to small values to make sure YARN doesn't request more capacity than is available, which could cause the service to never be deployed. I think setting it to 0 means the maximum available cores/CPU capacity – Korbi Nov 20 '18 at 19:52
  • @Korbi thank you so much for your suggestions. Sorry for the late reply; I was busy the last few weeks and wasn't following this thread. Today we had a meeting with Adriano (from Oracle Brazil) and Albert (Oracle PO); I think you may know them. I'll check your suggestions first thing tomorrow morning. Thanks in advance! – Samamba Dec 06 '18 at 20:13
  • Just to confirm: when you start the PGX server manually, using the pgx/bin/start-server script, does the server start successfully? And are you then able to connect from the client, when running it on the BDA too? – Albert Godfrind Dec 07 '18 at 18:13
  • @AlbertGodfrind following our meeting, I did what you recommended: opened the Groovy shell, connected to the HBase database, created some vertices and edges, successfully instantiated a PGX analyst and ran some basic operations (count triangles, etc.). Everything worked fine. I then edited the conf/server.conf file, disabled TLS and authentication, and started the PGX server. It seems to be running and listening on port 7007, and now I'm struggling a little to connect without SSL using the bin/pgx client. Everything is on the Oracle BDA. – Samamba Dec 10 '18 at 19:45
  • Since I'm not authenticated and running without TLS, I'll use the REST API to do some tests. Based on this documentation https://docs.oracle.com/en/bigdata/big-data-spatial-graph/2.5/bdspa/using-property-graphs-big-data.html#GUID-849F7E7D-E206-4ED6-8DE9-CDB9F1D0FE1E I'm trying some requests, and I'll get back to you when I have news. – Samamba Dec 10 '18 at 20:08
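Korbi's capacity suggestion can be checked with simple arithmetic against the AppMaster log posted in the follow-up answer. That log reports a per-container maximum of `<memory:65536, vCores:9>`, and the available resources drop from `<memory:194560, vCores:180>` to `<memory:129024, vCores:171>` once a container is granted. Below is a minimal sketch with hypothetical helper names. It assumes, as Korbi suggests, that a value of 0 means "use the maximum", and that nothing else was allocated between the two log lines. Under those assumptions, `container_cores: 9` sits exactly at the per-container limit, and the container received 65536 MB and 9 vCores, which is consistent with `container_memory: 0` meaning "maximum".

```java
public final class ContainerRequestCheck {

    // Hypothetical helper: treats 0 as "use the maximum", following Korbi's comment (an assumption).
    static long effective(long requested, long max) {
        return requested == 0 ? max : requested;
    }

    // True if the effective request is within YARN's per-container maximum.
    static boolean fits(long memoryMb, long cores, long maxMemoryMb, long maxCores) {
        return effective(memoryMb, maxMemoryMb) <= maxMemoryMb
                && effective(cores, maxCores) <= maxCores;
    }

    // What one allocation consumed: available resources before minus available after.
    static long[] allocated(long memBefore, long coresBefore, long memAfter, long coresAfter) {
        return new long[] {memBefore - memAfter, coresBefore - coresAfter};
    }

    public static void main(String[] args) {
        // container_memory = 0, container_cores = 9 (yarn.conf) vs maxCap <memory:65536, vCores:9>.
        System.out.println(fits(0, 9, 65536, 9));
        // "Available" before and after "attempt 3: got 1 containers" in the AppMaster log.
        long[] got = allocated(194560, 180, 129024, 171);
        System.out.println(got[0] + " MB, " + got[1] + " vCores");
    }
}
```

Running it prints `true` followed by `65536 MB, 9 vCores`. Requesting 10 cores would not fit under the same maximum.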

2 Answers


By default, PGX has a base path of /pgx, which means you should connect as follows:

$PGX_HOME/bin/pgx --base_url http://bda1node06:7007/pgx
Martijn
  • I get a similar response: $PGX_HOME/bin/pgx --base_url http://bda1node06:7007/pgx java.util.concurrent.ExecutionException: java.lang.IllegalStateException: cannot connect to server; requested http://bda1node06:7007/pgx/version?extendedInfo=true and expected status 200, got 404 instead; response body = "" – Samamba Nov 13 '18 at 17:07

A quick follow-up here.

We've managed to start a PGX server and manipulate our HBase graph! :D

PGX "Hello World"

We wrote a small script to insert vertices and edges, instantiate PGX and run a simple example:

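// Connect to the HBase-backed property graph 'sinapse' through ZooKeeper
cfg = GraphConfigBuilder.forPropertyGraphHbase().setName('sinapse').setZkQuorum('bda1node05').build()
opg = OraclePropertyGraph.getInstance(cfg)

// Create three vertices, each with a 'nome' (name) property
a = opg.addVertex()
a.setProperty('nome', 'Felipe')

b = opg.addVertex()
b.setProperty('nome', 'Rhenan')

c = opg.addVertex()
c.setProperty('nome', 'Hugo')

// Connect them with labeled edges ('Pai de' = "father of", 'Avo de' = "grandfather of")
opg.addEdge(a, b, 'Pai de')
opg.addEdge(b, c, 'Pai de')
opg.addEdge(a, c, 'Avo de')

// Persist the changes to HBase
opg.commit()

// Open a PGX session, load the graph into memory and count its triangles
session = Pgx.createSession('sinapsepgx')
analyst = session.createAnalyst()
pgxGraph = session.readGraphWithProperties(opg.getConfig(), true)
analyst.countTriangles(pgxGraph, true)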

And that worked just fine!

Client - Server architecture

Next, we moved to client/server mode by running the start-server script. That worked fine too! These are our config files:

server.conf

{
  "port": 7007,
  "enable_tls": false,
  "enable_client_authentication": false
}

pgx.conf

{
  "allow_idle_timeout_overwrite": true,
  "allow_local_filesystem": false,
  "allow_task_timeout_overwrite": true,
  "enable_gm_compiler": true,
  "enterprise_scheduler_config": {
    "analysis_task_config": {
      "priority": "MEDIUM",
      "weight": 12,
      "max_threads": 12
    },
    "fast_analysis_task_config": {
      "priority": "HIGH",
      "weight": 1,
      "max_threads": 12
    },
    "num_io_threads_per_task": 12
  },
  "preload_graphs": [
    {"path": "graphs/sinapse_conf.json",
     "name": "sinapse"}
  ],
  "max_active_sessions": 1024,
  "max_queue_size_per_session": -1,
  "max_snapshot_count": 0,
  "memory_cleanup_interval": 600,
  "path_to_gm_compiler": null,
  "release_memory_threshold": 0.85,
  "session_idle_timeout_secs": 0,
  "session_task_timeout_secs": 0,
  "strict_mode": true,
  "tmp_dir": "/tmp"
}

sinapse_conf.json

{
  "edge_props": [
    {
      "name": "relacao",
      "type": "string"
    }
  ],
  "db_engine": "HBASE",
  "vertex_props": [
    {
      "name": "nome",
      "type": "string"
    },
    {
      "name": "cpf",
      "type": "string"
    }
  ],
  "format": "pg",
  "name": "sinapse",
  "error_handling": {},
  "vertex_id_type": "long",
  "attributes": {},
  "loading": {},
  "zk_quorum": "bda1node05,bda1node06,bda1node07"

}

The start-server script ran fine with that configuration and preloaded our HBase graph. Works like a charm.

Connected to the server using the pgx client:

./bin/pgx -b http://localhost:7007

We were able to do the same things we had done in the Groovy shell. That's awesome.

PGX on Yarn

Now we are back to our challenge: running and managing PGX on YARN.

We copied our pgx.conf file to HDFS:

hdfs://mpmapas-ns/user/pgx/pgx.conf

{
  "allow_idle_timeout_overwrite": true,
  "allow_local_filesystem": false,
  "allow_task_timeout_overwrite": true,
  "enable_gm_compiler": true,
  "enterprise_scheduler_config": {
    "analysis_task_config": {
      "priority": "MEDIUM",
      "weight": 12,
      "max_threads": 12
    },
    "fast_analysis_task_config": {
      "priority": "HIGH",
      "weight": 1,
      "max_threads": 12
    },
    "num_io_threads_per_task": 12
  },
  "preload_graphs": [
    {"path": "graphs/sinapse_conf.json",
     "name": "sinapse"}
  ],
  "max_active_sessions": 1024,
  "max_queue_size_per_session": -1,
  "max_snapshot_count": 0,
  "memory_cleanup_interval": 600,
  "path_to_gm_compiler": null,
  "release_memory_threshold": 0.85,
  "session_idle_timeout_secs": 0,
  "session_task_timeout_secs": 0,
  "strict_mode": true,
  "tmp_dir": "/tmp"
}

/opt/oracle/oracle-spatial-graph/property_graph/pgx/yarn/conf/yarn.conf

{
  "pgx_yarn_jar_hdfs_path": "hdfs://mpmapas-ns/user/pgx/pgx-yarn-2.7.1.jar",
  "pgx_war_hdfs_path": "hdfs://mpmapas-ns/user/pgx/pgx-webapp-2.7.1.war",
  "pgx_conf_hdfs_path": "hdfs://mpmapas-ns/user/pgx/pgx.conf",
  "pgx_log4j_conf_hdfs_path": "hdfs://mpmapas-ns/user/pgx/log4j2.xml",
  "pgx_dist_log4j_conf_hdfs_path": "hdfs://mpmapas-ns/user/pgx/dist_log4j.xml",
  "pgx_cluster_host_hdfs_path": "hdfs://mpmapas-ns/user/pgx/cluster-host.tgz",
  "zookeeper_connect_string": "bda1node05.pgj.rj.gov.br,bda1node06.pgj.rj.gov.br,bda1node07.pgj.rj.gov.br",
  "standard_library_path": "/usr/lib64/gcc/4.8.2",
  "min_heap_size": "512m",
  "max_heap_size": "12g",
  "container_cores": 9,
  "container_memory": 0,
  "container_priority": 0,
  "num_machines": 1
}
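HDFS URIs that look alike are not interchangeable. The question's yarn.conf uses `hdfs:/user/...`, which has no authority and is normally resolved against the cluster's default filesystem. This yarn.conf uses `hdfs://mpmapas-ns/user/...`, which names the nameservice explicitly. A double-slash form without a nameservice, such as `hdfs://user/pgx/pgx.conf`, makes `user` the NameNode host rather than a directory. Below is a minimal sketch with `java.net.URI`, the class that Hadoop's `Path` wraps, to make the split visible. The class and method names are illustrative only.

```java
import java.net.URI;

public final class HdfsUriCheck {

    // Shows how a URI string splits into authority (NameNode / nameservice) and path.
    static String describe(String spelling) {
        URI uri = URI.create(spelling);
        return "authority=" + uri.getAuthority() + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        String[] spellings = {
            "hdfs:/user/pgx/pgx.conf",             // no authority: default filesystem applies
            "hdfs://user/pgx/pgx.conf",            // "user" becomes the host
            "hdfs://mpmapas-ns/user/pgx/pgx.conf"  // explicit nameservice
        };
        for (String s : spellings) {
            System.out.println(s + " -> " + describe(s));
        }
    }
}
```

The second spelling parses to `authority=user path=/pgx/pgx.conf`, so the `/user` directory is silently dropped from the path.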

Also, @albert recommended that we remove log4j2.xml from the server/shared-mem/pgx-webapp-2.7.1.war file, so that log4j logging is controlled only by the file in our HDFS folder.

So we unpacked the WAR file, removed log4j2.xml, repacked it, and edited the log4j2.xml file on HDFS like this:

hdfs://mpmapas-ns/user/pgx/log4j2.xml

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
    <Appenders>
        <Console name="Console" target="SYSTEM_OUT">
            <PatternLayout pattern="%d{HH:mm:ss,SSS} %p %C{1} - %m%n"/>
        </Console>
        <!-- fileName is a plain filesystem path, not a file: URI -->
        <File name="LogFile" fileName="/tmp/pg_trace.log">
            <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n"/>
        </File>
    </Appenders>
    <Loggers>
        <Root level="debug">
            <AppenderRef ref="LogFile"/>
        </Root>
        <!-- additivity="false" keeps these events from being written a second time via Root -->
        <Logger name="oracle.pgx.engine.admin.Ctrl" level="debug" additivity="false">
            <AppenderRef ref="LogFile"/>
        </Logger>
        <Logger name="pgx.dist.cluster_host" level="debug" additivity="false">
            <AppenderRef ref="LogFile"/>
        </Logger>
    </Loggers>
</Configuration>

Finally, we ran the YARN start-server command:

yarn jar yarn/pgx-yarn-2.7.1.jar yarn/conf/yarn.conf

The end of the log file looks really promising:

18/12/11 16:25:03 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
18/12/11 16:25:03 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
18/12/11 16:25:03 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
18/12/11 16:25:03 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
18/12/11 16:25:03 INFO zookeeper.ZooKeeper: Client environment:os.version=4.1.12-124.14.1.el7uek.x86_64
18/12/11 16:25:03 INFO zookeeper.ZooKeeper: Client environment:user.name=root
18/12/11 16:25:03 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
18/12/11 16:25:03 INFO zookeeper.ZooKeeper: Client environment:user.dir=/opt/oracle/oracle-spatial-graph/property_graph/pgx
18/12/11 16:25:03 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=bda1node05.pgj.rj.gov.br,bda1node06.pgj.rj.gov.br,bda1node07.pgj.rj.gov.br sessionTimeout=10000 watcher=oracle.pgx.yarn.ClientZkClient@32da97fd
18/12/11 16:25:03 INFO zookeeper.ClientCnxn: Opening socket connection to server bda1node07.pgj.rj.gov.br/192.168.8.7:2181. Will not attempt to authenticate using SASL (unknown error)
18/12/11 16:25:03 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /192.168.8.5:33299, server: bda1node07.pgj.rj.gov.br/192.168.8.7:2181
18/12/11 16:25:03 INFO zookeeper.ClientCnxn: Session establishment complete on server bda1node07.pgj.rj.gov.br/192.168.8.7:2181, sessionid = 0x3668759ae4553df, negotiated timeout = 10000
18/12/11 16:25:05 INFO yarn.StartService: waiting for PGX service (yarn appId == 'application_1539869144089_2555') to come up ...
18/12/11 16:25:10 INFO yarn.StartService: retrieved PGX host: http://bda1node07.pgj.rj.gov.br:7007
18/12/11 16:25:10 INFO yarn.StartService: to connect a remote shell to this host, run '$PGX_HOME/bin/pgx --base_url http://bda1node07.pgj.rj.gov.br:7007'
18/12/11 16:25:10 INFO yarn.StartService: to shut the PGX service down, run 'yarn application -kill application_1539869144089_2555'
18/12/11 16:25:10 INFO zookeeper.ZooKeeper: Session: 0x3668759ae4553df closed
18/12/11 16:25:10 INFO zookeeper.ClientCnxn: EventThread shut down

But connecting to it still returns 404 ;(

The last piece of information I can give you is the YARN stderr log, which also indicates that we are not using log4j correctly:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/u09/hadoop/yarn/nm/filecache/890/pgx-yarn-2.7.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property 'log4j2.debug' to show Log4j2 internal initialization logging.
18/12/11 16:25:06 INFO yarn.AppMaster: register app
18/12/11 16:25:06 INFO yarn.AppMaster: RM response = [queue=root.users.root,maxCap=<memory:65536, vCores:9>]
18/12/11 16:25:06 INFO yarn.AppMaster: max capability of cluster: <memory:65536, vCores:9>
18/12/11 16:25:06 INFO yarn.AppMaster: attempting to allocate 1 containers
18/12/11 16:25:06 INFO yarn.AppMaster: attempt 1: got 0 containers. Available: <memory:194560, vCores:180>
18/12/11 16:25:06 INFO yarn.AppMaster: attempt 2: got 0 containers. Available: <memory:194560, vCores:180>
18/12/11 16:25:06 INFO yarn.AppMaster: attempt 3: got 1 containers. Available: <memory:129024, vCores:171>
18/12/11 16:25:06 INFO yarn.AppMaster: copy hdfs://mpmapas-ns/user/pgx/pgx-yarn-2.7.1.jar into pgx-yarn.jar
18/12/11 16:25:06 INFO yarn.AppMaster: copy hdfs://mpmapas-ns/user/pgx/pgx-webapp-2.7.1.war into pgx-server.war
18/12/11 16:25:06 INFO yarn.AppMaster: copy hdfs://mpmapas-ns/user/pgx/pgx.conf into conf/pgx.conf
18/12/11 16:25:06 INFO yarn.AppMaster: copy hdfs://mpmapas-ns/user/pgx/log4j2.xml into conf/log4j2.xml
18/12/11 16:25:07 INFO yarn.AppMaster: server env = {CLASSPATH=conf:pgx-server/WEB-INF/lib/*:pgx-yarn.jar:$HADOOP_CONF_DIR}
18/12/11 16:25:07 INFO yarn.AppMaster: server command = $JAVA_HOME/bin/java -Xms512m -Xmx12g oracle.pgx.yarn.PgxService bda1node07.pgj.rj.gov.br $PWD/pgx-server.war 7007 bda1node05.pgj.rj.gov.br,bda1node06.pgj.rj.gov.br,bda1node07.pgj.rj.gov.br /pgx-37a121ce-e028-432c-8761-104027126c3b 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr; 
18/12/11 16:25:07 INFO yarn.AppMaster: check for completion
18/12/11 16:25:08 INFO yarn.AppMaster: check for completion
18/12/11 16:25:08 INFO yarn.AppMaster: check for completion
18/12/11 16:25:09 INFO yarn.AppMaster: check for completion
18/12/11 16:25:09 INFO yarn.AppMaster: check for completion
18/12/11 16:25:10 INFO yarn.AppMaster: check for completion
18/12/11 16:25:10 INFO yarn.AppMaster: check for completion
18/12/11 16:25:11 INFO yarn.AppMaster: check for completion
18/12/11 16:25:11 INFO yarn.AppMaster: check for completion
18/12/11 16:25:12 INFO yarn.AppMaster: check for completion
...
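The SLF4J warning at the top of this stderr means that `org/slf4j/impl/StaticLoggerBinder.class` was found in two jars: the CDH parcel's `slf4j-log4j12-1.7.5.jar` and `pgx-yarn-2.7.1.jar`. SLF4J then picks one of them, here `Log4jLoggerFactory`, which is the log4j 1.x binding. SLF4J-routed messages in that JVM are therefore handled by log4j 1.x rather than by `log4j2.xml`. The separate `ERROR StatusLogger No log4j2 configuration file found` line comes from log4j2 itself. These lines appear to come from the AppMaster process (`yarn.AppMaster`), not necessarily from the PGX server container.

Below is a minimal sketch that performs the same classpath lookup. It assumes you can reproduce the classpath you want to inspect, for example the one shown in the `server env = {CLASSPATH=...}` line.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public final class Slf4jBindingCheck {

    // Every classpath location that provides the given resource, as reported by the class loader.
    static List<URL> locate(String resource, ClassLoader loader) {
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // SLF4J 1.7.x binds through this class; more than one hit triggers the
        // "Class path contains multiple SLF4J bindings" warning.
        List<URL> hits = locate("org/slf4j/impl/StaticLoggerBinder.class",
                ClassLoader.getSystemClassLoader());
        System.out.println(hits.size() + " binding(s) on the classpath");
        for (URL url : hits) {
            System.out.println("  " + url);
        }
    }
}
```

Launch it with `-cp` set to the classpath under investigation. If it prints more than one location, the warning is explained, and you can decide which jar should provide the binding.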

This is the farthest we've managed to go.

We can start our work now! That's really exciting. Now I know how to properly start a service and preload, insert and manage data, and we will import our existing graph database into it and do some experimentation.

It would be lovely to have this running on YARN in production.

Thank you all for the extreme dedication and attention.

Samamba