YARN version: 3.1.1, HDP version: 3.1.5

Permissions on the /var/log/ directory itself are fine. I even tried 777 to make sure it was writable, and the error still happens.
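Since the error names a file several levels below /var/log/, it may be worth checking every directory on the way down rather than only the top one. Here is a minimal sketch of such a check, assuming a Unix host and Python 3; the function name `describe_path_chain` is hypothetical, and the path is the one from the log below.

```python
import os
import pwd
import stat


def describe_path_chain(path):
    """Return (path, owner, mode) for every prefix of `path`, from / downwards.

    Components that cannot be stat'ed get owner None and the error text
    in place of the mode string.
    """
    # Collect the path and all of its parents, deepest first.
    chain = []
    current = os.path.abspath(path)
    while True:
        chain.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent

    rows = []
    for p in reversed(chain):
        try:
            st = os.stat(p)
        except OSError as exc:
            rows.append((p, None, f"stat failed: {exc.strerror}"))
            continue
        try:
            owner = pwd.getpwuid(st.st_uid).pw_name
        except KeyError:
            owner = str(st.st_uid)  # uid with no passwd entry
        rows.append((p, owner, stat.filemode(st.st_mode)))
    return rows


if __name__ == "__main__":
    # Path copied from the stack trace; adjust to the file named in your own log.
    target = ("/var/log/hadoop-yarn/nodemanager/recovery-state/"
              "yarn-nm-state/035781.log")
    for p, owner, mode in describe_path_chain(target):
        print(f"{mode:<30} {owner or '-':<12} {p}")
```

Running it as the user that owns the NodeManager process shows which component, if any, that user cannot traverse or write to. It does not cover other causes of "Permission denied", such as SELinux policy or a read-only remount.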

Disk space is also fine. Could it be a connectivity issue to the disk? It's the root volume, though, so I'm not sure how that would happen.

I can restart the NodeManagers manually, and they then run jobs as usual without complaining. Is there a way to trigger the restart automatically when a NodeManager is found unhealthy? That might be a good workaround.
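To illustrate the kind of workaround I have in mind: the ResourceManager REST API lists cluster nodes with their state at `/ws/v1/cluster/nodes`, so a small watcher could poll it and pick out the unhealthy hosts. This is only a sketch, assuming the affected nodes actually show up as `UNHEALTHY` there; the ResourceManager URL is a placeholder, the function names are hypothetical, and the restart step is deliberately left out because it depends on how the cluster is managed.

```python
import json
import urllib.request

# Placeholder address; replace with the real ResourceManager web endpoint.
RM_URL = "http://resourcemanager.example.com:8088"


def fetch_nodes(rm_url=RM_URL):
    """Fetch the node list from the ResourceManager REST API as a dict."""
    url = f"{rm_url}/ws/v1/cluster/nodes"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def unhealthy_nodes(payload):
    """Return (hostname, health report) pairs for nodes in state UNHEALTHY."""
    # The API returns {"nodes": null} when there are no nodes to list.
    nodes = (payload.get("nodes") or {}).get("node") or []
    return [
        (n.get("nodeHostName"), n.get("healthReport", ""))
        for n in nodes
        if n.get("state") == "UNHEALTHY"
    ]


if __name__ == "__main__":
    for host, report in unhealthy_nodes(fetch_nodes()):
        # A real watcher would trigger the restart for `host` here.
        print(f"{host}: {report}")
```

Run from cron or a systemd timer, something like this could feed whatever mechanism restarts the NodeManager on each reported host. It would only mask the underlying "Permission denied" error, not fix it.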

2023-08-30 20:51:57,534 ERROR recovery.NMLeveldbStateStoreService (NMLeveldbStateStoreService.java:markStoreUnHealthy(206)) - Statestore exception: 
org.iq80.leveldb.DBException: IO error: /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/035781.log: Permission denied
    at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:129)
    at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:106)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeMasterKey(NMLeveldbStateStoreService.java:1112)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerTokenPreviousMasterKey(NMLeveldbStateStoreService.java:1178)
    at org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager.updatePreviousMasterKey(NMContainerTokenSecretManager.java:120)
    at org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager.setMasterKey(NMContainerTokenSecretManager.java:141)
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$StatusUpdaterRunnable.updateMasterKeys(NodeStatusUpdaterImpl.java:1255)
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$StatusUpdaterRunnable.run(NodeStatusUpdaterImpl.java:1099)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /var/log/hadoop-yarn/nodemanager/recovery-state/yarn-nm-state/035781.log: Permission denied
    at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
    at org.fusesource.leveldbjni.internal.NativeDB.put(NativeDB.java:259)
    at org.fusesource.leveldbjni.internal.NativeDB.put(NativeDB.java:254)
    at org.fusesource.leveldbjni.internal.NativeDB.put(NativeDB.java:244)
    at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:126)
    ... 8 more
