PERC LOAD runs in MTflex mode fail at various stages of the PERC flow. All MTflex cases are affected; non-MTflex runs complete as expected.
When a "PERC LOAD" job is run with MTflex, the run fails inconsistently, at a different stage from one run to the next.
The tail of the log file shows an MTFLEX RUNTIME ERROR similar to the following:
MTFLEX RUNTIME ERROR: release_remote() : Release remote : sj-sc9-s22 (pid=169686) FAIL, ID = 235 TIMESTAMP = 339
<REMOVE>
<CPU_CHANGED>6</CPU_CHANGED>
</REMOVE>
<DCA 340 0 9 N sj-sc9-s22 ....>
Fatal error (signal SIGSEGV) - crash handler invoked
Stack trace:
/lib64/libpthread.so.0(+0xf630)[0xf033ca87630]
....
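To check whether a failed run matches this signature, a small script can scan the end of the log. The sketch below is a hypothetical helper (`scan_log_tail` is not part of Calibre). It assumes the log lines follow the format of the excerpt above, which may differ between Calibre versions, and it only reports what it finds. It does not diagnose the failure.

```python
import re
from collections import deque
from typing import Iterable

# Pattern modeled on the single release_remote() line in the excerpt;
# ASSUMPTION: other Calibre versions may word this line differently.
RELEASE_FAIL = re.compile(
    r"MTFLEX RUNTIME ERROR: release_remote\(\) : Release remote : "
    r"(?P<host>\S+) \(pid=(?P<pid>\d+)\) FAIL"
)


def scan_log_tail(lines: Iterable[str], tail: int = 200) -> dict:
    """Look only at the last `tail` lines for the MTflex failure signature."""
    last = deque(lines, maxlen=tail)  # discards everything but the final `tail` lines
    result = {"mtflex_error": False, "segv": False, "host": None, "pid": None}
    for line in last:
        if "MTFLEX RUNTIME ERROR" in line:
            result["mtflex_error"] = True
            match = RELEASE_FAIL.search(line)
            if match:
                # Remote host and pid of the remote whose release failed
                result["host"] = match.group("host")
                result["pid"] = int(match.group("pid"))
        if "SIGSEGV" in line:
            result["segv"] = True
    return result
```

For example, `scan_log_tail(open("calibre.log", errors="replace"))`, where `calibre.log` is a placeholder for the run's log file, returns `mtflex_error` and `segv` as `True` together with the failing remote host and pid when the log matches this article.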
When you acquire remotes for P2P/CD runs with Calibre PERC, the remote command (usually found in the launch remote script) looks something like this:
rcalibre /proj/out 0 -mtflex $CALIBRE_REMOTE_CONNECTION -64
rcalibre /proj/out 0 -mtflex $CALIBRE_REMOTE_CONNECTION -64
...
rcalibre /proj/out 0 -mtflex $CALIBRE_REMOTE_CONNECTION -64 -f
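The pattern above, with several identical lines and `-f` appended only to the last one, can be reproduced with a short generator, for example when preparing a test launch script. This is a hypothetical sketch: `remote_commands` and its parameters are illustrative, `/proj/out` is taken from the example, and this article does not describe what `-f` or the other arguments do, so check the Calibre documentation before relying on them.

```python
from typing import List


def remote_commands(count: int, workdir: str = "/proj/out") -> List[str]:
    """Build `count` rcalibre lines in the shown pattern; only the last gets -f."""
    if count < 1:
        raise ValueError("need at least one remote")
    # $CALIBRE_REMOTE_CONNECTION is left literal so the shell expands it at launch time.
    base = f"rcalibre {workdir} 0 -mtflex $CALIBRE_REMOTE_CONNECTION -64"
    lines = [base] * count
    lines[-1] = base + " -f"  # mirror the example: -f only on the final command
    return lines
```

For instance, `print("\n".join(remote_commands(4)))` prints three identical lines followed by one ending in `-f`.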
The root cause is a tool bug in "PERC LOAD" with MTflex: when a job is running in the background, the remotes are lost and the job cannot be brought back to the foreground.