Calibre MTflex PERC LOAD remote host acquisition discrepancy.

2023-08-07T12:24:59.000-0400
IC Verification & Signoff

Summary

PERC LOAD MTflex runs fail at various stages of the PERC flow. All MTflex cases are affected; non-MTflex runs work as expected.


Details

Symptoms:

When running a "PERC LOAD" MTflex job, the run fails inconsistently at different stages.

The tail of the log file shows an MTFLEX RUNTIME ERROR similar to the following:
MTFLEX RUNTIME ERROR: release_remote() : Release remote : sj-sc9-s22 (pid=169686) FAIL, ID = 235 TIMESTAMP = 339
<REMOVE>
  <CPU_CHANGED>6</CPU_CHANGED>
</REMOVE>
<DCA 340 0 9 N sj-sc9-s22 ....>

Fatal error (signal SIGSEGV) - crash handler invoked

Stack trace:
/lib64/libpthread.so.0(+0xf630)[0xf033ca87630]
....
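
To check a transcript for this failure signature, a small scan of the end of the log can help. This is a minimal sketch, not a Calibre utility: the function name check_mtflex_log, the default 200-line window, and the log file name in the usage line are all hypothetical. It only looks for the two strings quoted above ("MTFLEX RUNTIME ERROR" and "signal SIGSEGV").

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Calibre): scan the end of a transcript
# for the MTflex release_remote failure or the SIGSEGV crash shown above.
# Usage: check_mtflex_log <logfile> [lines-to-scan]
# Returns 0 if a signature is found, 1 if not, 2 if the log is unreadable.
check_mtflex_log() {
  local log="$1"
  local n="${2:-200}"   # assumption: the error appears within the last 200 lines
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 2
  fi
  # grep reads all of its input (no -q) to avoid SIGPIPE on tail under pipefail
  if tail -n "$n" "$log" | grep -e 'MTFLEX RUNTIME ERROR' -e 'signal SIGSEGV' > /dev/null; then
    echo "MTflex remote failure signature found in $log"
    return 0
  fi
  return 1
}
```

For example, `check_mtflex_log perc_load.log` (hypothetical file name) prints a message and returns 0 when either string appears in the last 200 lines.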

Root cause:

When you acquire remotes for P2P/CD runs with Calibre PERC, the remote command (usually found in the launch remote script) looks something like this:
rcalibre /proj/out 0 -mtflex $CALIBRE_REMOTE_CONNECTION -64
rcalibre /proj/out 0 -mtflex $CALIBRE_REMOTE_CONNECTION -64
...
rcalibre /proj/out 0 -mtflex $CALIBRE_REMOTE_CONNECTION -64 -f

The root cause is a tool bug in "PERC LOAD" with MTflex that causes the remotes to be lost when a job is running in the background; the tool cannot bring the job back into the foreground.

Solution:

  1. The bug is fixed in Calibre 2023.3 and later. PRESERVE REMOTES is now enabled by default, so users can use the same remote acquisition command for both P2P/CD MTflex runs and PERC LOAD MTflex runs.
     
  2. Workaround: append "-f &" to every line of the remote command and add "wait" at the end:
    rcalibre /proj/out 0 -mtflex $CALIBRE_REMOTE_CONNECTION -64 -f &
    rcalibre /proj/out 0 -mtflex $CALIBRE_REMOTE_CONNECTION -64 -f &
    ...
    rcalibre /proj/out 0 -mtflex $CALIBRE_REMOTE_CONNECTION -64 -f &
    wait
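
When many remotes are needed, the workaround lines can be generated in a loop instead of written out by hand. This is a minimal sketch assuming bash. The /proj/out path and the rcalibre arguments are copied from the article; NUM_REMOTES, RCALIBRE, and the function name launch_remotes are hypothetical additions for illustration, and RCALIBRE exists only so the script can be dry-run with echo.

```shell
#!/usr/bin/env bash
# Sketch of a launch-remote script applying the "-f &" + "wait" workaround.
# Assumptions (not from the article): NUM_REMOTES sets how many remotes to
# start, and RCALIBRE can override the executable (e.g. RCALIBRE=echo).
launch_remotes() {
  local num="${NUM_REMOTES:-4}"        # hypothetical default remote count
  local cmd="${RCALIBRE:-rcalibre}"
  local i
  for ((i = 1; i <= num; i++)); do
    # Same arguments as the article's workaround: -f on every line,
    # with & so all remotes are started in parallel as shell jobs
    "$cmd" /proj/out 0 -mtflex "$CALIBRE_REMOTE_CONNECTION" -64 -f &
  done
  # Block until every remote exits, as the trailing "wait" does
  wait
}
```

A dry run such as `NUM_REMOTES=3 RCALIBRE=echo launch_remotes` prints the three command lines that would be executed; the actual remote count should match whatever the MTflex job expects.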

KB Article ID# KB000121445_EN_US

Associated Components

Calibre PERC