Dear RegCNeters,

RegCM-4.4.5.4 seg-faults when I run the model with more than 85 processors. The following error lines repeat as many times as there are processors selected:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
regcmMPICLM        0000000001325431  Unknown               Unknown  Unknown
regcmMPICLM        0000000001323B87  Unknown               Unknown  Unknown
regcmMPICLM        00000000012D3124  Unknown               Unknown  Unknown
regcmMPICLM        00000000012D2F36  Unknown               Unknown  Unknown
regcmMPICLM        000000000126875F  Unknown               Unknown  Unknown
regcmMPICLM        000000000127152D  Unknown               Unknown  Unknown
libpthread.so.0    00007FFFF58B3790  Unknown               Unknown  Unknown
libmpi.so.12       00007FFFF6158BBC  Unknown               Unknown  Unknown
libmpi.so.12       00007FFFF6143FFF  Unknown               Unknown  Unknown
libmpi.so.12       00007FFFF5FAA6E1  Unknown               Unknown  Unknown
libmpi.so.12       00007FFFF6134563  Unknown               Unknown  Unknown
libmpi.so.12       00007FFFF60F67F0  Unknown               Unknown  Unknown
libmpi.so.12       00007FFFF60E9B74  Unknown               Unknown  Unknown
libmpifort.so.12   00007FFFF6648160  Unknown               Unknown  Unknown
regcmMPICLM        000000000074E2B0  Unknown               Unknown  Unknown
regcmMPICLM        0000000000420B8E  Unknown               Unknown  Unknown
libc.so.6          00007FFFF552ED5D  Unknown               Unknown  Unknown
regcmMPICLM        0000000000420A99  Unknown               Unknown  Unknown

I am running the model on a single node with 256 cores (32 sockets of 8 cores each) at 12-km resolution (ds) over a relatively small domain (Puerto Rico) with the following settings:

iy = 64
jx = 80
kz = 18

I attempted several runs with 86 to 192 processors, all of which produced the above message repeated once per processor. I made sure to choose processor counts that divide evenly into the iy and jx values above and that meet the minimum of a 3x3 box per processor. For example, 128 yields:

CPUS DIM1 = 16; 80 (jx) / 16 = 5 (> 3)
CPUS DIM2 = 8;  64 (iy) / 8  = 8 (> 3)
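For reference, the layout check described above can be sketched in Python. This assumes, as in the runs described here, that each CPU dimension must divide its grid dimension evenly (DIM1 along jx, DIM2 along iy) and that each subdomain must be at least 3 points wide in both directions. The function names are hypothetical; this is not RegCM's actual decomposition code.

```python
def subdomain_sizes(jx, iy, cpus_dim1, cpus_dim2):
    """Points per processor along (jx, iy), or None if a dimension does not divide evenly."""
    if jx % cpus_dim1 or iy % cpus_dim2:
        return None
    return jx // cpus_dim1, iy // cpus_dim2


def meets_minimum(jx, iy, cpus_dim1, cpus_dim2, min_side=3):
    """True if the layout divides evenly and every subdomain is at least min_side wide."""
    sizes = subdomain_sizes(jx, iy, cpus_dim1, cpus_dim2)
    return sizes is not None and min(sizes) >= min_side


def valid_layouts(jx, iy, min_side=3):
    """All (total_cpus, cpus_dim1, cpus_dim2) layouts that pass meets_minimum, sorted."""
    layouts = []
    for d1 in range(1, jx + 1):
        for d2 in range(1, iy + 1):
            if meets_minimum(jx, iy, d1, d2, min_side):
                layouts.append((d1 * d2, d1, d2))
    return sorted(layouts)


if __name__ == "__main__":
    print(subdomain_sizes(80, 64, 16, 8))  # the 128-processor case above
    print(meets_minimum(80, 64, 16, 8))
    print(valid_layouts(80, 64)[-1])       # largest layout under these assumptions
```

Under these assumptions, the 128-processor layout gives 5 x 8 points per processor, and the largest admissible layout for jx = 80, iy = 64 is 20 x 16 = 320 processors. Any additional constraints the model imposes (for example, halo or boundary rows) are not modeled here.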

Any ideas? Is this an issue with how I set up the model or with the single-node system I'm using?

On a related note, the documentation (version 4.4) states:

"In the current version 4.4 the model parallelizes execution dividing the
work between the processors, with the minimum work per processor is 9
points or a box 3 * 3, so the maximum number of processors which can be
used in a parallel run for the above configuration [iy=34, jx=48] is roughly 180."

I arrive at roughly 180 by multiplying 34 * 48 and dividing by 9. However, when I do the same for my domain above (64 * 80 / 9), I get 569. Am I missing something?
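The arithmetic in the question can be written out as a short Python sketch. Both helpers are hypothetical, not RegCM functions: `area_bound` reproduces the iy * jx / 9 calculation, while `box_bound` assumes the domain must be tiled by whole 3x3 boxes along each axis. Neither accounts for any extra constraints the model's actual decomposition may impose.

```python
def area_bound(iy, jx, min_points=9):
    """Naive estimate: total grid points divided by the minimum points per processor."""
    return iy * jx / min_points


def box_bound(iy, jx, min_side=3):
    """Stricter estimate: count only whole min_side x min_side boxes along each axis."""
    return (iy // min_side) * (jx // min_side)


if __name__ == "__main__":
    # Documentation example domain and the Puerto Rico domain from this post.
    for iy, jx in [(34, 48), (64, 80)]:
        print(f"iy={iy}, jx={jx}: area bound ~{area_bound(iy, jx):.1f}, "
              f"whole-box bound {box_bound(iy, jx)}")
```

This gives about 181.3 (whole-box bound 176) for the documentation's 34 x 48 domain, and about 568.9 (whole-box bound 546) for the 64 x 80 domain, so both estimates scale the same way between the two domains.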

Many thanks.

Best,
Alex