[RegCNET] an apparently random mpi abort

Gazi Faruq gfaruq04 at gmail.com
Mon Feb 4 16:03:34 CET 2013


Dear Graziano and James,
With RegCM-4.3.5.4 I faced the same problem when submitting a job on 32
cores across 4 nodes. The regcmMPI executable, compiled with ifort and
openmpi-1.6.3, ran entirely on a single node (the first one); the load on
that node kept increasing and the run finally aborted.

To work around this problem, I went back to RegCM-4.3.5.2 and found that it
runs properly on 4 nodes.
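
For anyone who wants to rule out a process-placement problem in a 32-core,
4-node job like the one above, here is a minimal sketch that builds an Open
MPI hostfile and the matching mpirun command line. The node names, the 8
slots per node, the file name hosts.txt and the namelist name regcm.in are
assumptions for illustration only, not taken from my actual setup; on a
cluster with a batch scheduler, the scheduler may supply the host list
itself.

```python
# Sketch only: build an Open MPI hostfile and an mpirun command line.
# Node names, slot counts and file names below are hypothetical.

def make_hostfile(nodes, slots_per_node):
    """Return hostfile text with one 'host slots=N' line per node."""
    return "".join(f"{node} slots={slots_per_node}\n" for node in nodes)

def make_command(total_ranks, hostfile_path, executable, namelist):
    """Return the mpirun argument list for the requested number of ranks."""
    return ["mpirun", "-np", str(total_ranks),
            "--hostfile", hostfile_path, executable, namelist]

nodes = ["node01", "node02", "node03", "node04"]  # assumed node names
slots = 8                                         # assumed cores per node

hostfile_text = make_hostfile(nodes, slots)
command = make_command(len(nodes) * slots, "hosts.txt",
                       "./regcmMPI", "regcm.in")

print(hostfile_text, end="")
print(" ".join(command))
```

If the ranks still end up on the first node with an explicit hostfile like
this, the launcher configuration is probably not the cause.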

But there are other issues with the choice of schemes:

For a larger domain, the model stops after 1 year with the icup=4, icup=99
and icup=2 options.

Please have a look at these issues.

With regards

G Faruq


On Thu, Jan 31, 2013 at 7:35 PM, James Ciarlò <james.ciarlo at physics.org> wrote:

> Dear RegCNET,
>
> I have been running several test simulations with RegCM4 using mpi.
> However, in many cases, the simulation never even started and aborted very
> early on.
>
> The people in charge of our computer cluster suggested the use of a sleep
> command, and in some cases it seemed to help (especially when a larger
> number of processors was used).
>
> I haven't yet managed to solve this problem. At the moment, I just try the
> simulation again. My colleagues have not yet identified the source of the
> problem. So I am sending this email, just in case anyone has encountered
> such errors before.
>
> I am attaching some of these errors.
>
> Regards,
> James
>
> _______________________________________________
> RegCNET mailing list
> RegCNET at lists.ictp.it
> https://lists.ictp.it/mailman/listinfo.cgi/regcnet
>