[RegCNET] strange behaviour in RegCM parallel configuration

XUNQIANG BI bixq at ictp.it
Thu Aug 9 16:40:11 CEST 2007


Hi, Ufuk:

You are using the Beowulf cluster, right?

Are you sure the procedure you followed is about the same as the one in

http://www.ictp.trieste.it/~pubregcm/RegCM3/faq/parallel_1.txt


On Thu, 9 Aug 2007, [BE] Ufuk Utku Turuncoglu wrote:

> Hi,
>
> I am trying to run RegCM in parallel mode, but when I submit the job to
> the cluster, the wrong number of processes is spawned on each node.
>
> For example, I define the total number of CPUs as 24 and I am using
> 8-CPU nodes, so I am using 3 nodes to create 24 processes. When I check
> the number of processes on each node and count them, it is not exactly
> 24. There are fewer processes than I defined in domain.param. I checked
> the whole configuration again and again and could not find any problem.
>
> I have already run the model successfully on 24 CPUs using different
> input data (NCEP). But in this case (using ECHAM data) it does not run.
> I once faced the same problem with the NCEP case, but after
> reinstalling the model code the problem went away, and I could not
> find the bug. Is it possible that the input data could generate the
> error?

For ECHAM data preprocessing, I guess you added some lines to ICBC.f
(you are not using RegCM3_for_EH5OM_3.tar.gz, right?).
Does your ICBC file work with the serial code?
>
> Also, in the buggy case, when I submit the job, each of the processes
> runs like a single, independent job and writes its information to
> regcm.out separately. This means the single-processor version of RegCM
> runs on the 24 CPUs.

I guess that you are not submitting the parallel job in the right way.

The command I used is:

mpirun -np 24 ./regcm

I suggest that, once you have linked fort.10, fort.101, .... correctly
to the ICBC files, you use the above line (instead of regcm.x) to
submit the parallel job.
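
One way to check the links before launching is sketched below. It is
only a minimal sketch: the function name check_links is hypothetical
and not part of the RegCM scripts, and you would pass it whichever
fort.* link names your run actually uses.

```shell
#!/bin/sh
# Sketch only: check_links is a hypothetical helper, not part of RegCM.
# It reports, for each name given, whether it is a symbolic link and
# whether the file it points to exists.

check_links() {
    rc=0
    for link in "$@"; do
        if [ ! -L "$link" ]; then
            # Not a symbolic link at all (missing, or a regular file).
            echo "NOT A LINK: $link"
            rc=1
        elif [ ! -e "$link" ]; then
            # A symbolic link whose target does not exist.
            echo "BROKEN: $link"
            rc=1
        else
            echo "OK: $link"
        fi
    done
    return $rc
}
```

For example, "check_links fort.10 fort.101 && mpirun -np 24 ./regcm",
with the remaining unit links added to the list, would start the job
only if every link resolves.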

Regards,
Xunqiang Bi


