[RegCNET] Distributed RegCM runs

bixq bixq at ictp.it
Mon Nov 23 16:41:12 CET 2009


Dear Maurice:

First, thank you for your efforts.

On Mon, 23 Nov 2009, Maurice.McHugh at noaa.gov wrote:

> Hello again,
>
> I can get RegCM running in parallel on 12 CPUs on 2 different servers, but while the instances of the model are
> definitely running I do not see any evidence of any output, nor do short 
> model runs actually end.  I think what I am seeing is a requirement for 
> a shared filesystem.
>
> The questions I have are:
> * Do all CPUs have to be able to communicate with each other while running RegCM?

Yes. If you number the CPUs in order, then #1 communicates with #2, #2 
with #3, and so on, and also #2 with #1, #3 with #2 in the opposite 
direction.
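
A minimal Python sketch of this neighbor pattern, for illustration only. It 
is not RegCM code: the helper name `neighbors`, the 0-based rank numbering, 
and the assumption that the two end processes have only one neighbor (no 
wrap-around) are all hypothetical.

```python
def neighbors(rank, size):
    """Return (left, right) neighbor ranks in a non-periodic 1D chain.

    None means there is no neighbor on that side. Hypothetical helper,
    not part of RegCM; the non-periodic ends are an assumption.
    """
    if not 0 <= rank < size:
        raise ValueError("rank must be in [0, size)")
    left = rank - 1 if rank > 0 else None
    right = rank + 1 if rank < size - 1 else None
    return left, right


if __name__ == "__main__":
    # Show who each of 4 processes would exchange data with.
    for r in range(4):
        print(r, neighbors(r, 4))
```

In this picture each process only needs to reach its immediate neighbors, 
in both directions.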
>
> * How does RegCM coordinate output creation from many CPUs; i.e. does the RegCM instance on the master
>  node generate the output, or do all CPUs write to the same files?

Only the master node (#0) reads from and writes to the hard disk. All 
terminal messages are also written by the master node.
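
The "only the master writes" pattern can be sketched in plain Python as 
follows. This is a simulation without MPI, not RegCM's implementation: the 
function names `collect_on_master` and `write_if_master` and the 
dictionary of per-rank strips are assumptions made for the example.

```python
import io


def collect_on_master(strips_by_rank):
    """Join the strips held by each rank, in rank order, into one list.

    Stands in for the step where the master gathers data from the
    other processes (hypothetical helper, no real MPI involved).
    """
    full_field = []
    for rank in sorted(strips_by_rank):
        full_field.extend(strips_by_rank[rank])
    return full_field


def write_if_master(rank, full_field, stream):
    """Write the field only when called by rank 0; other ranks do nothing."""
    if rank != 0:
        return False
    stream.write(" ".join(str(v) for v in full_field) + "\n")
    return True


if __name__ == "__main__":
    strips = {0: [1, 2], 1: [3, 4], 2: [5]}
    field = collect_on_master(strips)
    out = io.StringIO()  # stands in for an output file on the master's disk
    for rank in strips:
        write_if_master(rank, field, out)
    print(out.getvalue(), end="")
```

Every rank calls the write routine, but only rank 0 actually touches the 
output stream.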
>
> * Another issue that puzzles me is that when I run RegCM in parallel on a single server I see output to the
> screen from RegCM, but I do not see this when distributing processing onto a 
> second server in addition to the master server.

Does the simple code (I'll find the link for you later) work?
>
> I have successfully run RegCM on several dozen CPUs before, but on a single Beowulf system using llsubmit etc.
>  What I am trying to do here is to distribute processing across dozens 
> of CPUs on different servers that do not have a shared filesystem - so 
> CPUs on one server cannot write to files on another server, nor can the 
> CPUs intercommunicate.

I have no experience with this.
>
> I would really appreciate it if anyone can shed some light on the mechanics of how RegCM's parallelization works.

By the way, the present domain decomposition is along one dimension only. 
If someone is interested in a 2D domain decomposition, we would 
appreciate it!
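
To make "decomposition along one dimension" concrete, here is a minimal 
Python sketch that splits a row of grid points into contiguous strips, one 
per process. It is an illustration under assumptions, not RegCM's code: the 
name `split_1d` is hypothetical, and handing leftover points to the 
lowest-numbered ranks is just one possible choice.

```python
def split_1d(n_points, n_ranks):
    """Return a list of (start, stop) index ranges, one per rank.

    Ranges are contiguous, half-open, and cover 0..n_points exactly.
    Leftover points go to the lowest ranks (an assumed policy).
    """
    if n_ranks < 1 or n_points < n_ranks:
        raise ValueError("need at least one grid point per rank")
    base, extra = divmod(n_points, n_ranks)
    ranges = []
    start = 0
    for rank in range(n_ranks):
        count = base + (1 if rank < extra else 0)
        ranges.append((start, start + count))
        start += count
    return ranges


if __name__ == "__main__":
    # 10 grid points along the decomposed dimension, shared by 4 processes.
    print(split_1d(10, 4))
```

Each strip shares boundaries only with the strips on either side of it. A 
2D decomposition would split along a second dimension as well.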

Best regards,
Xunqiang
>
> Thank you!
>
> Maurice
>
>
>> On Thu, 19 Nov 2009, Maurice.McHugh at noaa.gov wrote:
>>
>>> Do you happen to know if RegCM can work in parallel across multiple
>>> servers without a shared filesystem  like GFS or NFS?
>>>
>>
>> Yes, actually the requirements for running RegCM in parallel mode are
>> minimal: just an MPICH (or Open MPI) installation.
>>
>> If you have several computers or servers connected by a fast network,
>> and each of them has MPICH installed, then that is also fine. At the
>> previous RegCM workshop, I saw one student run RegCM in parallel this
>> way: he prepared one file containing the names of the computers he
>> planned to use.
>>
>> Honestly, the clusters and supercomputers I have used are not that
>> complicated. I simply use mpirun or mpiexec (on a Linux cluster, via
>> qsub) and poe (on an IBM SP server, via poe, mpirun, or llsubmit).
>>
>> Regards,
>> Xunqiang Bi
>>
>

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   Dr. Xunqiang Bi         email:bixq at ictp.it
   Earth System Physics Group
   The Abdus Salam ICTP
   Strada Costiera, 11
   P.O. BOX 586, 34100 Trieste, ITALY
   Tel: +39-040-2240302  Fax: +39-040-2240449
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


