[RegCNET] Distributed RegCM runs

Maurice.McHugh at noaa.gov
Mon Nov 23 16:15:38 CET 2009


Hello again,

I can get RegCM running in parallel on 12 CPUs across 2 different servers. The model instances are definitely running, but I see no evidence of any output, and even short model runs never finish. I suspect this means a shared filesystem is required.

The questions I have are:
* Do all CPUs have to be able to communicate with each other while running RegCM?

* How does RegCM coordinate output from many CPUs? That is, does the RegCM instance on the master node generate the output, or do all CPUs write to the same files?

* Another thing that puzzles me: when I run RegCM in parallel on a single server, I see RegCM's output on the screen, but I do not see it when I distribute processing onto a second server in addition to the master server.

I have successfully run RegCM on several dozen CPUs before, but on a single Beowulf system using llsubmit etc. What I am trying to do here is distribute processing across dozens of CPUs on different servers that do not have a shared filesystem, so CPUs on one server cannot write to files on another server, nor can the CPUs intercommunicate.

I would really appreciate it if anyone can shed some light on the mechanics of how RegCM's parallelization works.
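In a distributed-memory MPI model, each process usually owns a slice of the domain, exchanges boundary rows only with the processes holding neighboring slices, and sends its slice to a single process when output is due. The Python sketch below simulates that pattern serially, assuming a 1-D row decomposition and a single writer on rank 0. This is a common design, not a confirmed description of RegCM's internals; no MPI library is used, "ranks" are plain list indices, and all function names are hypothetical.

```python
"""Serial sketch of a 1-D domain decomposition with a single writer (rank 0).

Hypothetical illustration only: no MPI library is used, "ranks" are plain
list indices, and every function name here is invented for the example.
"""


def decompose(n_rows, n_ranks):
    """Split n_rows grid rows into n_ranks contiguous (start, stop) slices.

    The first n_rows % n_ranks ranks each get one extra row.
    """
    base, extra = divmod(n_rows, n_ranks)
    slices, start = [], 0
    for rank in range(n_ranks):
        stop = start + base + (1 if rank < extra else 0)
        slices.append((start, stop))
        start = stop
    return slices


def halo_neighbors(rank, n_ranks):
    """Ranks that own the rows just above and below this rank's slice."""
    return [r for r in (rank - 1, rank + 1) if 0 <= r < n_ranks]


def compute_local(start, stop, n_cols):
    """Stand-in for one rank's model step: value = row * 100 + col."""
    return [[row * 100 + col for col in range(n_cols)] for row in range(start, stop)]


def gather_to_root(local_blocks):
    """What rank 0 would hold after receiving every block in rank order."""
    full = []
    for block in local_blocks:
        full.extend(block)
    return full


def root_write(full_grid):
    """Only rank 0 formats output; a string stands in for the output file."""
    return "\n".join(" ".join(str(v) for v in row) for row in full_grid)


if __name__ == "__main__":
    n_rows, n_cols, n_ranks = 10, 4, 3
    slices = decompose(n_rows, n_ranks)
    for rank, (start, stop) in enumerate(slices):
        print(f"rank {rank}: rows {start}-{stop - 1}, "
              f"halo partners {halo_neighbors(rank, n_ranks)}")
    blocks = [compute_local(start, stop, n_cols) for start, stop in slices]
    print(root_write(gather_to_root(blocks)).splitlines()[0])
```

Under this design, every rank must be able to reach its neighbors and rank 0 over the network, even though only rank 0 touches the output files. MPI messages travel over the network rather than through the filesystem, so the exchange itself does not need a shared filesystem.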

Thank you!

Maurice
  

> On Thu, 19 Nov 2009, Maurice.McHugh at noaa.gov wrote:
> 
> > Do you happen to know if RegCM can work in parallel across multiple
> > servers without a shared filesystem  like GFS or NFS?
> >
> 
> Yes. Actually, the requirements for running RegCM in parallel mode are
> minimal: just an MPICH (or Open MPI) installation.
> 
> If you have several computers or servers connected by a fast network,
> and each of them has MPICH installed, then that also works. At the
> previous RegCM workshop, I saw one student run RegCM in parallel this
> way; he prepared a file listing the names of the computers he planned
> to use.
> 
> Honestly, the clusters and supercomputers I have used are not that
> complicated. I simply use mpirun or mpiexec (on Linux clusters, via qsub)
> and poe (on the IBM SP server, via poe, mpirun, or llsubmit).
> 
> Regards,
> Xunqiang Bi
> _______________________________________________
> RegCNET mailing list
> RegCNET at lists.ictp.it
> 
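The file of computer names mentioned in the reply is usually called a machinefile or hostfile. The Python sketch below is a hypothetical helper with made-up host names (node01, node02), a made-up 6 + 6 split of the 12 CPUs, and a made-up executable path (./regcm). It writes a file in MPICH's `hostname:count` format and builds a matching `mpirun` command line. The `-np` and `-machinefile` options are accepted by both MPICH's and Open MPI's mpirun, but check your installation's man page; Open MPI's hostfile expects `hostname slots=count` lines instead.

```python
"""Hypothetical helper: write an MPICH-style machinefile and build an mpirun argv."""
import shlex
import tempfile
from pathlib import Path


def machinefile_text(hosts):
    """One 'hostname:count' line per host (MPICH machinefile syntax)."""
    return "".join(f"{host}:{count}\n" for host, count in hosts.items())


def write_machinefile(path, hosts):
    """Write the machinefile to disk and return its Path."""
    path = Path(path)
    path.write_text(machinefile_text(hosts))
    return path


def launch_command(machinefile, hosts, executable, args=()):
    """argv for mpirun: total process count, host list, then the program."""
    total = sum(hosts.values())
    return ["mpirun", "-np", str(total), "-machinefile", str(machinefile),
            executable, *args]


if __name__ == "__main__":
    hosts = {"node01": 6, "node02": 6}  # hypothetical: 12 CPUs on 2 servers
    with tempfile.TemporaryDirectory() as tmp:
        mf = write_machinefile(Path(tmp) / "machines", hosts)
        print(mf.read_text(), end="")
        # Printed, not executed: the command only works where MPI is installed.
        print(shlex.join(launch_command(mf, hosts, "./regcm")))
```

Without a shared filesystem, mpirun typically starts the executable at the same path on every host. The executable, and any input files a given process opens, therefore has to be copied to that path on each server beforehand.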


