[RegCNET] Distributed RegCM runs
Maurice.McHugh at noaa.gov
Mon Nov 23 16:15:38 CET 2009
Hello again,
I can get RegCM running in parallel on 12 CPUs across 2 different servers. The model instances are definitely running, but I see no evidence of any output, and even short model runs never end. I suspect what I am seeing is a requirement for a shared filesystem.
The questions I have are:
* Do all CPUs have to be able to communicate with each other while running RegCM?
* How does RegCM coordinate output creation from many CPUs; i.e. does the RegCM instance on the master node generate the output, or do all CPUs write to the same files?
* Another issue that puzzles me: when I run RegCM in parallel on a single server, I see RegCM output on the screen, but I do not see it when I distribute processing onto a second server in addition to the master server.
I have successfully run RegCM on several dozen CPUs before, but on a single Beowulf system using llsubmit etc. What I am trying to do here is distribute processing across dozens of CPUs on different servers that do not have a shared filesystem, so CPUs on one server cannot write to files on another server, nor can the CPUs intercommunicate.
I would really appreciate it if anyone could shed some light on the mechanics of how RegCM's parallelization works.
Thank you!
Maurice
> On Thu, 19 Nov 2009, Maurice.McHugh at noaa.gov wrote:
>
> > Do you happen to know if RegCM can work in parallel across multiple
> > servers without a shared filesystem like GFS or NFS?
> >
>
> Yes. The requirements for running RegCM in parallel mode are actually
> minimal: just an MPICH (or Open MPI) installation.
>
> If you have several computers or servers connected by a fast network,
> and each of them has MPICH installed, then that is also fine. At the
> previous RegCM workshop, I saw one student run RegCM in parallel this
> way; he prepared one file containing the names of the computers he
> planned to use.
>
> Honestly, the clusters and supercomputers I have used are not that
> complicated. I simply use mpirun or mpiexec (on a Linux cluster, via
> qsub) and poe (on an IBM SP server, via poe, mpirun, or llsubmit).
>
> Regards,
> Xunqiang Bi
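The file of computer names mentioned in the quoted reply can be illustrated with a minimal sketch. It assumes the common MPICH convention of one `host:count` entry per line, with the Open MPI `host slots=count` form as an alternative. The host names `server1` and `server2`, the 6 + 6 split of the 12 CPUs, and the helper name `machinefile_lines` are all hypothetical, not taken from the thread.

```python
def machinefile_lines(hosts, style="mpich"):
    """Return host-file lines for a {hostname: process_count} mapping.

    style="mpich"   -> "host:count" per line
    style="openmpi" -> "host slots=count" per line
    """
    lines = []
    for name, nprocs in hosts.items():
        if nprocs < 1:
            raise ValueError(f"{name}: process count must be positive")
        if style == "mpich":
            lines.append(f"{name}:{nprocs}")
        elif style == "openmpi":
            lines.append(f"{name} slots={nprocs}")
        else:
            raise ValueError(f"unknown style: {style}")
    return lines


# Hypothetical host names; the 6 + 6 split is an assumption
# chosen to match 12 CPUs on 2 servers.
hosts = {"server1": 6, "server2": 6}
print("\n".join(machinefile_lines(hosts)))
```

The resulting file is then handed to the MPI launcher, for example with `mpiexec -f` under MPICH's Hydra process manager or `mpirun --hostfile` under Open MPI. The exact option depends on the installed MPI version, so it is worth checking against the local documentation.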