[RegCNET] RegCM3 on 16-nodes Cluster
Muhammad Asif
asifsamiarain at gmail.com
Wed Jul 30 09:01:59 CEST 2008
Dear All,
I have built a Linux cluster of 16 nodes and have run some test
experiments with RegCM3 on it. I run into a problem when I go beyond
6 nodes. Up to 6 nodes, RegCM3 works perfectly in all respects; I have
also visualized the output in GrADS. Beyond six nodes, however, the model
does not produce correct results, yet it does not crash either. I have
set MJX=120 for the South Asia domain, so the model should work at least
with NPROC = 1, 2, 3, 4, 5, 6, 8, 10, 12 and 15. But I have found that
RegCM3 produces erroneous results with NPROC = 8, 10, 12 and 15.
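The NPROC values listed above are exactly the divisors of 120 that fit on a 16-node cluster. A minimal Python sketch of that check, assuming the rule is simply that NPROC must divide MJX evenly (the function name `valid_nproc` is hypothetical, not part of RegCM3):

```python
def valid_nproc(mjx, max_nodes):
    """Return every process count up to max_nodes that divides mjx evenly.

    Assumption: the only constraint is mjx % nproc == 0.
    """
    return [n for n in range(1, max_nodes + 1) if mjx % n == 0]


# MJX=120 on a 16-node cluster, as in the setup described above.
print(valid_nproc(120, 16))  # [1, 2, 3, 4, 5, 6, 8, 10, 12, 15]
```

All four failing values (8, 10, 12, 15) pass this check, so the divisibility of MJX by itself does not explain the wrong results.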
[root@gcisc01 ~]# grep "0.0851" *.log
Run_01-NPROC_01-Node.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.57540E-04 0.21880E-06, no. of points w/convection = 188
Run_02-NPROC_02-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.57540E-04 0.21880E-06, no. of points w/convection = 188
Run_03-NPROC_03-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.57540E-04 0.21880E-06, no. of points w/convection = 188
Run_04-NPROC_04-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.57540E-04 0.21880E-06, no. of points w/convection = 188
Run_05-NPROC_05-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.57540E-04 0.21880E-06, no. of points w/convection = 188
Run_06-NPROC_06-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.57540E-04 0.21880E-06, no. of points w/convection = 188
Run_08-NPROC_08-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.49356E-04 0.74337E-04, no. of points w/convection = 188
Run_10-NPROC_10-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.90169E-01 0.82160E-05, no. of points w/convection = 188
Run_12-NPROC_12-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.59345E-01 0.79605E-04, no. of points w/convection = 188
Run_15-NPROC_15-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.15888+268 0.45244+264, no. of points w/convection = 188
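The comparison in the grep output above can be automated. The following Python sketch flags every run whose pressure-tendency values differ from the single-process run. It assumes each log entry has been joined onto one line, and the helper names (`parse_fortran_real`, `ps_derivatives`, `diverging_runs`) are hypothetical. Note that Fortran drops the `E` when the exponent needs three digits, so `0.15888+268` means 0.15888E+268.

```python
import math
import re

# Captures the two values printed after "ps =" (1st and 2nd time derivative).
PS_RE = re.compile(r"\bps\s*=\s*(\S+)\s+(\S+?),")


def parse_fortran_real(text):
    """Convert a Fortran-formatted real such as '0.15888+268' to float."""
    # Insert the missing 'E' only when the exponent sign follows a digit.
    fixed = re.sub(r"(?<=\d)([+-]\d+)$", r"E\1", text)
    return float(fixed)


def ps_derivatives(line):
    """Return (first, second) time derivative of ps from one log line."""
    match = PS_RE.search(line)
    if match is None:
        raise ValueError("no 'ps =' field in line: " + line)
    return tuple(parse_fortran_real(field) for field in match.groups())


def diverging_runs(log_lines, rel_tol=1e-6):
    """Return log names whose values differ from the first (reference) line."""
    reference = ps_derivatives(log_lines[0])
    bad = []
    for line in log_lines[1:]:
        name = line.split(":", 1)[0]
        values = ps_derivatives(line)
        same = all(math.isclose(v, r, rel_tol=rel_tol)
                   for v, r in zip(values, reference))
        if not same:
            bad.append(name)
    return bad


# Four of the lines from the grep output above, joined onto single lines.
TAIL = ", no. of points w/convection = 188"
HEAD = ": at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = "
sample = [
    "Run_01-NPROC_01-Node.log" + HEAD + "0.57540E-04 0.21880E-06" + TAIL,
    "Run_06-NPROC_06-Nodes.log" + HEAD + "0.57540E-04 0.21880E-06" + TAIL,
    "Run_08-NPROC_08-Nodes.log" + HEAD + "0.49356E-04 0.74337E-04" + TAIL,
    "Run_15-NPROC_15-Nodes.log" + HEAD + "0.15888+268 0.45244+264" + TAIL,
]
print(diverging_runs(sample))
```

For this sample the script reports the NPROC 8 and NPROC 15 logs, matching what is visible by eye in the grep output.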
If anyone has successfully run the model on a 64-bit Linux cluster with more
than 6 nodes, please advise me. Moreover, I have tested simple MPI
programs on this cluster and they work well. The systems used for the
cluster have the following specifications.
*Processor* Intel Core 2 Duo E6850, 3.0GHz
*Mother Board* Intel Desktop Board DQ35JO
*L1 D-Cache* 32 KB
*L1 I-Cache* 32 KB
*L2 Cache* 4096 KB = 4 MB
*Head Node Main Memory* 2 GB
*Worker Node Main Memory* 1 GB
*Compilers and Software* Intel Fortran Compiler 10.0, Intel C++ Compiler 10.0, NetCDF
3.6.2, MPICH2-1.0.6p1 and ncl_ncarg-5.0.0
With best regards,
Asif
--
Muhammad Asif
Scientific Officer
Global Change Impact Studies Center (GCISC)
1st Floor, Saudi-Pak Tower
61-A Jinnah Avenue, Blue Area
Islamabad Pakistan
Tel: +92-51-9219785, +92-51-2800271
Fax: +92-51-9219787