[RegCNET] RegCM3 on 16-nodes Linux Cluster

Muhammad Asif asifsamiarain at gmail.com
Wed Jul 30 12:17:42 CEST 2008


Dear All,

I have built a 16-node Linux cluster and have run some test experiments
with RegCM3 on it. I run into a problem when I go beyond 6 nodes. Up to
6 nodes, RegCM3 works perfectly in all respects, and I have visualized the
output in GrADS as well. Beyond six nodes, however, the model does not
produce correct results, yet it does not crash either. I have set MJX=120
for the South Asia domain, so the model should work at least with NPROC =
1, 2, 3, 4, 5, 6, 8, 10, 12 and 15, each of which divides 120 evenly. But I
find that RegCM3 produces erroneous results with NPROC = 8, 10, 12 and 15.
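
The list of NPROC values above can be reproduced with a minimal Python
sketch. It assumes that the only constraint is that NPROC must divide MJX
evenly; the function name valid_nproc is hypothetical and is not part of
RegCM3.

```python
# Assumption: a processor count is usable only if it divides MJX evenly.
MJX = 120        # grid dimension quoted in this post
MAX_NODES = 16   # number of nodes in the cluster


def valid_nproc(mjx, max_nodes):
    """Return every processor count from 1 to max_nodes that divides mjx."""
    return [n for n in range(1, max_nodes + 1) if mjx % n == 0]


if __name__ == "__main__":
    print(valid_nproc(MJX, MAX_NODES))
    # prints [1, 2, 3, 4, 5, 6, 8, 10, 12, 15]
```

Under that assumption, 7, 9, 11, 13, 14 and 16 are excluded because they
leave a remainder when dividing 120.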


[root@gcisc01 ~]# grep "0.0851" *.log

Run_01-NPROC_01-Node.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv
of ps = 0.57540E-04 0.21880E-06, no. of points w/convection = 188

Run_02-NPROC_02-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv
of ps = 0.57540E-04 0.21880E-06, no. of points w/convection = 188

Run_03-NPROC_03-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv
of ps = 0.57540E-04 0.21880E-06, no. of points w/convection = 188

Run_04-NPROC_04-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv
of ps = 0.57540E-04 0.21880E-06, no. of points w/convection = 188

Run_05-NPROC_05-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv
of ps = 0.57540E-04 0.21880E-06, no. of points w/convection = 188

Run_06-NPROC_06-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv
of ps = 0.57540E-04 0.21880E-06, no. of points w/convection = 188

Run_08-NPROC_08-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv
of ps = 0.49356E-04 0.74337E-04, no. of points w/convection = 188

Run_10-NPROC_10-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv
of ps = 0.90169E-01 0.82160E-05, no. of points w/convection = 188

Run_12-NPROC_12-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv
of ps = 0.59345E-01 0.79605E-04, no. of points w/convection = 188

Run_15-NPROC_15-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv
of ps = 0.15888+268 0.45244+264, no. of points w/convection = 188
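
The comparison above could be automated with a small Python sketch that
reads the two "of ps" values from each log line and checks them against the
single-processor run. It assumes each log entry is on one line (the lines
above are wrapped by the mail client); the helper names and the tolerance
rel_tol are hypothetical choices, not part of RegCM3.

```python
import math
import re

# Pulls the two numbers that follow "of ps =" out of a log line like those above.
PS_PATTERN = re.compile(r"of\s+ps\s*=\s*(\S+)\s+(\S+?),")


def parse_fortran_float(text):
    """Convert Fortran output such as 0.57540E-04 or 0.15888+268 to a float."""
    # Fortran omits the 'E' when the exponent needs three digits, so restore it.
    return float(re.sub(r"(?<=\d)([+-]\d+)$", r"E\1", text))


def ps_derivatives(line):
    """Return the (first, second) time derivative of ps from one log line."""
    match = PS_PATTERN.search(line)
    if match is None:
        raise ValueError("no 'of ps =' field in line: " + line)
    return tuple(parse_fortran_float(value) for value in match.groups())


def agrees_with_reference(line, reference_line, rel_tol=1e-6):
    """True if both derivatives match the reference within rel_tol (assumed)."""
    pairs = zip(ps_derivatives(line), ps_derivatives(reference_line))
    return all(math.isclose(a, b, rel_tol=rel_tol) for a, b in pairs)


# Log lines copied from the grep output above, with the wrapping undone.
PREFIX = "at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = "
SUFFIX = ", no. of points w/convection = 188"
RUN_01 = PREFIX + "0.57540E-04 0.21880E-06" + SUFFIX
RUN_06 = PREFIX + "0.57540E-04 0.21880E-06" + SUFFIX
RUN_08 = PREFIX + "0.49356E-04 0.74337E-04" + SUFFIX
RUN_15 = PREFIX + "0.15888+268 0.45244+264" + SUFFIX

if __name__ == "__main__":
    runs = [("NPROC=6", RUN_06), ("NPROC=8", RUN_08), ("NPROC=15", RUN_15)]
    for name, line in runs:
        print(name, agrees_with_reference(line, RUN_01))
```

With the values quoted above, only the NPROC=6 line agrees with the
NPROC=1 reference; the NPROC=8 and NPROC=15 lines do not.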

If anyone has successfully run the model on a 64-bit Linux cluster with more
than 6 nodes, please advise me. Moreover, I have tested simple MPI programs
on this cluster and they work well. The systems used for the cluster have
the following specifications.

*Processor*                  Intel Core 2 Duo E6850, 3.0 GHz
*Motherboard*                Intel Desktop Board DQ35JO

*L1 D-Cache*                 32 KB
*L1 I-Cache*                 32 KB
*L2 Cache*                   4096 KB (4 MB)

*Head Node Main Memory*      2 GB
*Worker Node Main Memory*    1 GB

*Operating System*           Scientific Linux 4.6 (x86_64)

*Compilers and Libraries*    Intel Fortran Compiler 10.0, Intel C++ Compiler 10.0,
                             NetCDF 3.6.2, MPICH2-1.0.6p1 and ncl_ncarg-5.0.0

With best regards,
Asif


-- 
Muhammad Asif
Scientific Officer
Global Change Impact Studies Center (GCISC)
1st Floor, Saudi-Pak Tower
61-A Jinnah Avenue, Blue Area
Islamabad Pakistan
Tel: +92-51-9219785, +92-51-2800271
Fax: +92-51-9219787