Dear All,
 
I have built a Linux cluster of 16 nodes and have run some test experiments with RegCM3 on it. I run into a problem when I go beyond 6 nodes. Up to 6 nodes, RegCM3 works perfectly in all respects, and I have visualized the output in GrADS as well. Beyond six nodes, however, the model produces incorrect results without crashing. I have set MJX=120 for the South Asia domain, so the model should at least work with NPROC = 1, 2, 3, 4, 5, 6, 8, 10, 12 and 15, since each of these divides 120 evenly. However, I have found that RegCM3 produces erroneous results with NPROC = 8, 10, 12 and 15, as the log excerpts below show (all runs should print identical values at the same timestep).
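For reference, the list of admissible process counts follows from simple arithmetic. The sketch below assumes, as this post does, that the only requirement is that MJX divides evenly by NPROC, with the columns split in one dimension. `valid_nproc` is a hypothetical helper for illustration, not part of RegCM3.

```python
from typing import List


def valid_nproc(mjx: int, max_procs: int) -> List[int]:
    """Return every NPROC in 1..max_procs that divides mjx with no remainder.

    Assumption: MJX must be divisible by NPROC (even 1-D column split).
    """
    return [n for n in range(1, max_procs + 1) if mjx % n == 0]


if __name__ == "__main__":
    mjx = 120   # MJX for the South Asia domain, from the post
    nodes = 16  # cluster size, from the post
    for n in valid_nproc(mjx, nodes):
        # Columns each process would own under an even 1-D split.
        print(f"NPROC={n:2d}: {mjx // n} columns per process")
```

For MJX=120 and 16 nodes this yields the same ten values listed above. It also makes visible that the failing runs (NPROC = 8, 10, 12, 15) are exactly those in which each process would own 15 or fewer of the 120 columns; this is only an observation from the numbers, not a confirmed cause.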
 
 
[root@gcisc01 ~]# grep "0.0851" *.log

Run_01-NPROC_01-Node.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.57540E-04 0.21880E-06, no. of points w/convection = 188

Run_02-NPROC_02-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.57540E-04 0.21880E-06, no. of points w/convection = 188

Run_03-NPROC_03-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.57540E-04 0.21880E-06, no. of points w/convection = 188

Run_04-NPROC_04-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.57540E-04 0.21880E-06, no. of points w/convection = 188

Run_05-NPROC_05-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.57540E-04 0.21880E-06, no. of points w/convection = 188

Run_06-NPROC_06-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.57540E-04 0.21880E-06, no. of points w/convection = 188

Run_08-NPROC_08-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.49356E-04 0.74337E-04, no. of points w/convection = 188

Run_10-NPROC_10-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.90169E-01 0.82160E-05, no. of points w/convection = 188

Run_12-NPROC_12-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.59345E-01 0.79605E-04, no. of points w/convection = 188

Run_15-NPROC_15-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.15888+268 0.45244+264, no. of points w/convection = 188
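Comparing each run against the single-process run by eye is error-prone, especially since Fortran E-format output omits the `E` when the exponent needs three digits (as in `0.15888+268`). The following is a minimal sketch that assumes the grep output has exactly the format shown above; the function names are hypothetical, and the relative tolerance of 1e-4 is an arbitrary choice matched to the five printed significant digits.

```python
import math
import re
from typing import Dict, Iterable, List, Tuple

# Assumed line format, taken from the grep output in this post:
# "<log file>: at day = ..., ktau = ... : 1st, 2nd time deriv of ps = A B, no. of ..."
LINE_RE = re.compile(r"^(?P<file>\S+):.*time deriv of ps =\s*(?P<d1>\S+)\s+(?P<d2>\S+),")

# A subset of the lines from the post, used for the demonstration below.
SAMPLE_LINES = [
    "Run_01-NPROC_01-Node.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.57540E-04 0.21880E-06, no. of points w/convection = 188",
    "Run_06-NPROC_06-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.57540E-04 0.21880E-06, no. of points w/convection = 188",
    "Run_08-NPROC_08-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.49356E-04 0.74337E-04, no. of points w/convection = 188",
    "Run_10-NPROC_10-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.90169E-01 0.82160E-05, no. of points w/convection = 188",
    "Run_12-NPROC_12-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.59345E-01 0.79605E-04, no. of points w/convection = 188",
    "Run_15-NPROC_15-Nodes.log: at day = 0.0851, ktau = 50 : 1st, 2nd time deriv of ps = 0.15888+268 0.45244+264, no. of points w/convection = 188",
]


def parse_fortran_float(text: str) -> float:
    """Parse Fortran E-format numbers, including '0.15888+268', where the
    'E' is omitted because the exponent needs three digits."""
    if "E" not in text.upper():
        # Re-insert the exponent marker before the trailing signed exponent.
        text = re.sub(r"(?<=\d)([+-]\d+)$", r"E\1", text)
    return float(text)


def parse_logs(lines: Iterable[str]) -> Dict[str, Tuple[float, float]]:
    """Map each log file name to its (1st, 2nd) time derivatives of ps."""
    values = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # skip blank or unrelated lines
            values[match.group("file")] = (
                parse_fortran_float(match.group("d1")),
                parse_fortran_float(match.group("d2")),
            )
    return values


def diverging_runs(values: Dict[str, Tuple[float, float]], reference: str,
                   rel_tol: float = 1e-4) -> List[str]:
    """Return the runs whose derivatives differ from the reference run's."""
    ref = values[reference]
    return [
        name
        for name, vals in sorted(values.items())
        if not all(math.isclose(v, r, rel_tol=rel_tol) for v, r in zip(vals, ref))
    ]


if __name__ == "__main__":
    parsed = parse_logs(SAMPLE_LINES)
    for name in diverging_runs(parsed, "Run_01-NPROC_01-Node.log"):
        print(f"{name} differs from the 1-process run: {parsed[name]}")
```

In practice one would feed it the real grep output (for example by reading `sys.stdin`) instead of the embedded sample lines.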

If anyone has successfully run the model on a 64-bit Linux cluster with more than 6 nodes, please advise me. I have also tested simple MPI programs on this cluster, and they work well. The machines used for the cluster have the following specifications:
 
Processor                  Intel Core 2 Duo E6850, 3.0 GHz
Motherboard                Intel Desktop Board DQ35JO

L1 D-cache                 32 KB
L1 I-cache                 32 KB
L2 cache                   4096 KB = 4 MB

Head node main memory      2 GB
Worker node main memory    1 GB

Compilers and libraries    Intel Fortran Compiler 10.0, Intel C++ Compiler 10.0, NetCDF 3.6.2, MPICH2-1.0.6p1 and ncl_ncarg-5.0.0

 
With best regards,
Asif

--
Muhammad Asif
Scientific Officer
Global Change Impact Studies Center (GCISC)
1st Floor, Saudi-Pak Tower
61-A Jinnah Avenue, Blue Area
Islamabad Pakistan
Tel: +92-51-9219785, +92-51-2800271
Fax: +92-51-9219787