[RegCNET] Use SAV (not SAVTMP) file to restart the parallel experiment

XUNQIANG BI bixq at ictp.it
Tue Feb 3 16:43:24 CET 2009


Hi, Simon:

Use the SAV file (not the SAVTMP file) to restart a parallel experiment.

I have checked on our cluster: for the parallel code, a restart
file under Run/output (e.g. SAV.1990010100) works fine for a
restart, but SAVTMP.199001030100 (or any other SAVTMP file) does not.
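
To avoid picking up a SAVTMP file by accident, the candidate restart files
can be filtered by name. This is only a sketch: the naming pattern is assumed
from the two examples above, SAV.1990020100 is a made-up name for
illustration, and usable_restart_files is a hypothetical helper, not part of
the model.

```python
import re

# Assumed naming, taken from the examples in this message:
# SAV.<digits> is a full restart file, SAVTMP.<digits> is a temporary one.
SAV_PATTERN = re.compile(r"^SAV\.(\d+)$")


def usable_restart_files(names):
    """Return SAV.* names sorted by date stamp, skipping SAVTMP.* files."""
    stamped = []
    for name in names:
        match = SAV_PATTERN.match(name)
        if match:
            stamped.append((match.group(1), name))
    return [name for _, name in sorted(stamped)]


# Hypothetical directory listing of Run/output.
listing = [
    "SAVTMP.199001030100",
    "SAV.1990020100",
    "SAV.1990010100",
    "ATM.1990010100",
]
print(usable_restart_files(listing))
```

In practice the list of names would come from a listing of Run/output (for
example via os.listdir).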

When I checked the code, I found that MPI_GATHER is not called
before the SAVTMP files are written. This is why SAVTMP files
don't work for restart experiments. I have not rushed to fix it,
because calling MPI_GATHER too frequently for SAVTMP would
reduce the parallel efficiency.
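
One way to picture the effect of a missing gather is the toy below. It is
not the model code and uses no MPI library: the "ranks" are simulated with
plain Python lists, and the field size, the number of ranks, and the helper
names are all assumptions made for illustration.

```python
# Pure-Python stand-in for the MPI ranks; no MPI library is used here.
def split_domain(field, nproc):
    """Give each simulated rank a contiguous slab of the global field.

    Assumes len(field) is divisible by nproc.
    """
    size = len(field) // nproc
    return [field[r * size:(r + 1) * size] for r in range(nproc)]


def gather(slabs):
    """Mimic a gather to the root rank: concatenate slabs in rank order."""
    full = []
    for slab in slabs:
        full.extend(slab)
    return full


global_field = list(range(16))          # hypothetical 16-point field
slabs = split_domain(global_field, 4)   # 4 simulated ranks

written_without_gather = slabs[0]       # root only holds its own slab
written_with_gather = gather(slabs)     # root holds the whole domain

print(len(written_without_gather), len(written_with_gather))
```

Without the gather, the writing process only has its own part of the domain,
so whatever it writes cannot describe the full model state. The gather also
requires communication among all processes, which is the efficiency cost
mentioned above.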

The solution I plan is to not write out the SAVTMP file in
parallel experiments. Of course, the other solution is to set

     savfrq  =  4800.        (in regcm.in file).

But I don't think the explanation above applies to your problem. Yours
seems to be just a computational instability (if you are using a SAV file,
not a SAVTMP file, for the restart).

I would suggest you run a continuous experiment to 1960030100 to check
for the computational instability.
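
To see where the run first blows up, the model printout can be scanned for
the first diagnostic line that reports NaN. The sketch below assumes the
line layout shown in your printout (with each record on one line); the first
sample line is invented for illustration, and first_nan_step is a
hypothetical helper.

```python
import re

# Assumed line layout, copied from the printout quoted in this thread.
STEP = re.compile(r"at day =\s*([0-9.]+), ktau =\s*(\d+)")


def first_nan_step(lines):
    """Return (day, ktau) of the first diagnostic line containing NaN, or None."""
    for line in lines:
        match = STEP.search(line)
        if match and "NaN" in line:
            return float(match.group(1)), int(match.group(2))
    return None


sample = [
    # Invented healthy line, for illustration only.
    "at day =   30.9716, ktau =      44600 :  1st, 2nd time deriv of ps =  0.1E-04 0.2E-04",
    # Line taken from the quoted printout.
    "at day =   31.0063, ktau =      44650 :  1st, 2nd time deriv of ps =  NaN NaN",
]
print(first_nan_step(sample))
```

If the continuous run shows NaN at about the same day as the restarted run,
the restart file is probably not the cause.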

Good luck!
Xunqiang

On Tue, 3 Feb 2009, Simon Krichak wrote:

> Thank you, Xunqiang,
>
> I seem to be having problems with restarting a model run using SAV files
> created in a parallel run using 8 processors on a Linux cluster system
> running CentOS release 4.5.
> The cluster consists of a single head node (power), and 12 compute nodes with
> 16GB of memory and 8 cores XEON 2.66GHz each.
> At the same time I experience no problems with restarting parallel runs with
> SAV files created using 4 processors.
>
> This apparently means that the SAV files differ somehow. Could you please
> give me some hints on what to do?
>
> Simon
>
>
>> >  Please see the printout below.
>> > 
>> > 
>> >   dt, dtau =    120.000000000000        15.0000000000000        30.0000000000000
>> >  0 linearization about standard atmosphere (lstand=.t.)
>> >  0sigmaf      0.00      5.000E-02  0.100      0.160      0.230      0.310      0.390      0.470      0.550      0.630      0.710
>> >            0.780      0.840      0.890      0.930      0.960      0.980      0.990       1.00
>> >  0t mean      218.       218.       218.       222.       233.       242.       250.       256.       263.       268.       273.
>> >             277.       280.       283.       285.       286.       287.       287.
>> >  0ps mean     100.ps mean     100.
>> >  0 vertical mode problem completed for kx= 18     0 errors detected (should be 0)
>> >   m, fac =            4   13.3333333333333
>> >   m, fac =            2   24.0000000000000
>> >   Writing output files in direct access format
>> > 
>> >   ******* OPENING NEW OUTPUT FILES:  1960020100
>> >   OPENING NEW OUT FILE: output/ATM.1960020100
>> >   OPENING NEW BAT FILE: output/SRF.1960020100
>> >      at day =   31.0063, ktau =      44650 :  1st, 2nd time deriv of ps =  NaN        NaN        ,  no. of points w/convection =       3
>> >      at day =   31.0410, ktau =      44700 :  1st, 2nd time deriv of ps =  NaN        NaN        ,  no. of points w/convection =       0
>> >      at day =   31.0757, ktau =      44750 :  1st, 2nd time deriv of ps =  NaN        NaN        ,  no. of points w/convection =       0
>> >      at day =   31.1104, ktau =      44800 :  1st, 2nd time deriv of ps =  NaN
>> >  ------------------------------------------------------------------------
>> > 
>> >  _______________________________________________
>> >  RegCNET mailing list
>> >  RegCNET at lists.ictp.it
>> >  https://lists.ictp.it/mailman/listinfo/regcnet
>> 
>
>
> Simon
>
> ----- Original Message ----- From: "XUNQIANG BI" <bixq at ictp.it>
> To: "Simon Krichak" <shimon at cyclone.tau.ac.il>
> Cc: <regcnet at ictp.it>
> Sent: Tuesday, February 03, 2009 4:35 PM
> Subject: Re: ctl file for SAV files
>
>
>>  On Tue, 3 Feb 2009, Simon Krichak wrote:
>> 
>> >  Dear Dr. Bi,
>> > 
>> >  Could you please let me know how I can construct a ctl file to be able
>> >  to see the SAV.xxxxx files.
>> 
>>  No, SAV files are not in GrADS format. They are just some kind of binary
>>  file (neither sequential nor direct-access format).
>> 
>>  If you really want to see the fields in a SAV file, I think you have to
>>  write a converter program in FORTRAN.
>> 
>>  Regards,
>>  Xunqiang
>
>

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   Dr. Xunqiang Bi         email:bixq at ictp.it
   Earth System Physics Group
   The Abdus Salam ICTP
   Strada Costiera, 11
   P.O. BOX 586, 34100 Trieste, ITALY
   Tel: +39-040-2240302  Fax: +39-040-2240449
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
