[RegCNET] Use SAV (not SAVTMP) file to restart the parallel experiment
XUNQIANG BI
bixq at ictp.it
Tue Feb 3 16:43:24 CET 2009
Hi, Simon:
Use SAV (not SAVTMP) files to restart parallel experiments.
I have checked on our cluster: for the parallel code, the restart files
under Run/output (e.g. SAV.1990010100) work fine for restarts, but
SAVTMP.199001030100 (and the other SAVTMP files) do not.
When I checked the code, I found that MPI_GATHER is not called before
the SAVTMP files are written. This is why SAVTMP files don't work for
restarting experiments. I haven't fixed this yet because calling
MPI_GATHER that frequently for SAVTMP would reduce the parallel efficiency.
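The failure mode can be illustrated with a toy single-process Python sketch. It is not RegCM code and makes no real MPI calls. It assumes a 1-D decomposition in which each processor owns a contiguous block of west-east columns, and all function names (`decompose`, `gather`, `step`, `restart_buffer`) are hypothetical. If rank 0 writes its global array without first gathering the other ranks' slabs, everything outside its own columns is stale.

```python
"""Toy illustration (not RegCM code): a restart dump needs a gather first."""
from typing import List

Row = List[float]
Field = List[Row]  # field[i][j]: i = south-north, j = west-east


def decompose(field: Field, nproc: int) -> List[Field]:
    """Give each rank a contiguous block of jx // nproc columns (j index)."""
    jx = len(field[0])
    if jx % nproc != 0:
        raise ValueError("this sketch assumes jx is divisible by nproc")
    jxp = jx // nproc
    return [[row[r * jxp:(r + 1) * jxp] for row in field] for r in range(nproc)]


def gather(slabs: List[Field]) -> Field:
    """Emulate a gather to rank 0: join each rank's columns in rank order."""
    nrows = len(slabs[0])
    return [[v for slab in slabs for v in slab[i]] for i in range(nrows)]


def step(slabs: List[Field], nsteps: int) -> List[Field]:
    """Stand-in for time integration: each rank updates only its own slab."""
    return [[[v + nsteps for v in row] for row in slab] for slab in slabs]


def restart_buffer(global_on_root: Field, slabs: List[Field], gathered: bool) -> Field:
    """What rank 0 would write to a restart file."""
    if gathered:
        return gather(slabs)
    # No gather: rank 0 only has fresh values for its own columns; the rest
    # of its global array still holds whatever was gathered last time.
    jxp = len(slabs[0][0])
    return [slabs[0][i] + global_on_root[i][jxp:] for i in range(len(global_on_root))]


if __name__ == "__main__":
    initial = [[0.0] * 4 for _ in range(2)]
    slabs = step(decompose(initial, nproc=2), nsteps=3)
    print(restart_buffer(initial, slabs, gathered=True))   # [[3.0, 3.0, 3.0, 3.0], [3.0, 3.0, 3.0, 3.0]]
    print(restart_buffer(initial, slabs, gathered=False))  # [[3.0, 3.0, 0.0, 0.0], [3.0, 3.0, 0.0, 0.0]]
```

In the real model the gather is an MPI_GATHER collective over the whole domain. Doing it only when a SAV file is written, rather than at every SAVTMP interval, is the efficiency trade-off described above.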
The fix I plan is to not write out SAVTMP files for parallel
experiments at all. The other option, of course, is to set
savfrq = 4800. (in the regcm.in file).
But I don't think the explanation above applies to your problem. Yours
seems to be just a computational instability (if you are using a SAV
file, not a SAVTMP file, for the restart).
I would suggest running a continuous experiment to 1960030100 to check
for the computational instability.
Good luck!
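To compare the continuous run with the restarted one, it helps to find the first step at which the surface-pressure tendency goes bad in each printout. The following is a minimal Python sketch. It assumes the diagnostic lines have the form seen in the printout quoted in this thread ("at day = ..., ktau = ... : 1st, 2nd time deriv of ps = ... ..."), possibly wrapped across lines. The function name `first_bad_step` is hypothetical, and treating Fortran overflow fields such as `*****` as bad is an assumption.

```python
import math
import re
from typing import Optional, Tuple

# Matches one diagnostic record; \s also matches newlines, so wrapped lines work.
DIAG = re.compile(
    r"at day\s*=\s*([\d.]+),\s*ktau\s*=\s*(\d+)\s*:\s*"
    r"1st,\s*2nd\s+time\s+deriv\s+of\s+ps\s*=\s*(\S+)\s+(\S+)"
)


def _is_bad(token: str) -> bool:
    """True for NaN/Inf, or for tokens that are not numbers at all."""
    try:
        return not math.isfinite(float(token))
    except ValueError:
        return True  # e.g. a Fortran '*****' overflow field


def first_bad_step(log_text: str) -> Optional[Tuple[float, int]]:
    """Return (day, ktau) of the first record whose ps tendencies are bad."""
    for m in DIAG.finditer(log_text):
        if _is_bad(m.group(3)) or _is_bad(m.group(4)):
            return float(m.group(1)), int(m.group(2))
    return None
```

Running it on the printouts of both runs shows whether the NaNs appear at the same ktau (pointing to an instability) or only after the restart (pointing to the restart file).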
Xunqiang
On Tue, 3 Feb 2009, Simon Krichak wrote:
> Thank you, Xunqiang,
>
> I seem to be having problems restarting a model run from SAV files
> created in a parallel run on 8 processors of a Linux cluster running
> CentOS release 4.5.
> The cluster consists of a single head node (power) and 12 compute nodes,
> each with 16GB of memory and 8 XEON 2.66GHz cores.
> At the same time, I have no problems restarting parallel runs from SAV
> files created using 4 processors.
>
> This apparently means that the SAV files differ somehow. Could you please
> give me some hints on what to do?
>
> Simon
>
>
>> > Please see the printout below.
>> >
>> >
>> > dt, dtau = 120.000000000000 15.0000000000000 30.0000000000000
>> > 0 linearization about standard atmosphere (lstand=.t.)
>> > 0sigmaf 0.00 5.000E-02 0.100 0.160 0.230 0.310 0.390 0.470 0.550 0.630 0.710 0.780 0.840 0.890 0.930 0.960 0.980 0.990 1.00
>> > 0t mean 218. 218. 218. 222. 233. 242. 250. 256. 263. 268. 273. 277. 280. 283. 285. 286. 287. 287.
>> > 0ps mean 100.ps mean 100.
>> > 0 vertical mode problem completed for kx= 18 0 errors detected (should be 0)
>> > m, fac = 4 13.3333333333333
>> > m, fac = 2 24.0000000000000
>> > Writing output files in direct access format
>> >
>> > ******* OPENING NEW OUTPUT FILES: 1960020100
>> > OPENING NEW OUT FILE: output/ATM.1960020100
>> > OPENING NEW BAT FILE: output/SRF.1960020100
>> > at day = 31.0063, ktau = 44650 : 1st, 2nd time deriv of ps = NaN NaN , no. of points w/convection = 3
>> > at day = 31.0410, ktau = 44700 : 1st, 2nd time deriv of ps = NaN NaN , no. of points w/convection = 0
>> > at day = 31.0757, ktau = 44750 : 1st, 2nd time deriv of ps = NaN NaN , no. of points w/convection = 0
>> > at day = 31.1104, ktau = 44800 : 1st, 2nd time deriv of ps = NaN
>> > ------------------------------------------------------------------------
>> >
>> > _______________________________________________
>> > RegCNET mailing list
>> > RegCNET at lists.ictp.it
>> > https://lists.ictp.it/mailman/listinfo/regcnet
>>
>
>
> Simon
>
> ----- Original Message ----- From: "XUNQIANG BI" <bixq at ictp.it>
> To: "Simon Krichak" <shimon at cyclone.tau.ac.il>
> Cc: <regcnet at ictp.it>
> Sent: Tuesday, February 03, 2009 4:35 PM
> Subject: Re: ctl file for SAV files
>
>
>> On Tue, 3 Feb 2009, Simon Krichak wrote:
>>
>> > Dear Dr. Bi,
>> >
>> > Could you please let me know how I can construct a ctl file to be able
>> > to view the SAV.xxxxx files.
>>
>> No, SAV files are not in GrADS format. They are in a specific binary
>> layout (neither sequential nor direct-access format).
>>
>> If you really want to see the fields in the SAV files, I think you have to
>> write a converter program in FORTRAN.
>>
>> Regards,
>> Xunqiang
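The converter idea can be sketched in Python. Once one field has been located in the raw SAV bytes, it is copied into a flat big-endian 4-byte file and a GrADS descriptor (.ctl) is written for it. The SAV layout is not described in this thread, so the byte offset, dimensions, precision (`'d'` for 8-byte reals, `'f'` for 4-byte reals) and byte order are all assumptions that would have to come from the model's own write statements. The helper names are hypothetical. The XDEF/YDEF/ZDEF axes are plain grid indices, and the TDEF line is a placeholder.

```python
import os
import struct
from typing import List, Sequence


def extract_field(raw: bytes, offset: int, nvals: int,
                  endian: str = ">", kind: str = "d") -> List[float]:
    """Unpack nvals reals starting at byte offset ('d' = 8-byte, 'f' = 4-byte)."""
    return list(struct.unpack_from(f"{endian}{nvals}{kind}", raw, offset))


def ns_fastest_to_grads(values: Sequence[float], ny: int, nx: int, nz: int) -> List[float]:
    """Reorder data stored like a Fortran array a(ny, nx, nz) (south-north
    index fastest) into GrADS order, where x (west-east) varies fastest."""
    out = []
    for k in range(nz):
        for i in range(ny):
            for j in range(nx):
                out.append(values[i + ny * (j + nx * k)])
    return out


def write_grads(values: Sequence[float], nx: int, ny: int, nz: int,
                var: str, base: str) -> None:
    """Write base.dat (big-endian 4-byte reals) and a matching base.ctl."""
    if len(values) != nx * ny * nz:
        raise ValueError("values must hold nx*ny*nz numbers")
    with open(base + ".dat", "wb") as f:
        f.write(struct.pack(f">{len(values)}f", *values))
    name = os.path.basename(base)
    ctl = [
        f"DSET ^{name}.dat",                # data file next to the .ctl
        "OPTIONS big_endian",
        "UNDEF -1.e30",
        f"XDEF {nx} LINEAR 1 1",            # grid indices, not longitudes
        f"YDEF {ny} LINEAR 1 1",
        f"ZDEF {nz} LINEAR 1 1",
        "TDEF 1 LINEAR 00Z01JAN1960 1dy",   # placeholder time axis
        "VARS 1",
        f"{var} {nz if nz > 1 else 0} 99 {var} extracted from SAV file",
        "ENDVARS",
    ]
    with open(base + ".ctl", "w") as f:
        f.write("\n".join(ctl) + "\n")
```

Opening the resulting .ctl in GrADS then displays the extracted field on index axes. Each further SAV variable needs its own offset and shape.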
>
>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dr. Xunqiang Bi email:bixq at ictp.it
Earth System Physics Group
The Abdus Salam ICTP
Strada Costiera, 11
P.O. BOX 586, 34100 Trieste, ITALY
Tel: +39-040-2240302 Fax: +39-040-2240449
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~