[RegCNET] mpirun error with pgi 7.1 and AMD Opteron x86-64

Travis O'Brien tobrien at ucsc.edu
Thu Sep 11 16:24:10 CEST 2008


Hello Paulo,

The SIGFPE in your log is a floating-point exception, which often means
the model has become numerically unstable.  You might be able to get
past this crash by reducing the model timestep (dt) in regcm.in.  The
RegCM User's Manual gives guidelines for changing dt (and abatm, which
should be changed at the same time).  You can then either restart the
model from the beginning or, possibly, restart it from the most recent
SAV file in the output directory.
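
As a rough sketch of how to find the dt you are currently running with:
the line "at day = 55.4156, ktau = 53200" in your log gives the model
time per step as day * 86400 / ktau.  The Python below does that
arithmetic and lists some smaller integer candidates.  The function
names are hypothetical, and the rule that a candidate should divide one
hour (3600 s) evenly is only my assumption for illustration, not a rule
from the manual; please check the manual's guidelines for dt and abatm
before picking a value.

```python
def estimate_dt(day, ktau):
    """Seconds of model time per step, from a 'day = ..., ktau = ...' log line."""
    return day * 86400.0 / ktau


def smaller_dt_candidates(dt, period=3600, lowest=30):
    """Integer timesteps below dt that divide `period` seconds evenly.

    ASSUMPTION: dividing `period` evenly is an illustrative criterion only;
    the RegCM User's Manual is the authority on valid dt/abatm combinations.
    """
    return [c for c in range(int(dt) - 1, lowest - 1, -1) if period % c == 0]


# Values taken from the quoted log: day = 55.4156, ktau = 53200
current_dt = round(estimate_dt(55.4156, 53200))
print("current dt (s):", current_dt)
print("smaller candidates (s):", smaller_dt_candidates(current_dt))
```

Whatever value you choose, remember to change abatm together with dt,
as described in the manual.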

I hope that this helps!

-Travis O'Brien-

Paulo Ricardo TEIXEIRA-SILVA wrote:
>
>
>     Hello,
>
>     While using NPROC=16 on a dual-core AMD Opteron x86-64 with the
>     PGI 7.1 compiler, I am having problems running regcm.
>     My grid has iy=232 and ix=160, ds=40.0.
>
>     When I type:
>     nohup mpirun -np 16 -machinefile machine.cluster -nolocal regcm > log.out.txt &
>
>     the run does not fully succeed.
>     regcm runs the first 56 days, then exits with this error message:
>     ...
>      BATS variables written at    2005022509    180.0000000000000    
>          at day =   55.4156, ktau =      53200 :  1st, 2nd time deriv
>     of ps =  0.13552E-04 0.93734E-07,  no. of points w/convection = 
>     process            3 of           16
>     p3_26190:  p4_error: interrupt SIGFPE: 8
>      process            8 of           16
>      process            4 of           16
>     p4_23892:  p4_error: interrupt SIGx: 13
>      process           12 of           16
>      process            2 of           16
>     p2_25778:  p4_error: net_recv read:  probable EOF on socket: 1
>      process           10 of           16
>      process           14 of           16
>     rm_l_2_25789: (36773.972656) net_send: could not write to fd=5,
>     errno = 32
>      process            6 of           16
>      process            9 of           16
>     p9_30291:  p4_error: net_recv read:  probable EOF on socket: 1
>      process           13 of           16
>     rm_l_9_30302: (36773.339844) net_send: could not write to fd=5,
>     errno = 32
>      process            5 of           16
>      process            1 of           16
>
>     ...
>      Writing rad fields at ktau =         53760   2005022600
>      SAVTMP RESTART WRITTEN: idatex=   2005022600 ktau=        53760
>      /bin/rm -f                                 
>     SAVTMP.2005022400        
>      BCs are ready from    2005022600   to    2005022606
>     rm_l_3_26205: (36773.968750) net_send: could not write to fd=5,
>     errno = 32
>      process           11 of           16
>      process            7 of           16
>      process           15 of           16
>     p4_23892: (36789.746094) net_send: could not write to fd=5, errno = 32
>     p2_25778: (36797.988281) net_send: could not write to fd=5, errno = 32
>     p9_30291: (36799.359375) net_send: could not write to fd=5, errno = 32
>
>
>     My regcm.param2 is:
>
>           INTEGER IX
>           INTEGER NPROC
>           INTEGER MJX
>           INTEGER KX
>           INTEGER NSG
>           INTEGER NNSG
>           INTEGER IBYTE
>           INTEGER JXP
>           CHARACTER*5 DATTYP
>           CHARACTER*4 LSMTYP
>           CHARACTER*7 AERTYP
>           integer jxbb
>           parameter(IX     =   232)
>           parameter(NPROC  =    16)
>           parameter(MJX    =   160)
>           parameter(JXP    = MJX/NPROC)
>           parameter(KX     =    18)
>           parameter(NSG    =     1)
>           parameter(NNSG   =     1)
>           parameter(IBYTE  =     4)
>           parameter(DATTYP='NNRP1')
>           parameter(LSMTYP='BATS')
>           parameter(AERTYP='AER00D0')
>           parameter(jxbb=mjx-1)
>
>
>
>     Can somebody suggest how to overcome this, please?
>
>
>     P.S.: The problem may be related to the "no. of points w/convection" value.
>     Regards,
>
>     Paulo Ricardo Teixeira
>
>     #########################################################################
>
>     CV (Currículo Lattes):
>     http://buscatextual.cnpq.br/buscatextual/visualizacv.jsp?id=K4705902T0
>     or at this link: http://lattes.cnpq.br/8914320939610393
>     Paulo Ricardo Teixeira da Silva
>     Deputy Director of Academic and Scientific Affairs, UNEMET
>     M.Sc. in Meteorology - Solar Radiation / Solar Radiation Modeling
>     (Land Surface Processes)
>
>     Fellow/Researcher, NMA/LBA/INPA
>     Instituto Nacional de Pesquisas da Amazônia - INPA
>     Phone: +55 92 3643-3623
>     Fax: +55 92 3643 3625
>     Av. André Araújo, 2936 - Campus II
>     Bairro: Aleixo - Cx. Postal 478 / Cep 69060-001
>     Manaus/Amazonas
>
>
>     Linux Counter since 2001-11-22
>     N_LinuxCounter : #246599
>
>     #########################################################################
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> RegCNET mailing list
> RegCNET at lists.ictp.it
> https://lists.ictp.it/mailman/listinfo/regcnet

-- 
Travis A. O'Brien
Graduate Student Researcher
Earth and Planetary Science Dept.
UC Santa Cruz

tobrien at ucsc.edu
(831) 459-3504



