Good morning and happy new year!
I get the following segmentation fault when I restart the model after a 3-month run. I am using a 36 s time step with the non-hydrostatic core and the CLM45 option.
I started the initial run with mdate0 = mdate1 = 2005050100, stopped it at mdate2 = 2005080100, and had ifrest = .false.
For the restart I now have mdate0 = 2005050100, mdate1 = 2005080100, and mdate2 = 2005120100, with ifrest = .true.
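As a sanity check on the dates above, here is a minimal Python sketch that verifies the two runs chain together consistently. The rules it encodes (mdate0 unchanged, the restart mdate1 equal to the previous mdate2, ifrest switched from .false. to .true.) are assumptions about how a restart is normally set up, and the function and dictionary names are hypothetical, not part of RegCM. Only the four values per run come from the description above.

```python
from datetime import datetime


def parse_mdate(value):
    """Parse a YYYYMMDDHH integer such as 2005050100 into a datetime."""
    return datetime.strptime(str(value), "%Y%m%d%H")


def check_restart(first, restart):
    """Return a list of problems found when chaining two runs (empty if none).

    Assumed rules (not taken from the RegCM source):
      - the initial run has ifrest false, the restart run has ifrest true
      - mdate0 is the same in both runs
      - the restart run begins (mdate1) where the first run ended (mdate2)
      - the restart run ends after it begins
    """
    problems = []
    if first["ifrest"]:
        problems.append("initial run should have ifrest = .false.")
    if not restart["ifrest"]:
        problems.append("restart run should have ifrest = .true.")
    if restart["mdate0"] != first["mdate0"]:
        problems.append("mdate0 changed between runs")
    if restart["mdate1"] != first["mdate2"]:
        problems.append("restart mdate1 does not match previous mdate2")
    if not parse_mdate(restart["mdate1"]) < parse_mdate(restart["mdate2"]):
        problems.append("restart mdate2 is not after mdate1")
    return problems


# Values as described in this post.
initial_run = {"ifrest": False, "mdate0": 2005050100,
               "mdate1": 2005050100, "mdate2": 2005080100}
restart_run = {"ifrest": True, "mdate0": 2005050100,
               "mdate1": 2005080100, "mdate2": 2005120100}

print(check_restart(initial_run, restart_run))
```

With the values given here the check reports no problems, so under these assumptions the date settings themselves look consistent.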
Please help me post this question on the RegCM portal.
At 2005-08-01 00:00:00 UTC
JDay 212.00000 solar declination angle = 18.31034409 degrees
solar TSI irradiance = 1361.0705 W/m^2
********************* MASS CHECK ********************
At 2005-08-01 00:00:00 UTC
Total dry air = 0.39982E+17 kg, error = -0.00000 %
Total water = 0.87682E+14 kg, error = -0.00003 %
Mean values over past 24 hours :
Dry air boundary = 0.43028E+09 kg.
Water boundary = -0.57755E+08 kg.
Convective rain = 0.24657E+07 kg.
Nonconvective rain = 0.93299E+04 kg.
Ground Evaporation = 0.31904E+07 kg.
*****************************************************
At 2005-08-01 00:00:00 UTC: updating upper radiative BC coefficients
$$$ 2005-08-01 00:00:00 UTC
$$$ max value of CFL = 0.40613E-01
$$$ no. of points with active convection = 5335
Attempting to read monthly vegetation data .....
At 2005-08-01 00:09:00 UTC
Month = 8 Day = 1
Successfully read monthly vegetation data for
month 8
[c14-8:172256:0:172256] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
==== backtrace ====
0 0x000000000004f11c ucs_rcache_get() /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel7-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.2.0-gcc-MLNX_OFED_LINUX-4.3-3.0.2.1-redhat7.5-x86_64/ucx-v1.4.x/src/ucs/sys/rcache.c:582
1 0x0000000000027044 uct_ib_mem_rcache_reg() /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel7-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.2.0-gcc-MLNX_OFED_LINUX-4.3-3.0.2.1-redhat7.5-x86_64/ucx-v1.4.x/src/uct/ib/base/ib_md.c:911
2 0x00000000000120e2 ucp_mem_rereg_mds() /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel7-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.2.0-gcc-MLNX_OFED_LINUX-4.3-3.0.2.1-redhat7.5-x86_64/ucx-v1.4.x/src/ucp/core/ucp_mm.c:99
3 0x0000000000013bc6 ucp_request_memory_reg() /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel7-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.2.0-gcc-MLNX_OFED_LINUX-4.3-3.0.2.1-redhat7.5-x86_64/ucx-v1.4.x/src/ucp/core/ucp_request.c:215
4 0x0000000000013e78 ucp_request_send_buffer_reg() /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel7-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.2.0-gcc-MLNX_OFED_LINUX-4.3-3.0.2.1-redhat7.5-x86_64/ucx-v1.4.x/src/ucp/core/ucp_request.inl:343
5 0x00000000000264e7 ucp_tag_send_req() /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel7-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.2.0-gcc-MLNX_OFED_LINUX-4.3-3.0.2.1-redhat7.5-x86_64/ucx-v1.4.x/src/ucp/tag/tag_send.c:72
6 0x00000000000264e7 ucp_tag_send_nb() /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel7-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v2.2.0-gcc-MLNX_OFED_LINUX-4.3-3.0.2.1-redhat7.5-x86_64/ucx-v1.4.x/src/ucp/tag/tag_send.c:201
7 0x00000000000058c6 mca_pml_ucx_common_send() /cluster/work/users/vegarde/build/OpenMPI/2.1.2/GCC-6.4.0-2.28/openmpi-2.1.2/ompi/mca/pml/ucx/pml_ucx.c:647
8 0x000000000005e579 PMPI_Gatherv() /cluster/work/users/vegarde/build/OpenMPI/2.1.2/GCC-6.4.0-2.28/openmpi-2.1.2/ompi/mpi/c/profile/pgatherv.c:191
9 0x000000000004bd9b ompi_gatherv_f() /cluster/work/users/vegarde/build/OpenMPI/2.1.2/GCC-6.4.0-2.28/openmpi-2.1.2/ompi/mpi/fortran/mpif-h/profile/pgatherv_f.c:93
10 0x0000000000a63a07 __mod_mppparam_MOD_linear_to_global_real8_subgrid_subgrid() ???:0
11 0x0000000000802a31 __mod_clm_regcm_MOD_land_to_atmosphere() mod_clm_regcm.F90:0
12 0x00000000004138ac __mod_lm_interface_MOD_surface_model() ???:0
13 0x00000000004eab74 physical_parametrizations.4074() mod_tendency.F90:0
14 0x00000000004f0899 __mod_tendency_MOD_tend() ???:0
15 0x0000000000409be1 __mod_regcm_interface_MOD_rcm_run() ???:0
16 0x000000000040967b main() ???:0
17 0x0000000000022505 __libc_start_main() ???:0
18 0x00000000004096ba _start() ???:0
===================
--------------------------------------------------------------------------
mpirun noticed that process rank 27 with PID 172256 on node c14-8 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
real 0m13.661s
user 1m12.435s
sys 0m2.981s
Job finishd at Mon Jan 4 06:52:39 CET 2021