siaodani Hi, thanks for your reply.
I've tested the reduAlgo values. Using calc.solver.eig.set_algo("elpa") together with calc.solver.eig.set_reduAlgo(3) requires that the number of process rows divide the number of process columns, but the numbers of process rows and columns are determined by the process grid and cannot be set separately.
However, we can change the number of processes with calc.solver.set_mpi_command("mpiexec -n 121 --oversubscribe"), i.e. the square of 11, or other square numbers such as 324 or 400; the number of processes can also be set slightly higher than the number of physical cores available. With this setup, calc.solver.eig.set_reduAlgo(3) gives the fastest computation for large systems.
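For reference, a minimal sketch of the settings described above; the calc object itself is assumed to come from your existing input script, and only the three calls below are the ones discussed in this thread.

# Sketch only: `calc` is assumed to be the calculator object set up in your
# own input script; the calls below are the ones discussed above.

# ELPA eigensolver with reduAlgo = 3, which was the fastest choice for
# the large system in my tests.
calc.solver.eig.set_algo("elpa")
calc.solver.eig.set_reduAlgo(3)

# Use a square number of MPI processes (121 = 11**2; 324 or 400 also work)
# so that the number of process rows divides the number of process columns.
# --oversubscribe lets the process count slightly exceed the physical cores.
calc.solver.set_mpi_command("mpiexec -n 121 --oversubscribe")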