To measure the cluster's floating-point performance, the standard LINPACK benchmark is used in its parallel implementation for distributed-memory systems, HPL (High-Performance Linpack). The same test is used to rank the supercomputers in the Top500 list.
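
As a rough illustration of what LINPACK measures, the Python sketch below solves a random dense system Ax = b by LU factorization with partial pivoting, times the solve, and converts the time to Gflops using the operation count HPL credits per run, 2/3·N³ + 3/2·N². It is a single-threaded toy, not HPL: the function names and the default size n = 150 are assumptions made for illustration, and pure Python runs orders of magnitude below an optimized BLAS, so its Gflops figure says nothing about Physon. HPL itself also validates the result with a scaled residual; here only the largest absolute residual is reported.

```python
import random
import time


def lu_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (works on copies)."""
    n = len(a)
    a = [row[:] for row in a]  # copy so the caller's matrix is untouched
    b = b[:]
    for k in range(n):
        # Partial pivoting: move the row with the largest |a[i][k]| into position k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if p != k:
            a[k], a[p] = a[p], a[k]
            b[k], b[p] = b[p], b[k]
        pivot = a[k][k]
        for i in range(k + 1, n):
            factor = a[i][k] / pivot
            if factor != 0.0:
                row_i, row_k = a[i], a[k]
                for j in range(k + 1, n):
                    row_i[j] -= factor * row_k[j]
                b[i] -= factor * b[k]
    # Back substitution on the upper-triangular factor.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / a[i][i]
    return x


def hpl_flops(n):
    """Operation count HPL uses to turn run time into flops: 2/3 n^3 + 3/2 n^2."""
    return 2.0 / 3.0 * n**3 + 1.5 * n**2


def toy_linpack(n=150, seed=0):
    """Time one dense solve of order n; return (Gflops, largest absolute residual)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    start = time.perf_counter()
    x = lu_solve(a, b)
    elapsed = time.perf_counter() - start
    residual = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    return hpl_flops(n) / elapsed / 1e9, residual


if __name__ == "__main__":
    gflops, residual = toy_linpack()
    print(f"{gflops:.4f} Gflops, max residual {residual:.2e}")
```

The elimination loop is the same algorithm HPL distributes over a process grid; real implementations block it so that most of the work becomes matrix-matrix multiplication in BLAS, which is why the choice of BLAS library matters so much for Rmax.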

Performance is characterized by two parameters of the computing system:

  • theoretical peak performance Rpeak – the number of processor cores multiplied by the theoretical peak performance of a single core;
  • real performance Rmax – the performance actually achieved on the test, i.e. the total number of floating-point operations per second executed by all cores of the computer.
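
Both definitions can be written down directly, together with the efficiency Rmax/Rpeak. A minimal sketch, assuming the peak of one core is its clock rate times the double-precision flops it completes per cycle; the function names are hypothetical, and the inputs of 2.0 GHz and 4 flops per cycle are an assumption chosen so that 24 cores reproduce the small subsystem's Rpeak of 192 Gflops from the performance table below.

```python
def per_core_peak_gflops(clock_ghz, flops_per_cycle):
    """Peak of one core in Gflops: clock rate (GHz) times flops per cycle (assumed inputs)."""
    return clock_ghz * flops_per_cycle


def rpeak_gflops(cores, core_peak_gflops):
    """Theoretical peak Rpeak: number of cores times the peak of a single core."""
    return cores * core_peak_gflops


def efficiency(rmax_gflops, rpeak):
    """Fraction of the theoretical peak actually achieved (Rmax / Rpeak)."""
    return rmax_gflops / rpeak


if __name__ == "__main__":
    # Hypothetical inputs: 24 cores at 2.0 GHz with 4 double-precision flops per cycle.
    peak = rpeak_gflops(24, per_core_peak_gflops(2.0, 4))
    print(f"Rpeak = {peak:.1f} Gflops, efficiency = {efficiency(154.8, peak):.1%}")
```

With the measured Rmax of 154.8 Gflops, the efficiency comes out at about 80.6%, the value listed for the small subsystem.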

Because the processors used in the Physon system differ in performance, the compute nodes are divided into several groups:

  • small subsystem – 24 cores (6 Xeon E5335 processors);
  • large subsystem – 160 cores (40 Xeon 5420 processors);
  • new subsystem – 24 cores (4 Xeon E5-2620 processors);
  • newer subsystem – 24 cores (2 Xeon E5-2650 v4 processors);
  • graphics node – 1024 CUDA cores (2 Tesla M2090 GPUs).

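Physon performance (Rpeak and Rmax in Gflops):

| Component        | Cores |  Rpeak |  Rmax | Ns (HPL problem size) | Efficiency (Rmax/Rpeak) |
|------------------|------:|-------:|------:|----------------------:|------------------------:|
| interactive node |     8 |     64 |  51.4 |                 37504 |                   80.3% |
| small subsystem  |    24 |    192 | 154.8 |                 64768 |                   80.6% |
| large subsystem  |   160 |   1600 |  1302 |                192000 |                   81.4% |
| new subsystem    |    24 |    384 |   381 |                120000 |                   99.2% |
| newer subsystem  |    24 |  729.6 |       |                       |                         |
| graphics node    |  1024 |   1332 |   518 |                 30000 |                     39% |
| All              |   240 | 4301.6 |  3221 |                       |                         |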
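
The Ns column is HPL's problem size, the order N of the dense linear system solved on each component. Because HPL stores the N × N matrix in double precision (8 bytes per element) and credits 2/3·N³ + 3/2·N² operations per run, the matrix footprint and the run time implied by Rmax follow from the table. The Python sketch below, with hypothetical names ROWS, hpl_matrix_gib, hpl_runtime_s and summarize, recomputes the efficiency column, these two derived quantities, and the totals; the run time is an estimate that assumes the whole run proceeds at Rmax, not a measured value.

```python
# Rows copied from the performance table: (component, cores, Rpeak, Rmax, Ns); None = not measured.
ROWS = [
    ("interactive node", 8, 64.0, 51.4, 37504),
    ("small subsystem", 24, 192.0, 154.8, 64768),
    ("large subsystem", 160, 1600.0, 1302.0, 192000),
    ("new subsystem", 24, 384.0, 381.0, 120000),
    ("newer subsystem", 24, 729.6, None, None),
    ("graphics node", 1024, 1332.0, 518.0, 30000),
]


def hpl_matrix_gib(n):
    """Memory for the dense N x N double-precision matrix (8 bytes per element), in GiB."""
    return 8 * n * n / 2**30


def hpl_runtime_s(n, rmax_gflops):
    """Run time implied by HPL's operation count 2/3 N^3 + 3/2 N^2 at a constant Rmax."""
    return (2.0 / 3.0 * n**3 + 1.5 * n**2) / (rmax_gflops * 1e9)


def summarize(rows):
    """Print per-row efficiency, matrix size and estimated run time; return the totals."""
    for name, cores, rpeak, rmax, ns in rows:
        if rmax is None:
            print(f"{name:17s} Rpeak {rpeak:7.1f}  (no Rmax measured)")
            continue
        print(f"{name:17s} eff {rmax / rpeak:6.1%}  matrix {hpl_matrix_gib(ns):6.1f} GiB"
              f"  ~{hpl_runtime_s(ns, rmax) / 60:5.1f} min")
    # The "All" row counts CPU cores only, so the GPU node's CUDA cores are excluded.
    cpu_cores = sum(c for name, c, *_ in rows if name != "graphics node")
    total_rpeak = sum(r for _, _, r, *_ in rows)
    return cpu_cores, total_rpeak


if __name__ == "__main__":
    print(summarize(ROWS))
```

For the large subsystem, Ns = 192000 corresponds to a matrix of about 275 GiB and an implied run of roughly an hour. The CPU-core sum reproduces the 240 in the All row, and the Rpeak sum reproduces 4301.6 Gflops.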

The test was run with the following software:

  • Compiler: Intel C/C++ Compiler (EM64T), v10.0.23
  • BLAS: Intel Math Kernel Library (EM64T), v10.0.1.014
  • MPI: Open MPI, v1.2.1

On the CPU subsystems with measured results, actual performance exceeds 80% of the theoretical peak, which is consistent with the efficiency achieved by other InfiniBand cluster systems; the graphics node reaches only 39%.