To measure the cluster's floating-point performance, the standard LINPACK benchmark is used in its parallel, distributed-memory implementation, HPL (High-Performance Linpack). The same benchmark is used to rank the supercomputers in the Top500 list.
Performance is characterized by two parameters of the computing machine:
- theoretical performance Rpeak: the peak throughput, calculated as the number of processor cores multiplied by the maximum theoretical performance of a single core;
- real performance Rmax: the measured throughput, estimated as the total number of floating-point operations executed per unit of time by all the cores of the computer.
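The two definitions can be sketched in a few lines of Python. This is only an illustration, not the procedure used on Physon: the function names are hypothetical, and the example inputs (a 2.0 GHz clock, 4 double-precision flops per cycle, and the operation count and run time) are assumed values that do not come from this text.

```python
def rpeak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: number of cores times the peak rate of one core."""
    per_core_peak = clock_ghz * flops_per_cycle  # Gflops delivered by one core
    return cores * per_core_peak


def rmax_gflops(total_flops: float, seconds: float) -> float:
    """Measured rate: floating-point operations executed per unit of time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return total_flops / seconds / 1e9


if __name__ == "__main__":
    # Assumed example values, chosen for illustration only.
    print(rpeak_gflops(cores=24, clock_ghz=2.0, flops_per_cycle=4))
    print(rmax_gflops(total_flops=1.5e12, seconds=10.0))
```

In practice Rmax is not computed by hand: the benchmark reports it from its own operation count and the measured wall-clock time.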
Because the Physon system uses processors with different performance, the compute nodes are divided into several groups:
- small subsystem – 24 cores (6 Xeon E5335 processors)
- large subsystem – 160 cores (40 Xeon 5420 processors)
- new subsystem – 24 cores (4 Xeon E5-2620 processors)
- newer subsystem – 24 cores (2 Xeon E5-2650 v4 processors)
- graphics node – 1024 cores (2 Tesla M2090 processors)
Physon performance in Gflops:
component | number of cores | Rpeak | Rmax | Ns | efficiency |
---|---|---|---|---|---|
interactive node | 8 | 64 | 51.4 | 37504 | 80.3% |
small subsystem | 24 | 192 | 154.8 | 64768 | 80.6% |
large subsystem | 160 | 1600 | 1302 | 192000 | 81.4% |
new subsystem | 24 | 384 | 381 | 120000 | 99.2% |
newer subsystem | 24 | 729.6 | — | — | — |
graphics node | 1024 | 1332 | 518 | 30000 | 39% |
All | 240 | 4301.6 | 3221 | — | — |
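
The efficiency column is the ratio Rmax / Rpeak expressed as a percentage. A short Python sketch, using only the rows of the table that report a measured Rmax, recomputes it. The function and variable names are illustrative.

```python
def efficiency_percent(rmax: float, rpeak: float) -> float:
    """HPL efficiency: measured performance as a percentage of the peak."""
    if rpeak <= 0:
        raise ValueError("rpeak must be positive")
    return 100.0 * rmax / rpeak


# (Rpeak, Rmax) in Gflops, copied from the table rows that report Rmax.
measured = {
    "interactive node": (64.0, 51.4),
    "small subsystem": (192.0, 154.8),
    "large subsystem": (1600.0, 1302.0),
    "new subsystem": (384.0, 381.0),
    "graphics node": (1332.0, 518.0),
}

for name, (rpeak, rmax) in measured.items():
    print(f"{name}: {efficiency_percent(rmax, rpeak):.1f}%")
```

The newer subsystem is left out because the table lists no Rmax for it.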
The test was built and run with the following software:
- Compiler: Intel C/C++ Compiler (EM64T), v10.0.23
- BLAS: Intel Math Kernel Library (EM64T), v10.0.1.014
- MPI: OpenMPI, v1.2.1
On the CPU subsystems, actual performance exceeds 80% of the theoretical peak, which is consistent with the ratio observed for other InfiniBand cluster systems. The graphics node reaches about 39% of its peak.