How can I make a homemade supercomputer

Chinese supercomputer Tianhe 2 doubles computing power with new accelerator chip

At the International HPC Forum (IHPCF) in Guangzhou, it emerged that the supercomputer center of the National University of Defense Technology (NUDT) located there has thrown the old Xeon Phi Knights Corner (KNC) cards out of Tianhe 2 (Milky Way 2) and replaced them with the Matrix 2000 GPDSP accelerator developed in China, expanding the system in the process.

2.2 TFlops at 128 threads

The swap is probably not yet complete, but the upgraded system will certainly make it onto the next Top500 list, which is to be published in November at SC17 in Denver. Most of what is known about the system so far comes from the Tokyo professor Satoshi Matsuoka, who was allowed to tour the facility together with Linpack creator Jack Dongarra and others. Jan Lin from the HPC Center at the University of Shanghai retweeted his tweets in English.

According to the rumor mill, the huge Matrix 2000 chips could, like the KNC, be built from in-order cores: 128 ARM cores, each with two 256-bit vector extensions. In double-precision matrix multiplication (DGEMM), the Matrix 2000 at 1.2 GHz is said to reach 2.2 TFlops with 128 threads, at 90.2 percent efficiency and 240 watts TDP. The old Xeon Phi 31S1P (with 57 cores) only achieved about 840 GFlops in DGEMM. On the other hand, its memory bandwidth of up to 160 GByte/s in Stream (Triad) was a lot higher; the figure for the Matrix 2000 is estimated at 96 GByte/s with 128 cores.
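The rumored figures can be cross-checked with a little arithmetic. The following Python sketch takes the DGEMM throughput, efficiency, core count and clock rate quoted above and derives the theoretical peak they imply. The comparison value rests on an assumption that is not in the reports: that each 256-bit vector extension processes four double-precision values per cycle with a fused multiply-add counted as two operations.

```python
# Figures quoted in the article for the Matrix 2000.
dgemm_tflops = 2.2         # DGEMM throughput with 128 threads
dgemm_efficiency = 0.902   # fraction of theoretical peak reached in DGEMM
cores = 128
clock_ghz = 1.2

# Theoretical peak implied by the DGEMM result and its efficiency.
implied_peak_tflops = dgemm_tflops / dgemm_efficiency

# Double-precision operations per core and clock cycle implied by that peak.
implied_flops_per_cycle = implied_peak_tflops * 1000 / (cores * clock_ghz)

# Assumption (not from the article): two 256-bit vector extensions per core,
# four 64-bit values each, fused multiply-add counted as two operations.
assumed_flops_per_cycle = 2 * (256 // 64) * 2

print(f"implied peak per chip: {implied_peak_tflops:.2f} TFlops")
print(f"implied flops per cycle and core: {implied_flops_per_cycle:.1f}")
print(f"assumed flops per cycle and core: {assumed_flops_per_cycle}")
```

Under that assumption the implied value comes out just under 16 operations per cycle and core, so the rumored core layout and the DGEMM figure are at least consistent with each other.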

Theoretically up to 5.34 TFlops

As before, a node consists of two Xeon CPUs, but it is now extended by two Matrix 2000 accelerators and equipped with 192 GB of memory. The theoretical peak performance of a node totals 5.34 TFlops, of which the two Xeons contribute 430 GFlops. That suggests the old Xeon E5-2692 processors with 12 cores and a 2.2 GHz clock rate have been kept.

However, the number of nodes will be increased from 16,000 to 17,792. The complete Tianhe 2A system then achieves a total peak performance of 94.7 PFlops, compared with 54.9 PFlops previously. Thanks to the proprietary network, which now runs at 112 Gbit/s, the Linpack efficiency of the overall system should also be over 90 percent, so it will probably reach more than 80 PFlops. The predecessor system Tianhe 2, which led the Top500 list for three years until 2016, only came to 33.9 PFlops.
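As a rough plausibility check, the per-node and system-wide figures can be multiplied out. This Python sketch uses only the numbers quoted above; the variable names are illustrative, and the 90 percent Linpack efficiency is the article's estimate, not a measurement.

```python
# Figures quoted in the article.
node_peak_tflops = 5.34            # theoretical peak per node
xeon_share_gflops = 430            # contribution of the two Xeons per node
nodes = 17_792
quoted_system_peak_pflops = 94.7

# Peak left over for each of the two Matrix 2000 accelerators in a node.
accel_peak_tflops = (node_peak_tflops - xeon_share_gflops / 1000) / 2

# System peak obtained by multiplying the node peak by the node count.
system_peak_pflops = nodes * node_peak_tflops / 1000

# Linpack result if the estimated 90 percent efficiency is reached.
linpack_at_90_pflops = 0.9 * quoted_system_peak_pflops

print(f"peak per accelerator: {accel_peak_tflops:.3f} TFlops")
print(f"system peak from node figures: {system_peak_pflops:.1f} PFlops")
print(f"Linpack at 90 percent efficiency: {linpack_at_90_pflops:.1f} PFlops")
```

The product of node count and node peak lands within about half a percent of the quoted 94.7 PFlops; the reports do not say which of the inputs accounts for the small gap.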

Total power consumption is still around 18 MW, while energy efficiency has increased from 1.9 GFlops/W to more than 5 GFlops/W. (as)
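The efficiency figures follow from dividing Linpack performance by power draw. The short Python sketch below reproduces the old value and shows what the new one would require. It treats the "around 18 MW" as exact, which is a simplification, and the helper name `gflops_per_watt` is purely illustrative.

```python
def gflops_per_watt(linpack_pflops: float, power_mw: float) -> float:
    """Convert PFlops and MW to GFlops per watt."""
    gflops = linpack_pflops * 1e6   # 1 PFlops = 1e6 GFlops
    watts = power_mw * 1e6          # 1 MW = 1e6 W
    return gflops / watts

power_mw = 18.0   # article: "around 18 MW", treated as exact here

# Tianhe 2 before the upgrade: 33.9 PFlops Linpack.
old_efficiency = gflops_per_watt(33.9, power_mw)

# Linpack performance needed to reach 5 GFlops/W at the same power draw.
required_pflops = 5.0 * power_mw

# Efficiency if the system reaches exactly 80 PFlops at 18 MW.
efficiency_at_80 = gflops_per_watt(80.0, power_mw)

print(f"old efficiency: {old_efficiency:.2f} GFlops/W")
print(f"Linpack needed for 5 GFlops/W: {required_pflops:.0f} PFlops")
print(f"efficiency at 80 PFlops: {efficiency_at_80:.2f} GFlops/W")
```

On these assumptions, more than 5 GFlops/W at 18 MW would correspond to a Linpack result above 90 PFlops, or to a power draw somewhat below 18 MW during the run.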
