TACC: Engineering Research in HPC

2nd Generation Intel® Xeon® Scalable processors and Intel® Optane™ DC persistent memory speed processing and expand memory capacity.

Executive Summary
The Texas Advanced Computing Center (TACC) continuously reinvents supercomputing at ever larger scale to enable breakthrough research and deliver the resources that scientists need. Its latest system is Frontera, a 38.75 petaFLOPS cluster that earned the #5 ranking on the June 2019 Top500 list [1]. Frontera comprises nearly half a million cores of 2nd Generation Intel® Xeon® Scalable processors inside Dell EMC PowerEdge* servers.

Challenge
The Texas Advanced Computing Center (TACC) is a world-renowned facility for supercomputing, enabling new discoveries across a range of disciplines in science and industry.

“Our mission here at the Texas Advanced Computing Center,” said TACC’s Executive Director, Dr. Dan Stanzione, “is to provide groundbreaking new computing capabilities to enable new kinds of scientific discoveries, and new kinds of engineering research.”

Deployed in 2017, TACC’s Stampede2 supercomputer incorporated the latest Intel® Xeon® Scalable processors inside Dell EMC PowerEdge* servers, along with Intel® Omni-Path Architecture fabric. Designed as a capability machine, Stampede2 will support three to four thousand projects over its lifetime. Every few years, however, TACC looks at the kinds of problems that researchers are tackling and at which architectures will best support that science. Some of those problems address the ‘grand challenges’ of our time and require computing on a massive scale.

“We’re looking at control problems around fusion reactors,” commented Stanzione as he offered an example of the kinds of massive scale research that will require new levels of supercomputing performance. “We’re looking at mantle convection as a whole Earth problem, where you see single simulations across the entire planet.”

Problems at that scale require a supercomputer of a different scale than Stampede2.

Frontera hardware and software system overview.

Solution
Frontera is TACC’s newest supercomputer, supported by a $60 million award from the U.S. National Science Foundation. It contains a large main system that will deliver peak performance of 38.75 petaFLOPS, according to Stanzione. The main system is built on the 2nd Gen Intel® Xeon® Platinum processor, with 8,008 dual-socket nodes of 56 cores each, interconnected by InfiniBand* Architecture at 100 Gbps. Its 448,448 cores give TACC more computing capacity and memory capacity than the center has had in the past.
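The core count and peak figure quoted above can be checked with simple arithmetic. The sketch below takes the node and core counts from the text; the 2.7 GHz clock and 32 double-precision floating-point operations per core per cycle are assumptions that the text does not state, chosen as plausible values for this processor generation. It is a back-of-the-envelope estimate, not TACC's benchmark method.

```python
# Figures taken from the text.
NODES = 8_008
CORES_PER_NODE = 56

# Assumptions, not stated in the text: base clock of 2.7 GHz and
# 32 double-precision FLOPs per core per cycle (two AVX-512 FMA units).
CLOCK_HZ = 2.7e9
FLOPS_PER_CYCLE = 32


def total_cores(nodes: int, cores_per_node: int) -> int:
    """Total cores in the main system."""
    return nodes * cores_per_node


def peak_petaflops(cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak = cores x clock x FLOPs per cycle, in petaFLOPS."""
    return cores * clock_hz * flops_per_cycle / 1e15


cores = total_cores(NODES, CORES_PER_NODE)
peak = peak_petaflops(cores, CLOCK_HZ, FLOPS_PER_CYCLE)
print(f"{cores:,} cores, about {peak:.2f} petaFLOPS peak")
```

Under these assumptions the estimate lands close to the peak performance quoted for the main system; sustained performance on real workloads will be lower.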

By selecting Intel’s latest server processor, Frontera offers:

  • A higher clock rate than previous systems, delivering higher single-thread performance
  • More processor cores to run more threads at the same time
  • More memory bandwidth that can feed data to all those cores

“Frontera will address a narrower mission than Stampede2,” explained Stanzione. “Instead of supporting thousands of projects, we’ll have a few hundred that have an extraordinary computational need and massive scale of computation. It’ll solve the very biggest sort of grand challenge projects in the scientific ecosystem. We’ll be running calculations at a speed and at a scale that we’ve never been able to do before.”

Frontera will also support technologies that were previously unavailable, including Intel® Deep Learning Boost (Intel® DL Boost), which targets artificial intelligence workloads. These technologies will help TACC’s supercomputer designers understand which of them are useful to researchers, so that they can be integrated into the next next-generation TACC machine, slated for 2025. One such technology is Intel® Optane™ DC persistent memory.

“Intel® Optane™ DC persistent memory,” commented Stanzione, “has several unique characteristics for us that offer advantages over traditional memory and advantages over traditional storage. There are many potential interesting use cases, such as very, very large memory nodes—multiple terabytes per node—or simple fault tolerance. When a server fails, we can keep the state of memory and allow the computation to keep running, versus having to restart it across the whole 8,008 nodes that make up the machine.”
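The fault-tolerance idea in this quote, keeping computation state somewhere that survives a failure so the work can resume instead of restarting, can be illustrated with a small sketch. It uses an ordinary memory-mapped file as a stand-in for persistent memory; the `run` function, the state layout, and the simulated failure are all hypothetical. In practice, persistent memory is usually exposed through a DAX-mounted file system and programmed with libraries such as PMDK, none of which is shown here, and this is not a description of how TACC uses the technology.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical state layout: (completed steps: int64, running total: float64).
STATE = struct.Struct("<qd")


def run(path, total_steps, stop_after=None):
    """Add 0.5 per step, saving state to a memory-mapped file after each step.

    stop_after simulates a failure after that many steps in this call.
    """
    # Create and zero-initialize the state file on first use.
    if not os.path.exists(path) or os.path.getsize(path) < STATE.size:
        with open(path, "wb") as f:
            f.write(STATE.pack(0, 0.0))

    with open(path, "r+b") as f, mmap.mmap(f.fileno(), STATE.size) as region:
        # Resume from whatever state the previous run left behind.
        step, total = STATE.unpack_from(region, 0)
        done = 0
        while step < total_steps:
            if stop_after is not None and done >= stop_after:
                break  # simulated failure
            total += 0.5
            step += 1
            STATE.pack_into(region, 0, step, total)
            region.flush()  # push the update out to the backing file
            done += 1
        return step, total


with tempfile.TemporaryDirectory() as workdir:
    state_path = os.path.join(workdir, "state.bin")
    first = run(state_path, total_steps=10, stop_after=4)  # "fails" at step 4
    resumed = run(state_path, total_steps=10)              # picks up at step 4
    print(first, resumed)
```

The second call continues from the saved step rather than from zero, which is the behavior the quote describes at the scale of a whole machine.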

Result
Grand challenge problems need massive computing capacity.

“It’s going to be a remarkably productive system,” said Stanzione. “We think, in terms of real science throughput, we’ll get three or four times the performance of its predecessor.”

Beyond the Standard Model
With the discovery of the Higgs boson using the Large Hadron Collider (LHC) at CERN in Geneva, Switzerland, the final piece of the Standard Model of Physics was put in place. Now, scientists around the world are looking Beyond the Standard Model to gain a finer sense of what makes up high-energy particle physics. The LHC, with one of its detectors called ATLAS (A Toroidal LHC ApparatuS), will again be at the center of their research. CERN plans on increasing the number of LHC collisions by a factor of ten in the coming years.

The LHC requires enormous amounts of computing capacity to interpret its collisions. CERN scientists have run workloads on Stampede2. Now that Frontera is operational, CERN will have a much larger system to use to understand what is happening at these subatomic scales.

“We simulate the detector response to a given physics model,” said Robert Gardner, a research professor in the Enrico Fermi Institute at the University of Chicago, who co-leads the distributed computing facility group for the U.S. ATLAS collaboration.

“When we’re doing the analysis on the actual data, we may plot some distributions such as the particle mass, transverse momentum, or the ‘missing energy’ in the collision. And you get the number of candidates that we have for the raw data coming off the detector. Then we compare those to different kinds of models and see if we can match up the distributions. This provides clues to what might be actually happening during the collisions.”
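The comparison Gardner describes, matching a measured distribution against the predictions of different models, can be sketched with a simple goodness-of-fit score. The example below uses a Pearson chi-square statistic over binned counts. The bin counts and the two model names are invented for illustration, and the choice of statistic is an assumption; this is not the ATLAS collaboration's analysis code.

```python
def chi_square(observed, expected):
    """Pearson chi-square between observed and expected bin counts."""
    if len(observed) != len(expected):
        raise ValueError("histograms must have the same number of bins")
    # Bins with zero expected count are skipped to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Invented example: candidate counts per bin of some measured quantity.
observed = [12, 30, 55, 31, 10]

# Invented predictions from two hypothetical physics models.
models = {
    "background_only": [15, 32, 40, 30, 12],
    "background_plus_signal": [13, 31, 54, 30, 11],
}

scores = {name: chi_square(observed, predicted) for name, predicted in models.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: chi-square = {score:.2f}")
print("closest match:", best)
```

A lower score means the model's predicted distribution sits closer to the data. Real analyses account for statistical and systematic uncertainties that this sketch ignores.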

From Nuclear Fission to Fusion Power
Another area involving global scientific collaboration is innovating new resources for supplying the world’s power needs. From more efficient wind generation to battery research and hydrogen mining from water, science is trying to find clean alternatives to fossil fuels.

Nuclear fusion—the merging of nuclei to release massive amounts of energy, like Earth’s Sun does—is considered the holy grail of energy production, without the drawbacks of today’s fission reactors. In France, such a reactor—the International Thermonuclear Experimental Reactor (ITER)—is being built by a consortium of seven governments. Scheduled for a 2025 completion date, it is designed to produce 20 to 25 times more power than it uses.

An urgent problem for designers is to be able to accurately and reliably predict—and avoid—large-scale disruptions. But for years, scientists have struggled to match physics models and simulations with the dynamics in a real reactor.

“If you try to use conventional theoretical methods, buttressed by high performance computing, you still aren’t going to be able to make predictions,” said William Tang, principal research physicist at the Princeton Plasma Physics Laboratory—the U.S. DOE National Lab for fusion studies. “You needed the impact of big data analytics that can deal with a lot of data that’s relevant to disruptions.”

Tang and his team have turned to artificial intelligence to help solve the problem. The team developed the Fusion Recurrent Neural Net (FRNN) code, which uses deep learning to make better predictions. Their code can predict disruption events with more than 90 percent accuracy, more than 30 milliseconds ahead of the disruption trigger event. Tang will take advantage of Frontera’s new resources for deep learning to further his research with the FRNN code and develop a control system that can avoid disruptions in ITER.

Computation for World Problems
Other challenges requiring massive computing scale include using precision agriculture and genomics to feed the world’s growing population and developing cleaner combustion of coal, which is still a leading source of energy.

“We need systems like Frontera to answer the big questions of our time, such as the sustainability of the environment and renewable energy,” said Professor Gardner. “We have to continue to work on frontier science and everything that comes after it, and we can’t do that without computation.”

A view between two rows of Frontera servers in the TACC Data Center.

Solution Summary
Frontera was built to support a new, much larger scale of scientific computing than TACC could previously offer. Built on 2nd Generation Intel® Xeon® Platinum processors inside Dell EMC PowerEdge* servers, with nearly half a million cores, Frontera will deliver a peak performance of 38.7 petaFLOPS, according to TACC’s Executive Director, Dan Stanzione. The new supercomputer will also allow scientists to test new technologies, including Intel® Optane™ DC persistent memory, so the center can assess how to implement them in its next next-generation supercomputer.

Frontera Highlights

  • 8,008 dual-socket Dell PowerEdge* C6420 servers with 2nd Generation Intel® Xeon® Scalable processors (448,448 cores total)
  • Peak performance of 38.7 petaFLOPS [1]
  • 50 nodes with Intel® Optane™ DC persistent memory
  • #5 most powerful supercomputer in the world on the June 2019 Top500 list, and the fastest at any university

Solution Ingredients

  • 8,008 Dell EMC PowerEdge C6420 compute nodes, consisting of 2nd Generation Intel® Xeon® Platinum processors, 56 cores per node
  • Intel® Optane™ DC persistent memory

Notices and Disclaimers

Intel® technologies’ features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer, or visit https://www.intel.com to learn more.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information, visit https://www.intel.com/benchmarks. Performance results are based on testing as of the date set forth in the configuration and may not reflect all publicly available security updates. See the configuration disclosure for details. No product or component can be absolutely secure. Published cost reduction scenarios are intended as examples of how a given Intel®-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

Intel does not control or audit third-party benchmark data or the websites referenced in this document. You should visit the referenced website and confirm whether the referenced data are accurate.

In some test cases, results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and are provided for informational purposes. Any differences in system hardware, software, or configuration may affect actual performance.

Product and Performance Information

1. Test performed by TACC for the June 2019 TOP500 ranking. See https://www.top500.org/system/179607.