In October 2020, a researcher at New York University (NYU) made a groundbreaking discovery, shedding new light on the structural biophysics of SARS-CoV-2, the novel coronavirus that causes COVID-19. The breakthrough came just a few months after NYU installed a massive Lenovo supercomputer in the New York metropolitan area.
“That chemistry researcher, her job could not be completed on our old infrastructure,” said Dr. David Ackerman, Associate Vice President for Research Technology at NYU IT and Chief Digital Officer at NYU Libraries. Ackerman is responsible for NYU’s research technology services and strategy. “It took just 30 hours on our new infrastructure.”
The new supercomputer, David says, “is groundbreaking for us — and the world.”
But installing a High-Performance Computing (HPC) cluster at the height of a spike in COVID-19 cases on the East Coast was no minor feat. There were limits on site access, on travel, and on the ability to meet with the customer. Yet that didn’t stop two indomitable Lenovo engineers.
In July, Chris Eckhoff drove 24 hours from his home in Florida to New York to install the NYU supercomputer. That’s over 1,000 miles (1,600 km)!
Chris was eventually joined by Chulho Kim, another longtime Lenovo employee. The two spent months living out of a hotel to ensure the installation went off without a hitch. They left only to work and to buy food.
“It was an exceptional time due to exceptional circumstances,” Chris recounted. “But the service must go on.”
And the installation was exceptionally urgent.
“The idea that we wouldn’t move fast was just unacceptable,” David added. “I wrote to Lenovo: We need this supercomputer, and we need it now to help the world!”
“It had to be up and running because many of the researchers using the system were going to be doing COVID research,” said Scott Tease, General Manager, HPC and AI, Lenovo. “NYU sent out a challenge to Lenovo and their other vendors asking for our support to get the system up during such difficult times — and we responded.”
Getting a system of this size and complexity built and shipped during an unprecedented time is not for the faint of heart. Luckily, the duo was supported by Lenovo’s industry-leading global supply chain teams, who worked tirelessly to assemble the supercomputer, test it in the factory, and then manage the logistics of a synchronized arrival at NYU’s data center, just in time for the power-up.
NYU couldn’t have been more impressed by the results. One of the major appeals of working with Lenovo, said David, is its high-performance server portfolio equipped with the Lenovo Neptune™ liquid cooling technology. Not only is the system greener and more cost-effective than alternatives, but it’s also more powerful.
“Our original TOP500 number gave us a 1.729 petaFLOP rating,” he said. The TOP500 project ranks the 500 most powerful supercomputers in the world by their measured performance on the LINPACK benchmark, expressed in floating-point operations per second (FLOPS). That number increased to 2.008 petaFLOPS “just by having the direct water cooling,” a boost of roughly 16 percent (2.008 / 1.729 ≈ 1.16).
“They say the best technology resembles magic, and this is like magic,” David said.
Behind the magic were Chulho and Chris, who worked around the clock, applying their expertise and agility to ensure the cluster was installed on time.
Usually, one member of the team works solely on troubleshooting and identifying hardware issues. “The hard part is troubleshooting,” Chulho explained, “figuring out why a certain node is running slow, if your network setting is correct, and so on.” A cluster is only as strong as its weakest link, so even a minor issue on a single node can hold back the whole system.
But this time around, Chulho had to wear multiple hats, “running tests, replacing parts.” Building a supercomputer, after all, is a complex undertaking. A typical supercomputer has tens of thousands of processor cores or more working in parallel. Troubleshooting such a system, he says, is like searching for a needle in a haystack.
“There were many factors that made this a challenging project,” Chris added. Normally, an HPC cluster is installed by a team of two to three people working in close contact with the client. Things worked differently this time. But Chris pushed ahead, working to meet the deadline so NYU’s research could go on unimpeded.
Chulho says the experience taught him that success is possible even when things seem impossible. At the very least, he realized he “had to give it a try.”
“I didn’t think there was any other option,” he said.