To cool datacenter servers, Microsoft turns to boiling liquid



Emails and other communications sent between Microsoft employees are literally making liquid boil inside a steel holding tank packed with computer servers at this datacenter on the eastern bank of the Columbia River.



Unlike water, the fluid inside the couch-shaped tank is harmless to electronic equipment and engineered to boil at 122 degrees Fahrenheit, 90 degrees lower than the boiling point of water.
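As a quick check of the figures above, the short Python sketch below converts the fluid's stated boiling point to Celsius and computes the gap to water's boiling point. The constant and function names are illustrative only, and water's boiling point is taken at sea-level pressure.

```python
def fahrenheit_to_celsius(deg_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


FLUID_BOIL_F = 122.0  # boiling point of the engineered fluid, from the article
WATER_BOIL_F = 212.0  # boiling point of water at sea-level pressure

gap_f = WATER_BOIL_F - FLUID_BOIL_F
fluid_boil_c = fahrenheit_to_celsius(FLUID_BOIL_F)

print(f"Fluid boils {gap_f:.0f} F below water, at {fluid_boil_c:.0f} C")
```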



The boiling effect, which is generated by the work the servers are doing, carries heat away from laboring computer processors. The low-temperature boil enables the servers to operate continuously at full power without risk of failure due to overheating.



Inside the tank, the vapor rising from the boiling fluid contacts a cooled condenser in the tank lid, which causes the vapor to change to liquid and rain back onto the immersed servers, creating a closed-loop cooling system.



“We are the first cloud provider that is running two-phase immersion cooling in a production environment,” said Husam Alissa, a principal hardware engineer on Microsoft’s team for datacenter advanced development in Redmond, Washington.




Ioannis Manousakis, a principal software engineer with Azure (left), and Husam Alissa, a principal hardware engineer on Microsoft’s team for datacenter advanced development (right), inspect the inside of a two-phase immersion cooling tank at a Microsoft datacenter. Photo by Gene Twedt for Microsoft.



Moore’s Law for the datacenter



The production environment deployment of two-phase immersion cooling is the next step in Microsoft’s long-term plan to keep up with demand for faster, more powerful datacenter computers at a time when reliable advances in air-cooled computer chip technology have slowed.



For decades, chip advances stemmed from the ability to pack more transistors onto the same size chip, roughly doubling the speed of computer processors every two years without increasing their electric power demand.



This doubling phenomenon is called Moore’s Law after Intel co-founder Gordon Moore, who observed the trend in 1965 and predicted it would continue for at least a decade. It held through the 2010s and has now begun to slow.



That’s because transistor widths have shrunk to the atomic scale and are reaching a physical limit. Meanwhile, the demand for faster computer processors for high performance applications such as artificial intelligence has accelerated, Alissa noted.



To meet the need for performance, the computing industry has turned to chip architectures that can handle more electric power. Central processing units, or CPUs, have increased from 150 watts to more than 300 watts per chip, for example. Graphics processing units, or GPUs, have increased to more than 700 watts per chip.



The more electric power pumped through these processors, the hotter the chips get. The increased heat has ramped up cooling requirements to prevent the chips from malfunctioning.



“Air cooling is not enough,” said Christian Belady, distinguished engineer and vice president of Microsoft’s datacenter advanced development group in Redmond. “That’s what’s driving us to immersion cooling, where we can directly boil off the surfaces of the chip.”



Heat transfer in liquids, he noted, is orders of magnitude more efficient than in air.



What’s more, he added, the switch to liquid cooling brings a Moore’s Law-like mindset to the whole of the datacenter.



“Liquid cooling enables us to go denser, and thus continue the Moore’s Law trend at the datacenter level,” he said.




Christian Belady, distinguished engineer and vice president of Microsoft’s datacenter advanced development group, stands next to a two-phase immersion cooling tank at a Microsoft datacenter. Photo by Gene Twedt for Microsoft.



Lesson learned from cryptocurrency miners



Liquid cooling is a proven technology, Belady noted. Most cars on the road today rely on it to prevent engines from overheating. Several technology companies, including Microsoft, are experimenting with cold plate technology, in which liquid is piped through metal plates, to chill servers.



Participants in the cryptocurrency industry pioneered liquid immersion cooling for computing equipment, using it to cool the chips that log digital currency transactions.



Microsoft investigated liquid immersion as a cooling solution for high-performance computing applications such as AI. Among other things, the investigation revealed that two-phase immersion cooling reduced power consumption for any given server by 5% to 15%.
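To put the 5% to 15% range in concrete terms, the sketch below applies it to a single server's power draw. The 1,000-watt baseline is a hypothetical figure chosen for easy arithmetic and does not come from Microsoft; only the two percentages are taken from the article.

```python
# Reduction range reported in the article for two-phase immersion cooling.
MIN_REDUCTION = 0.05
MAX_REDUCTION = 0.15


def savings_range_watts(baseline_watts: float) -> tuple[float, float]:
    """Return the (low, high) power savings in watts for one server."""
    return baseline_watts * MIN_REDUCTION, baseline_watts * MAX_REDUCTION


# Hypothetical baseline draw for one air-cooled server (an assumption).
baseline = 1000.0
low, high = savings_range_watts(baseline)
print(f"A {baseline:.0f} W server would draw {low:.0f} W to {high:.0f} W less")
```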



The findings motivated the Microsoft team to work with Wiwynn, a datacenter IT system manufacturer and designer, to develop a two-phase immersion cooling solution. The first solution is now running at Microsoft’s datacenter in Quincy.



That couch-shaped tank is filled with an engineered fluid from 3M. 3M’s liquid cooling fluids have dielectric properties that make them effective insulators, allowing the servers to operate normally while fully immersed in the fluid.



This shift to two-phase liquid immersion cooling enables increased flexibility for the efficient management of cloud resources, according to Marcus Fontoura, a technical fellow and corporate vice president at Microsoft who is the chief architect of Azure compute.



For example, software that manages cloud resources can allocate sudden spikes in datacenter compute demand to the servers in the liquid-cooled tanks. That’s because these servers can run at elevated power – a process called overclocking – without risk of overheating.
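The allocation idea described above can be illustrated with a minimal Python sketch. It is not Microsoft's scheduler: the `Server` class, its fields, and the abstract capacity units are all hypothetical, and the policy shown (fill immersion-cooled servers first, counting their overclocking headroom) is just one simple way to express the preference.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    immersion_cooled: bool
    spare_units: int      # hypothetical spare capacity at normal clock speed
    overclock_units: int  # hypothetical extra capacity when overclocked


def usable_capacity(server: Server) -> int:
    """Only immersion-cooled servers may use their overclocking headroom."""
    extra = server.overclock_units if server.immersion_cooled else 0
    return server.spare_units + extra


def place_burst(demand: int, servers: list[Server]) -> tuple[dict[str, int], int]:
    """Assign a demand spike to servers, immersion-cooled ones first.

    Returns the per-server placement and any demand left unplaced.
    """
    placement: dict[str, int] = {}
    remaining = demand
    # sorted() is stable; False sorts before True, so immersion-cooled go first.
    for server in sorted(servers, key=lambda s: not s.immersion_cooled):
        if remaining <= 0:
            break
        take = min(remaining, usable_capacity(server))
        if take > 0:
            placement[server.name] = take
            remaining -= take
    return placement, remaining


fleet = [
    Server("air-1", immersion_cooled=False, spare_units=4, overclock_units=2),
    Server("tank-1", immersion_cooled=True, spare_units=4, overclock_units=2),
]
placement, unplaced = place_burst(8, fleet)
print(placement, unplaced)
```

In this example the tank server absorbs six units of the spike (four spare plus two from overclocking) before the air-cooled server is asked to take the remaining two.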



“For instance, we know that with Teams when you get to 1 o’clock or 2 o’clock, there is a huge spike because people are joining meetings at the same time,” Fontoura said. “Immersion cooling gives us more flexibility to deal with these burst-y workloads.”




Boiling liquid carries away heat generated by computer servers at a Microsoft datacenter. Microsoft is the first cloud provider to run two-phase immersion cooling in a production environment. Photo by Gene Twedt for Microsoft.



Sustainable datacenters



Adding the two-phase immersion cooled servers to the mix of available compute resources will also allow machine learning software to manage these resources more efficiently across the datacenter, from power and cooling to maintenance technicians, Fontoura added.



“We will have not only a huge impact on efficiency, but also a huge impact on sustainability because you make sure that there is not wastage, that every piece of IT equipment that we deploy will be well utilized,” he said.



Liquid cooling is also a waterless technology, which will help Microsoft meet its commitment to replenish more water than it consumes by the end of this decade.



The cooling coils that run through the tank and enable the vapor to condense are connected to a separate closed-loop system that uses fluid to transfer heat from the tank to a dry cooler outside the tank’s container. Because the fluid in these coils is always warmer than the ambient air, there’s no need to spray water to condition the air for evaporative cooling, Alissa explained.



Microsoft, together with infrastructure industry partners, is also investigating how to run the tanks in ways that mitigate fluid loss and will have little to no impact on the environment.



“If done right, two-phase immersion cooling will attain all our cost, reliability and performance requirements simultaneously with essentially a fraction of the energy spend compared to air cooling,” said Ioannis Manousakis, a principal software engineer with Azure.





A Microsoft team is exploring two-phase immersion cooling technology. Pictured from left to right: Dave Starkenburg, datacenter operations management, Christian Belady, distinguished engineer and vice president of Microsoft’s datacenter advanced development group, Ioannis Manousakis, principal software engineer with Azure, and Husam Alissa, principal hardware engineer on Microsoft’s team for datacenter advanced development. Photo by Gene Twedt for Microsoft.



‘We brought the sea to the servers’



Microsoft’s investigation into two-phase immersion cooling is part of the company’s multi-pronged strategy to make datacenters more sustainable and efficient to build, operate and maintain.



For example, the datacenter advanced development team is also exploring the potential to use hydrogen fuel cells instead of diesel generators for backup power generation at datacenters.



The liquid cooling project is similar to Microsoft’s Project Natick, which is exploring the potential of underwater datacenters that are quick to deploy and can operate for years on the seabed sealed inside submarine-like tubes without any onsite maintenance by people.



Instead of an engineered fluid, the underwater datacenter is filled with dry nitrogen air. The servers are cooled with fans and a heat exchange plumbing system that pumps piped seawater through the sealed tube.



A key finding from Project Natick is that the servers on the seafloor experienced one-eighth the failure rate of replica servers in a land datacenter. Preliminary analysis indicates that the lack of humidity and of the corrosive effects of oxygen was primarily responsible for the superior performance of the servers underwater.



Alissa anticipates the servers inside the liquid immersion tank will experience similar superior performance. “We brought the sea to the servers rather than put the datacenter under the sea,” he said.





Ioannis Manousakis, a principal software engineer with Azure, removes a server blade from a two-phase immersion cooling tank at a Microsoft datacenter. Photo by Gene Twedt for Microsoft.



The future



If the servers in the immersion tank experience reduced failure rates as anticipated, Microsoft could move to a model where components are not immediately replaced when they fail. This would limit vapor loss as well as allow tank deployment in remote, hard-to-service locations.



What’s more, the ability to densely pack servers in the tank enables a re-envisioned server architecture that’s optimized for low-latency, high-performance applications as well as low-maintenance operation, Belady noted.



Such a tank, for example, could be deployed under a 5G cellular communications tower in the middle of a city for applications such as self-driving cars.



For now, Microsoft has one tank running workloads in a hyperscale datacenter. For the next several months, the Microsoft team will perform a series of tests to prove the viability of the tank and the technology.



“This first step is about making people feel comfortable with the concept and showing we can run production workloads,” Belady said.