Monitoring the temperature of AI data centers using NTC thermistors
2026-06-11
As demand for artificial intelligence (AI) grows and power density rises, data centers face unprecedented thermal management challenges. Accurate, real-time temperature monitoring is required to optimize performance and efficiency while preventing overheating. The sensing solutions must be accurate, responsive, and robust, and must cope with rapidly changing thermal loads on highly sensitive equipment.
This article explores the thermal management challenges facing designers of modern AI data centers and reviews the main cooling approaches, including air cooling, immersion cooling, and direct-to-chip cooling. It then introduces negative temperature coefficient (NTC) thermistors from EPCOS (TDK) and explains how they can be used to address these challenges.
Why do AI data centers bring new thermal management challenges?
AI hardware such as graphics processing units (GPUs) and tensor processing units (TPUs) typically consumes far more power than traditional central processing units (CPUs). As a result, AI-focused data centers tend to have high power density and concentrated hotspots that are difficult to manage with traditional cooling methods.
To make matters worse, AI workloads often vary widely, and thermal loads can climb rapidly during intensive training or inference runs. Without proper thermal management, this can lead to performance degradation, unplanned downtime, and accelerated hardware degradation.
To meet these emerging demands, data centers need to adopt more advanced cooling methods. Direct-to-chip cooling is a common choice. It places coolant pipes, cold plates, or heat exchangers directly on high-power devices such as CPUs, GPUs, and memory. Immersion cooling is another option, in which the entire server is submerged in a non-conductive liquid.
Air cooling is also being upgraded. For example, in-row and in-cabinet cooling units can add zone cooling on top of the room-level computer room air conditioning system, responding in real time to local overheating.
Although the details of these cooling systems differ, they all drive demand for temperature monitoring that is more widely distributed and faster to respond. Take a direct-to-chip cooling system as an example. Each target chip needs a sensor on its heat sink to confirm that it stays within its temperature limits. Pipe-mounted sensors monitor the incoming coolant, and further sensors are needed on the coolant distribution unit and heat exchanger to confirm that the system is operating efficiently.
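The monitoring points described for a direct-to-chip system can be sketched as a simple limit check. This is only an illustration: the `SensorPoint` type, the sensor names, and every temperature limit below are hypothetical placeholders, not values from EPCOS (TDK) or from any cooling system specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorPoint:
    """One monitored location in a direct-to-chip cooling loop (hypothetical)."""
    name: str
    location: str   # e.g. "cold_plate", "coolant_inlet", "cdu", "heat_exchanger"
    limit_c: float  # placeholder alarm threshold in degrees Celsius


def check_points(points, readings_c):
    """Return (sensor name, message) pairs for missing or over-limit readings."""
    alerts = []
    for point in points:
        value = readings_c.get(point.name)
        if value is None:
            # A missing reading is treated as a fault, not as "all clear".
            alerts.append((point.name, "no reading"))
        elif value > point.limit_c:
            alerts.append(
                (point.name, f"{value:.1f} C exceeds {point.limit_c:.1f} C")
            )
    return alerts


if __name__ == "__main__":
    # Placeholder sensor layout and limits, for illustration only.
    points = [
        SensorPoint("gpu0_cold_plate", "cold_plate", 85.0),
        SensorPoint("loop_inlet", "coolant_inlet", 45.0),
        SensorPoint("cdu_return", "cdu", 60.0),
    ]
    readings = {"gpu0_cold_plate": 90.2, "loop_inlet": 31.5}
    for name, message in check_points(points, readings):
        print(f"{name}: {message}")
```

Treating a missing reading as an alert is a deliberate choice in this sketch, so that a disconnected sensor is not mistaken for a healthy one.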
Advantages of NTC thermistor sensors in data center applications
NTC thermistors can meet all of these requirements. As the name suggests, the resistance of an NTC sensor decreases as temperature rises. In an NTC thermistor, this behavior comes from a small temperature-sensitive oxide ceramic element enclosed in a protective metal or epoxy resin housing.
Figure 1 shows typical resistance-temperature curves for thermistors with rated resistances of 2 kΩ to 5 kΩ at 25 °C. As the figure shows, the higher the rated resistance, the better suited the thermistor is to high-temperature applications, because the change in resistance is easier to measure.
Figure 1: Typical resistance-temperature curves for thermistors with rated resistances of 2 kΩ to 5 kΩ at 25 °C. (Image source: EPCOS (TDK))
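The falling resistance-temperature curve can be approximated with the widely used Beta-parameter model, R(T) = R25 · exp(B · (1/T − 1/T25)), with temperatures in kelvin. The sketch below is an approximation rather than the manufacturer's characteristic, and the B value is an assumed illustrative number that does not come from the article or from a datasheet.

```python
import math

T25_K = 298.15   # 25 degrees Celsius expressed in kelvin
BETA_K = 3500.0  # assumed illustrative B value, not taken from a datasheet


def ntc_resistance(temp_c, r25_ohm, beta_k=BETA_K):
    """Approximate NTC resistance in ohms at temp_c using the Beta model."""
    temp_k = temp_c + 273.15
    return r25_ohm * math.exp(beta_k * (1.0 / temp_k - 1.0 / T25_K))


if __name__ == "__main__":
    # Compare the two rated resistances mentioned for Figure 1.
    for r25 in (2000.0, 5000.0):
        for temp_c in (25, 50, 75, 100):
            r = ntc_resistance(temp_c, r25)
            print(f"R25 = {r25:.0f} ohm, T = {temp_c:3d} C -> {r:8.1f} ohm")
```

Under this model, the part with the higher rated resistance also has the higher resistance at every elevated temperature, which is consistent with the point that higher rated values are easier to measure when hot.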
The advantages that NTC thermistors bring to AI data centers include:
High precision and fast response: NTC thermistors are highly sensitive to small temperature changes, and their low thermal mass gives them a fast response. These features let them keep up with the rapidly fluctuating thermal demands of AI data centers.
Durability and stability: Built from robust materials, they offer excellent long-term reliability and minimal resistance drift over time. This stability minimizes maintenance requirements and reduces the risk of unexpected downtime.
Compact size and flexible installation: Their small size makes them easy to integrate into dense, space-constrained data center environments, and they come in a variety of form factors to suit the different cooling systems used in AI data centers.
The EPCOS NTC thermistor range embodies these advantages. It includes solutions for monitoring heat sinks and pipes, immersion cooling systems, and air handling units.
Monitoring high-power components using NTC thermistors installed on heat sinks
High-power processors such as GPUs and TPUs require rigorous thermal monitoring to maintain performance and prevent overheating. The B57703M0103G040 (Figure 2) is designed for direct installation on a heat sink, which makes it well suited to this task. This screw-mounted sensor encapsulates an NTC thermistor in a metal-tag housing with a ring lug.
Figure 2: The B57703M0103G040 ring lug thermistor enables precise temperature monitoring of high-power processor heat sinks. (Image source: EPCOS (TDK))
The screw-mounted design is both convenient and important. It ensures good thermal coupling to the heat sink surface and consistent contact pressure, which reduces thermal resistance and improves measurement accuracy when the load changes rapidly.
The sensor has passed a 10,000-hour long-term stability test at +70 °C, making it suitable for the elevated temperatures common under AI data center workloads. Its rated resistance at +25 °C is 10 kΩ, which provides a reliable baseline for measuring higher operating temperatures and gives the temperature control system accurate feedback.
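As a rough illustration of how a control system might turn a reading from a 10 kΩ NTC into a temperature, the sketch below assumes a simple voltage divider and the Beta-parameter approximation. Only the 10 kΩ rating at +25 °C comes from the text above. The B value, the 10 kΩ series resistor, the 12-bit ADC, and the wiring (NTC between the ADC input and ground) are assumptions that should be checked against the datasheet and the actual circuit.

```python
import math

R25_OHM = 10_000.0      # rated resistance at +25 C (from the article)
BETA_K = 3988.0         # assumed B value; verify against the datasheet
R_FIXED_OHM = 10_000.0  # hypothetical series resistor from V_ref to the ADC input
ADC_MAX = 4095          # hypothetical 12-bit ADC full-scale count
T25_K = 298.15          # 25 degrees Celsius in kelvin


def adc_to_resistance(adc_count):
    """NTC resistance in ohms, with the NTC wired from the ADC input to ground."""
    if not 0 < adc_count < ADC_MAX:
        # A reading at either rail usually means an open or shorted sensor.
        raise ValueError("ADC reading at rail; check sensor wiring")
    ratio = adc_count / ADC_MAX  # V_ntc / V_ref
    return R_FIXED_OHM * ratio / (1.0 - ratio)


def resistance_to_celsius(r_ohm):
    """Invert the Beta model: 1/T = 1/T25 + ln(R/R25) / B."""
    inv_t = 1.0 / T25_K + math.log(r_ohm / R25_OHM) / BETA_K
    return 1.0 / inv_t - 273.15


def adc_to_celsius(adc_count):
    """Convert a raw ADC count to an approximate temperature in Celsius."""
    return resistance_to_celsius(adc_to_resistance(adc_count))


if __name__ == "__main__":
    for count in (3000, 2048, 1000):
        print(f"ADC {count:4d} -> {adc_to_celsius(count):6.1f} C")
```

With this wiring, lower ADC counts correspond to lower NTC resistance and therefore higher temperatures. For tighter accuracy over a wide range, a resistance-temperature table from the manufacturer would be a better basis than the single-B approximation.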