June 28 2024

Servers that reboot themselves in the summer (and not just in the summer)

How summer heat can bring latent hardware problems to light, causing slowdowns and sudden restarts

As summer arrives, outdoor temperatures rise significantly, and while data centers are designed to maintain a controlled environment, outdoor heat can still affect the internal temperatures of servers. This can lead to various issues that, if not handled properly, can cause server slowdowns or even unexpected reboots. In this article, we will explore how summer temperatures can unearth latent problems in server cooling systems and how to address these issues.

Impact of high temperatures in data centers

Data centers are equipped with advanced cooling systems to keep servers at a stable, safe temperature. During the summer, however, external heat increases the thermal load on these systems, especially in small-business server rooms that do not fully meet industry standards. Even small increases in temperature can have a significant impact on server components, especially CPUs, which generate a lot of heat during operation.

Common problems caused by heat

  1. Fan failure: Fans are essential for dissipating heat from CPUs and other components. Over time, fans can wear out and stop working properly, reducing cooling effectiveness.
  2. Degraded thermal paste: Thermal paste improves heat transfer between the CPU and the heatsink. When it dries out or degrades, cooling efficiency drops and the CPU overheats.
  3. Reaching the temperature threshold: Many servers are configured to shut down automatically when the CPU temperature exceeds a set threshold, to prevent damage. This can lead to sudden reboots if summer temperatures push CPUs beyond these limits.
  4. CPU throttling: When a CPU gets too hot, it may lower its clock speed to reduce the heat it generates, a process known as throttling. This can slow the server down significantly.
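
On Intel x86 machines, the Linux kernel keeps a per-CPU count of thermal throttling events, so throttling can be measured rather than guessed. The Python sketch below makes two assumptions. First, it assumes the counters are exposed at /sys/devices/system/cpu/cpuN/thermal_throttle/core_throttle_count. Second, it assumes they are absent on other hardware, in which case the sketch finds nothing. It reports which CPUs logged new events during a sample window. The function names are illustrative, not part of any tool.

```python
import glob
import os
import time

# Assumption: Intel x86 with the kernel's thermal throttle counters exposed in sysfs.
COUNTER_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/thermal_throttle/core_throttle_count"


def read_throttle_counts(pattern=COUNTER_GLOB):
    """Return {cpu_name: throttle_event_count} for every readable counter."""
    counts = {}
    for path in glob.glob(pattern):
        # ".../cpu0/thermal_throttle/core_throttle_count" -> "cpu0"
        cpu = path.split(os.sep)[-3]
        try:
            with open(path) as f:
                counts[cpu] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # counter missing or unreadable: skip this CPU
    return counts


def throttled_since(before, after):
    """Return {cpu: new_events} for CPUs whose counter increased between snapshots."""
    return {
        cpu: after[cpu] - before[cpu]
        for cpu in after
        if cpu in before and after[cpu] > before[cpu]
    }


def main(interval=60.0):
    """Take two snapshots `interval` seconds apart and print any new throttling."""
    first = read_throttle_counts()
    if not first:
        print("no thermal throttle counters found on this machine")
        return
    time.sleep(interval)
    second = read_throttle_counts()
    for cpu, events in sorted(throttled_since(first, second).items()):
        print(f"{cpu}: {events} new thermal throttling events")
```

To try it, call main() from an interactive Python session while the server is under load. On hardware without these counters, it simply reports that none were found.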

Diagnosing heat-related problems

Diagnosing heat-related problems can be relatively simple in person, by directly inspecting the server's physical components. For an inexperienced user or system administrator, however, these problems can be harder to identify without the proper tools. This is where software tools like lm_sensors come in.

What is lm_sensors?


lm_sensors is an essential software tool for monitoring temperature, voltage and fan speed on Linux systems. This tool allows you to obtain real-time data from sensors integrated into server hardware components, making it easier to diagnose overheating and cooling problems. lm_sensors is especially useful for system administrators who want to keep their hardware in optimal condition, preventing failures due to overheating or fan malfunctions.
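
Under the hood, lm_sensors reads the kernel's hwmon interface under /sys/class/hwmon. The minimal Python sketch below reads those files directly to illustrate what the tool collects. It assumes the standard hwmon sysfs units: temperatures in millidegrees Celsius, fan speeds in RPM and voltages in millivolts. It skips the labelling and correction rules that lm_sensors applies from its configuration, so sensors appear under raw names such as temp1 rather than friendly labels like CPUIN.

```python
import glob
import os

# Kernel hwmon ABI units: temp in millidegrees C, fan in RPM, in (voltage) in mV.
SCALE = {"temp": 1000.0, "fan": 1.0, "in": 1000.0}
UNIT = {"temp": "°C", "fan": "RPM", "in": "V"}


def parse_value(kind, raw):
    """Convert a raw sysfs reading to display units (°C, RPM or V)."""
    return int(raw.strip()) / SCALE[kind]


def read_hwmon(root="/sys/class/hwmon"):
    """Yield (chip, sensor, value, unit) for every temp, fan and voltage input."""
    for chip_dir in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        try:
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip()
        except OSError:
            chip = os.path.basename(chip_dir)  # fall back to "hwmonN"
        for kind in ("temp", "fan", "in"):
            pattern = os.path.join(chip_dir, f"{kind}[0-9]*_input")
            for path in sorted(glob.glob(pattern)):
                sensor = os.path.basename(path)[: -len("_input")]  # e.g. "temp1"
                try:
                    with open(path) as f:
                        value = parse_value(kind, f.read())
                except (OSError, ValueError):
                    continue  # present but unreadable (some chips return errors)
                yield chip, sensor, value, UNIT[kind]


if __name__ == "__main__":
    for chip, sensor, value, unit in read_hwmon():
        print(f"{chip:12} {sensor:8} {value:8.1f} {unit}")
```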

Installing lm_sensors

Installing lm_sensors varies depending on the Linux distribution you use. Below, we provide instructions for the main families of distributions: Red Hat derivatives (such as CentOS and Fedora) and Debian derivatives (such as Ubuntu).

Red Hat-derived distributions

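To install lm_sensors on Red Hat-based distributions such as CentOS, Fedora or RHEL, use the dnf package manager, or yum on older releases. The package is called lm_sensors, so as root run dnf install lm_sensors (or yum install lm_sensors).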

Debian-derived distributions

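To install lm_sensors on Debian-based distributions such as Ubuntu and Debian itself, use the apt package manager. Here the package is called lm-sensors, so as root run apt install lm-sensors. On either family, you can then run sensors-detect as root to probe for supported sensor chips. The sensors command prints the current readings.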
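
If you manage machines from both families, a small helper can pick the right command. The Python sketch below is hypothetical. It parses /etc/os-release, which current distributions provide, and matches the ID and ID_LIKE fields against a short, non-exhaustive list of distribution IDs.

```python
# Hypothetical helper: the ID lists below are examples, not a complete catalogue.
COMMANDS = {
    "rhel": "dnf install lm_sensors",  # use yum on older releases
    "debian": "apt install lm-sensors",
}
RHEL_LIKE = {"rhel", "centos", "fedora", "rocky", "almalinux"}
DEBIAN_LIKE = {"debian", "ubuntu"}


def parse_os_release(text):
    """Parse KEY=value lines from /etc/os-release (values may be quoted)."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def install_command(info):
    """Return the install command for the detected family, or None if unknown."""
    ids = {info.get("ID", "")} | set(info.get("ID_LIKE", "").split())
    if ids & RHEL_LIKE:
        return COMMANDS["rhel"]
    if ids & DEBIAN_LIKE:
        return COMMANDS["debian"]
    return None


if __name__ == "__main__":
    try:
        with open("/etc/os-release") as f:
            info = parse_os_release(f.read())
    except OSError:
        info = {}
    print(install_command(info) or "unknown distribution: check its documentation")
```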

Features of lm_sensors

  • Temperature monitoring: Provides accurate temperature readings for components such as the CPU, GPU and motherboard.
  • Voltage monitoring: Monitors supply voltages to ensure they stay within safe operating limits.
  • Fan monitoring: Measures fan speeds to make sure the fans are working properly.
  • Threshold configuration: Allows you to set temperature and voltage thresholds that trigger alarms when readings become abnormal.
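
For scripted checks, lm_sensors 3.5 and later can print readings as JSON with sensors -j. The Python sketch below assumes that output maps each chip name to its features, and each feature to subfeature values such as temp1_input, temp1_max or fan1_min. It flags three conditions: a reading at or above its max limit (falling back to crit), a reading below its min limit, or an alarm flag that is set. The limits are whatever the chip reports, so odd or missing values are possible on some boards. The function names are illustrative.

```python
import json
import subprocess


def read_sensors_json():
    """Run `sensors -j` (lm-sensors 3.5 or later) and return the parsed JSON."""
    result = subprocess.run(["sensors", "-j"], capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def find_problems(data):
    """Return (chip, feature, message) tuples for readings outside their limits."""
    problems = []
    for chip, features in data.items():
        for label, subs in features.items():
            if not isinstance(subs, dict):
                continue  # skip plain strings such as "Adapter"
            for key, value in subs.items():
                if not key.endswith("_input"):
                    continue
                prefix = key[: -len("_input")]  # e.g. "temp1", "fan2", "in0"
                # Prefer the lower "max" limit; fall back to "crit" if max is missing.
                high = subs.get(f"{prefix}_max", subs.get(f"{prefix}_crit"))
                low = subs.get(f"{prefix}_min")
                if high is not None and value >= high:
                    problems.append((chip, label, f"{value} >= limit {high}"))
                elif low is not None and value < low:
                    problems.append((chip, label, f"{value} < minimum {low}"))
                # Any non-zero "<prefix>_..._alarm" subfeature counts as a raised alarm.
                alarms = [
                    k for k, v in subs.items()
                    if k.startswith(prefix + "_") and k.endswith("_alarm") and v
                ]
                if alarms:
                    problems.append((chip, label, "alarm set: " + ", ".join(alarms)))
    return problems


if __name__ == "__main__":
    try:
        readings = read_sensors_json()
    except (OSError, subprocess.CalledProcessError, ValueError) as exc:
        print(f"could not read `sensors -j` output: {exc}")
    else:
        for chip, label, message in find_problems(readings):
            print(f"{chip} / {label}: {message}")
```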

Case study: Analysis of a sensors output

The image below shows the output of the sensors command on a Linux system that had rebooted itself twice in one morning. Let's analyze the data to identify the problem.

[Image: output of the sensors command on the affected server]

Detailed analysis

  • CPU temperature: One of the first indicators of overheating is the CPU temperature. In the image, the CPU temperature (CPUIN) is extremely high, reaching 90.0°C. That is well above the alarm threshold of 80.0°C. The alarm threshold is a predefined limit, and exceeding it means the CPU is operating at a dangerously high temperature. Beyond this point, server performance drops and hardware components can be permanently damaged. Overheating of this magnitude suggests that the cooling system is not working properly.
  • Fans (FAN): Another crucial aspect to consider is fan operation. Fans keep the CPU and other components at a safe operating temperature by dissipating the heat generated during operation. In the output, all fans (fan1, fan2, …, fan7) show a speed of 0 RPM, which is a clear sign that they are not working. With the fans not spinning, there is not enough airflow to cool the server's internal components, which quickly leads to overheating.
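
The reasoning applied to this output can be written down as a few explicit rules. The Python sketch below uses a hypothetical diagnose helper, fed with the values read from the image: 90.0°C against an 80.0°C alarm threshold and seven fans at 0 RPM. An unused fan header also reads 0 RPM, so the helper only flags stopped fans. A physical check should confirm the failure.

```python
def diagnose(cpu_temp_c, alarm_c, fan_rpms):
    """Apply the case-study reasoning to values read from a sensors output.

    Hypothetical helper: cpu_temp_c and alarm_c are in °C, fan_rpms maps fan names to RPM.
    """
    findings = []
    excess = cpu_temp_c - alarm_c
    if excess > 0:
        findings.append(
            f"CPU at {cpu_temp_c:.1f}°C, {excess:.1f}°C above the {alarm_c:.1f}°C alarm threshold"
        )
    stopped = [name for name, rpm in fan_rpms.items() if rpm == 0]
    if stopped and len(stopped) == len(fan_rpms):
        findings.append("all fans report 0 RPM: no forced airflow")
    elif stopped:
        findings.append("fans reporting 0 RPM: " + ", ".join(stopped))
    # Overheating together with stopped fans points to the fans as the cause.
    if excess > 0 and stopped:
        findings.append("likely cause: failed fans leading to CPU overheating")
    return findings


if __name__ == "__main__":
    # Values from the case study: CPUIN at 90.0°C, alarm at 80.0°C, fan1..fan7 at 0 RPM.
    fans = {f"fan{i}": 0 for i in range(1, 8)}
    for line in diagnose(90.0, 80.0, fans):
        print(line)
```

With the case-study values, it prints three findings: the 10.0°C excess over the threshold, all fans at 0 RPM, and the likely fan failure.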

Diagnosis

The main problem in this case is the failed fans, which caused the CPU to overheat. With every fan stopped, the heat generated by the CPU is not dissipated, so the temperature rises rapidly to critical levels. This triggered the server's automatic shutdown mechanism, which protects the hardware from permanent damage, and caused the sudden reboots.

Solutions and recommendations

  1. Replacing the fans: The immediate solution is to replace the failed fans to restore adequate airflow and cooling.
  2. Checking the thermal paste: Check the condition of the thermal paste and replace it if necessary to improve heat dissipation.
  3. Continuous monitoring: Use tools like lm_sensors to constantly monitor temperatures and fan speeds, setting alarms to prevent future overheating problems.
  4. Power inspection: Check the power supply voltages to make sure there are no problems with the power supply or power distribution.
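
For continuous monitoring, an alert should fire once when a reading crosses its limit, not on every poll. A common way to achieve this is hysteresis: the alarm clears only after the reading drops a few degrees below the threshold. The Python sketch below assumes you supply a function that returns the current temperature in °C. The 5°C margin and 30-second interval are arbitrary defaults, not values taken from lm_sensors.

```python
import time
from typing import Callable, Iterator, Optional, Tuple


class TempAlarm:
    """Alert when a reading reaches `high`; clear only below `high - hysteresis`."""

    def __init__(self, high: float, hysteresis: float = 5.0):
        self.high = high
        self.clear_below = high - hysteresis
        self.active = False

    def update(self, value: float) -> Optional[str]:
        """Return "ALERT" or "CLEAR" when the state changes, otherwise None."""
        if not self.active and value >= self.high:
            self.active = True
            return "ALERT"
        if self.active and value < self.clear_below:
            self.active = False
            return "CLEAR"
        return None


def watch(
    read: Callable[[], float],
    alarm: TempAlarm,
    interval: float = 30.0,
    rounds: Optional[int] = None,
) -> Iterator[Tuple[float, str]]:
    """Poll read() every `interval` seconds; yield (value, event) on state changes.

    rounds=None polls forever; a number limits the loop (useful for testing).
    """
    done = 0
    while rounds is None or done < rounds:
        value = read()
        event = alarm.update(value)
        if event is not None:
            yield value, event
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

For instance, read could return the CPU sensor's temp*_input value under /sys/class/hwmon divided by 1000. Which hwmon directory and index that is varies from machine to machine. Each yielded event can then be forwarded to whatever alerting system you already use.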

Conclusion

Summer temperatures can take a toll on servers, even in the best-equipped data centers. Issues like broken fans and dried-out thermal paste can go unnoticed until external heat brings them to light, causing sudden slowdowns and restarts. Tools like lm_sensors make it possible to monitor the condition of hardware components in real time and intervene promptly to avoid damage and service interruptions. Preventive maintenance and continuous monitoring are essential to keep servers running smoothly even in the most extreme conditions.


INFORMATION

Managed Server Srl is a leading Italian player in providing advanced GNU/Linux system solutions oriented towards high performance. With a low-cost and predictable subscription model, we ensure that our customers have access to advanced technologies in hosting, dedicated servers and cloud services. In addition, we offer consultancy on Linux systems and specialized maintenance in DBMS, IT Security, Cloud and much more. We stand out for our expertise in hosting leading Open Source CMS such as WordPress, WooCommerce, Drupal, Prestashop, Joomla, OpenCart and Magento, backed by high-level support and consultancy services suited to Public Administration, SMEs and businesses of any size.

Red Hat, Inc. owns the rights to Red Hat®, RHEL®, RedHat Linux®, and CentOS®; AlmaLinux™ is a trademark of AlmaLinux OS Foundation; Rocky Linux® is a registered trademark of the Rocky Linux Foundation; SUSE® is a registered trademark of SUSE LLC; Canonical Ltd. owns the rights to Ubuntu®; Software in the Public Interest, Inc. holds the rights to Debian®; Linus Torvalds holds the rights to Linux®; FreeBSD® is a registered trademark of The FreeBSD Foundation; NetBSD® is a registered trademark of The NetBSD Foundation; OpenBSD® is a registered trademark of Theo de Raadt. Oracle Corporation owns the rights to Oracle®, MySQL®, and MyRocks®; Percona® is a registered trademark of Percona LLC; MariaDB® is a registered trademark of MariaDB Corporation Ab; REDIS® is a registered trademark of Redis Labs Ltd. F5 Networks, Inc. owns the rights to NGINX® and NGINX Plus®; Varnish® is a registered trademark of Varnish Software AB. Adobe Inc. holds the rights to Magento®; PrestaShop® is a registered trademark of PrestaShop SA; OpenCart® is a registered trademark of OpenCart Limited. Automattic Inc. owns the rights to WordPress®, WooCommerce®, and JetPack®; Open Source Matters, Inc. owns the rights to Joomla®; Dries Buytaert holds the rights to Drupal®. Amazon Web Services, Inc. holds the rights to AWS®; Google LLC holds the rights to Google Cloud™ and Chrome™; Microsoft Corporation holds the rights to Microsoft®, Azure®, and Internet Explorer®; Mozilla Foundation owns the rights to Firefox®. Apache® is a registered trademark of The Apache Software Foundation; PHP® is a registered trademark of the PHP Group. 
CloudFlare® is a registered trademark of Cloudflare, Inc.; NETSCOUT® is a registered trademark of NETSCOUT Systems Inc.; ElasticSearch®, LogStash®, and Kibana® are registered trademarks of Elastic NV; Hetzner Online GmbH owns the rights to Hetzner®; OVHcloud is a registered trademark of OVH Groupe SAS; cPanel®, LLC owns the rights to cPanel®; Plesk® is a registered trademark of Plesk International GmbH; Facebook, Inc. owns the rights to Facebook®. This site is not affiliated with, sponsored by or otherwise associated with any of the entities mentioned above and does not represent any of these entities in any way. All rights to the brands and product names mentioned are the property of their respective copyright holders. Any other trademarks mentioned belong to their registrants. MANAGED SERVER® is a trademark registered at European level by MANAGED SERVER SRL, Via Enzo Ferrari, 9, 62012 Civitanova Marche (MC), Italy.
