July 4 2024

BOLT: Binary Optimization for Large-Scale Meta Applications

BOLT, developed by Meta, optimizes binaries after compilation, improving performance and CPU efficiency. It is used at large scale, including on projects like HHVM.

Introduction

In high-performance computing environments like those at Meta (formerly Facebook), code size and efficiency are critical. Modern CPUs often suffer from “instruction starvation”: when a binary is very large, its frequently executed code no longer fits in the instruction cache, and the CPU stalls while waiting for the instructions it needs. Meta developed BOLT (Binary Optimization and Layout Tool) to address this problem. For further details, you can consult the Meta Engineering article.

What is BOLT

BOLT is a binary optimization tool designed to improve the efficiency of the CPU's instruction cache by rearranging instructions in binary code. It is compatible with any compiler, including GCC and Clang, and supports third-party libraries without requiring access to the source code.

BOLT operation

BOLT reconstructs the control flow graph of each function directly from the binary and then reorganizes basic blocks and functions according to the collected execution profiles. Three elements make this possible:

  1. Compatibility: BOLT works on the compiled binary, so it applies to code produced by any compiler as well as to hand-written assembly.
  2. Profiling: BOLT relies on sample-based profiles collected with the Linux perf tool while the application runs.
  3. Layout Optimization: BOLT rearranges code placement within and between functions so that frequently executed code sits close together.
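
To make the layout step more concrete, the following Python sketch orders the basic blocks of one function so that the hottest successor of each block becomes its fall-through. It is a toy greedy model written for this article, not BOLT's actual algorithm; the block names, the edge counts and the layout_blocks function are all hypothetical.

```python
from collections import defaultdict


def layout_blocks(entry, edges):
    """Order basic blocks so the hottest successor of each block falls through.

    edges maps (source, destination) -> execution count taken from a profile.
    Toy greedy model; BOLT's real layout algorithms are more sophisticated.
    """
    successors = defaultdict(list)
    blocks = {entry}
    for (src, dst), count in edges.items():
        successors[src].append((count, dst))
        blocks.update((src, dst))

    order, placed = [], set()
    pending = [entry]
    while pending:
        block = pending.pop(0)
        # Follow the hottest unplaced successor to build a fall-through chain.
        while block is not None and block not in placed:
            order.append(block)
            placed.add(block)
            next_block = None
            for count, dst in sorted(successors[block], reverse=True):
                if dst in placed:
                    continue
                if next_block is None:
                    next_block = dst
                else:
                    pending.append(dst)  # colder successor, placed later
            block = next_block
    # Blocks that the profile never reaches from the entry go last.
    order.extend(sorted(blocks - placed))
    return order


# Hypothetical profile: the error path is almost never taken.
edges = {
    ("entry", "check"): 1000,
    ("check", "work"): 997,
    ("check", "error"): 3,
    ("work", "exit"): 997,
    ("error", "exit"): 3,
}
print(layout_blocks("entry", edges))
# → ['entry', 'check', 'work', 'exit', 'error']
```

The rarely executed error block ends up after the hot path, so the common case runs through consecutive instructions without taken branches.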

Differences and Advantages compared to Standard Compilers such as GCC

Standard compilers such as GCC optimize code at compile time, with limited information about how the program actually behaves at run time. BOLT, by contrast, stands out for the following characteristics:

  1. Post-build optimization: BOLT operates on the already compiled binary, so its optimizations can reflect the actual behavior of the code during execution. This makes improvements possible that are out of reach during the initial compilation phase.
  2. Dynamic Profiling: BOLT uses profiling data collected at run time to identify hot code and optimize the code layout based on real-world usage. The optimizations are therefore driven by empirical data rather than by static heuristics.
  3. Reorganization of the Code: BOLT rearranges code to improve instruction cache efficiency, reducing cache misses and improving performance. This includes reordering functions and basic blocks so that code executed together is placed together.
  4. Elimination of Duplicate Code: Reduces binary size by eliminating duplicate and redundant code. This technique not only optimizes the memory used, but also improves cache efficiency, making applications faster and less resource-hungry.
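
The effect of function reordering can be illustrated with a small Python model that counts how many memory pages contain hot code before and after the hot functions are grouped together. This is only a sketch under simplifying assumptions: the function names, sizes and sample counts are invented, the page size is assumed to be 4 KiB, and sorting by sample count stands in for the call-graph-based ordering that a real tool would use.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def pages_touched_by_hot_code(layout, sizes, hot):
    """Count distinct pages containing at least one byte of a hot function."""
    pages = set()
    address = 0
    for name in layout:
        size = sizes[name]
        if name in hot:
            first = address // PAGE_SIZE
            last = (address + size - 1) // PAGE_SIZE
            pages.update(range(first, last + 1))
        address += size
    return len(pages)


def reorder_by_hotness(layout, samples):
    """Place functions with the most profile samples first (stable for ties)."""
    return sorted(layout, key=lambda name: -samples.get(name, 0))


# Hypothetical binary: three small hot functions separated by large cold ones.
sizes = {"hot_a": 1000, "cold_1": 6000, "hot_b": 1000, "cold_2": 6000, "hot_c": 1000}
samples = {"hot_a": 500, "hot_b": 300, "hot_c": 200}
original = ["hot_a", "cold_1", "hot_b", "cold_2", "hot_c"]
hot = set(samples)

optimized = reorder_by_hotness(original, samples)
print(pages_touched_by_hot_code(original, sizes, hot))   # → 3
print(pages_touched_by_hot_code(optimized, sizes, hot))  # → 1
```

In this model the same hot code occupies one page instead of three, which is the kind of locality gain that a profile-driven layout aims for.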

Performance Impact

The initial implementation of BOLT showed a 3% improvement in the performance of HHVM (HipHop Virtual Machine), an execution environment for PHP and Hack developed by Meta to improve the efficiency and scalability of PHP code. With further optimizations, the increase can be up to 8% for HHVM and range from 2% to 15% for other services, depending on the CPU architecture. These improvements are achieved through various specific optimization techniques, including:

  • Macro-fusion alignment: Modern x86 CPUs can fuse certain adjacent instruction pairs, such as a compare followed by a conditional jump, into a single micro-operation. The fusion can be lost when the pair is badly aligned, for example when it is split across a cache line boundary. BOLT aligns such pairs to prevent the resulting performance regression.
  • Jump Table Placement: BOLT improves the layout of jump tables by optimizing their location in memory. This speeds up the indirect jumps that compilers generate for switch statements.
  • Identical Code Folding: BOLT eliminates duplicate code by merging functions whose machine code is identical. This reduces the size of the binary, improving instruction cache efficiency and reducing memory consumption.
  • PLT Optimization and Constant Load Elimination: These optimizations are driven by the application profile. PLT Optimization reduces the Procedure Linkage Table overhead of calls into shared libraries, while Constant Load Elimination removes redundant loads of constant values, simplifying execution flow and improving performance.
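
Identical Code Folding is the easiest of these techniques to model. The Python sketch below groups functions by their machine code bytes and keeps one copy per group. It illustrates the idea only: the input format, the function names and the example bytes are hypothetical, and a real implementation also has to take relocations and references between functions into account.

```python
from collections import defaultdict


def fold_identical_functions(functions):
    """Map every function name to one canonical function with the same body.

    functions maps name -> bytes of machine code (hypothetical input format).
    """
    by_body = defaultdict(list)
    for name, body in functions.items():
        by_body[body].append(name)

    canonical = {}
    for names in by_body.values():
        keeper = min(names)  # deterministic choice of the surviving copy
        for name in names:
            canonical[name] = keeper
    return canonical


def folded_size(functions, canonical):
    """Total code size once only the canonical copies are kept."""
    return sum(len(functions[name]) for name in set(canonical.values()))


# Illustrative bytes: two accessors compile to the same three bytes.
functions = {
    "get_x": b"\x8b\x07\xc3",
    "get_id": b"\x8b\x07\xc3",
    "get_y": b"\x8b\x47\x04\xc3",
}
canonical = fold_identical_functions(functions)
print(canonical["get_x"])                           # → get_id
print(sum(len(body) for body in functions.values()))  # → 10
print(folded_size(functions, canonical))            # → 7
```

Every call to get_x can then be redirected to get_id, and the duplicate body no longer occupies space in the binary or in the instruction cache.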

Benefits of BOLT

Binary optimization with BOLT brings several significant benefits:

  • Instruction Cache Improvement: By rearranging the code, BOLT improves the efficiency of the instruction cache, reducing cache misses and improving performance. This process involves relocating the most frequently used instructions to ensure they are easily accessible from the cache, minimizing latency times.
  • Reduction of Execution Time: The optimizations brought by BOLT reduce the overall execution time of applications, making systems more responsive. The reorganization of the code allows for more linear and faster execution, eliminating redundant instructions and improving the execution flow.
  • Optimization of Critical Functions: BOLT identifies the functions that are executed most frequently and places them close to each other, so that calls and jumps between them land in code that is likely to be cached already. This approach is especially useful for complex applications with many function calls.
  • Elimination of Duplicate Code: BOLT reduces binary size by folding duplicate and redundant code. This improves cache efficiency and also reduces the memory required by the application, improving scalability.
  • Adaptability to Different Compilers and Languages: BOLT can be used with a wide range of compilers and programming languages, providing flexibility for developers. It is compatible with GCC, Clang and supports third-party libraries without requiring access to the source code, making it a versatile solution for different platforms.
  • Profiling Based on Real Execution: It uses profiling data collected during actual application execution, enabling optimizations based on how code is actually used, rather than theoretical assumptions. This leads to more concrete and targeted improvements in performance.
  • Scalability of Optimizations: BOLT's optimization techniques are scalable and can be applied to projects of any size, from small software to large enterprise applications, improving performance at scale.
  • Continuous improvement: BOLT optimizations are not a one-time step; as execution conditions change or applications are updated, new profiles can be collected and the binary optimized again, ensuring consistently high performance.

Case Studies and Applications

The adoption of BOLT has proven to be particularly effective in various case studies, including:

  • HHVM: BOLT improved the performance of HHVM, a PHP and Hack execution environment developed by Meta, by 3% initially, with further improvements of up to 8% with subsequent optimizations. These performance increases were achieved through improved code organization and reduced instruction access times.
  • Large Web Services: Other Meta services have seen performance increases ranging from 2% to 15%, demonstrating BOLT's effectiveness in various large-scale computing contexts. The optimizations have made it possible to reduce response times and improve the overall efficiency of the systems, making applications more responsive and scalable.

BOLT implementation

Implementing BOLT requires a series of steps involving profiling, analysis, and code reorganization. A typical workflow for using BOLT is presented below:

  • Collection of Execution Profiles: Use the Linux perf tool to sample the running application and record which functions and branches are executed most often. The recorded data is then converted into BOLT's own profile format with the perf2bolt utility that ships with BOLT. This process requires running applications in production or simulated environments to obtain realistic and meaningful data.
  • Code Analysis: Analyze execution profiles to identify areas of the code that need optimization, evaluating the effectiveness of the current data structures and algorithms used. This includes identifying the most frequently called functions and the most resource-consuming sections of code.
  • Optimization and Reorganization: Apply BOLT optimization techniques to rearrange code instructions, improve instruction cache efficiency, and reduce instruction access times. This includes reordering functions to reduce unnecessary jumps, merging common code blocks to reduce duplication, and eliminating redundancies. BOLT can also reorder data and control structures to improve cache locality.
  • Performance evaluation: Measure post-optimization performance to evaluate the effectiveness of your changes. Use standardized benchmarks and load tests to compare performance before and after optimization, ensuring improvements are significant and sustainable. It is essential to constantly monitor performance to identify any regressions or new optimization opportunities.
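
The workflow above can be sketched as three command lines: record a profile, convert it, and optimize the binary. The Python helper below only builds and prints these commands. The flags follow the usage shown in the BOLT README and may differ between versions; the binary path ./app, the file names and the bolt_workflow function are placeholders chosen for this example.

```python
import shlex


def bolt_workflow(binary, workload_args=(), perf_data="perf.data", profile="perf.fdata"):
    """Return the three commands of a typical BOLT run as argument lists.

    Flags follow the BOLT README; check them against your installed version.
    """
    optimized = binary + ".bolt"
    record = [
        "perf", "record", "-e", "cycles:u", "-j", "any,u",  # sample with branch records
        "-o", perf_data, "--", binary, *workload_args,
    ]
    convert = ["perf2bolt", "-p", perf_data, "-o", profile, binary]
    optimize = [
        "llvm-bolt", binary, "-o", optimized, f"-data={profile}",
        "-reorder-blocks=ext-tsp",     # basic block layout inside functions
        "-reorder-functions=hfsort",   # function order across the binary
        "-split-functions",            # move cold code out of hot functions
        "-dyno-stats",                 # report estimated before/after statistics
    ]
    return [record, convert, optimize]


for command in bolt_workflow("./app"):
    print(shlex.join(command))
```

Each list can be passed to subprocess.run(command, check=True) once perf and BOLT are installed. The -j option asks perf for branch records, which requires hardware support; without it, perf record can be run without that option, at the cost of a less precise profile.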

Challenges and Considerations

While BOLT offers numerous benefits, there are some challenges and considerations to keep in mind when adopting it:

  • Code Compatibility: It is important to ensure that BOLT can process the existing binaries and third-party libraries. This may require changes in the build process, such as keeping the symbol table and linking with relocations preserved (the --emit-relocs linker option), and in dependency management to ensure smooth integration.
  • Accurate Profiling: Accurately collecting execution profiles is crucial to achieving effective optimizations. Use advanced profiling tools to collect detailed data representative of real-world application usage. This helps identify real bottlenecks and the most relevant optimization opportunities.
  • Continuous monitoring: Performance must be continuously monitored to ensure that optimizations remain effective over time. Implement a monitoring system that regularly checks application performance and reports any regressions or anomalies. This allows you to intervene promptly to maintain the benefits obtained with BOLT and to adapt to changes in the workload or execution environment.
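
A monitoring check of this kind can be very small. The Python sketch below compares benchmark timings of the original and the optimized binary and flags a regression when the optimized build is slower than a tolerance allows. The timing values, the 1% tolerance and the function names are assumptions made for illustration; a production setup would use its own benchmarks and thresholds.

```python
from statistics import median


def time_reduction_percent(baseline_times, optimized_times):
    """Percentage by which the median run time dropped (negative if slower)."""
    base = median(baseline_times)
    opt = median(optimized_times)
    return (base - opt) / base * 100.0


def is_regression(baseline_times, optimized_times, tolerance_percent=1.0):
    """True when the optimized binary is slower by more than the tolerance."""
    return time_reduction_percent(baseline_times, optimized_times) < -tolerance_percent


# Hypothetical timings in seconds from repeated benchmark runs.
baseline = [10.0, 10.2, 9.8]
optimized = [9.2, 9.0, 9.4]
print(round(time_reduction_percent(baseline, optimized), 1))  # → 8.0
print(is_regression(baseline, optimized))                     # → False
```

Using the median rather than the mean makes the comparison less sensitive to a single noisy run.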

Future of BOLT

Meta continues to improve BOLT, with the goal of extending its benefits to more and more projects and applications. Collaboration with the open-source community is a key element for the future of BOLT, allowing developers to contribute new ideas and improvements.

Conclusions

BOLT represents a significant advancement in binary optimization for large-scale applications. By reducing execution times and improving CPU efficiency, BOLT helps improve the performance of applications, making them more responsive and efficient. Adopting tools like BOLT is essential for those working with large web platforms and services, offering a significant competitive advantage in terms of performance and scalability.
