The OpenEuler community has announced that the Bisheng compiler has been added to the OpenEuler operating system software repository, where it is now available for download.
The new Huawei compiler for OpenEuler is at version 2.1.0. In benchmarks it delivers 24.3 percent higher performance than GCC under the same hardware and software environment.
Bisheng Compiler is a high-performance, reliable, and easily extensible compiler created by Huawei Compiler Lab. It supports C/C++/Fortran and other programming languages.
The compiler also introduces and enhances a variety of compilation and optimization technologies aimed at specific application scenarios. It is optimized in particular for high-performance computing (HPC) workloads.
Bisheng 2.1.0 was released on December 30 last year. This version enhances loop optimization, structure reorganization optimization, and block reorder optimization, and improves the performance of multiple SPEC CPU 2017 sub-items and HPC workloads.
The update adds precision control options, including pow initialization immediate data fitting and mathematical function control, to extend the available precision tuning.
It also supports multi-threaded parallel programming and the Fortran 2003 input/output enhancements, including asynchronous I/O, to meet the needs of the Fortran ecosystem on Kunpeng.
Features Optimized with this update:
Bisheng Compiler adopts a variety of enhanced compilation optimization techniques, including but not limited to the following optimization features:
- Loop Unswitching: Reduces the number of branch jumps executed.
- Loop Unroll-and-Jam: Improves memory and cache locality and utilization.
- Loop Fusion: Directly reuses values across loops, exposing more instruction scheduling opportunities.
- Loop Distribution: Reduces register pressure in loops and exposes more vectorization opportunities.
- Loop Unrolling: Reduces the number of dynamic instructions and exposes more optimization opportunities, such as data reuse, wider instruction scheduling, and better data parallelism for vectorization.
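To make two of these transformations concrete, here is a source-level sketch in Python. It is only an illustration of the idea: the compiler applies such rewrites to compiled C/C++/Fortran code, and the function names below are hypothetical.

```python
def scale_original(values, use_double):
    """Loop with an invariant branch: the test runs on every iteration."""
    out = []
    for v in values:
        if use_double:          # condition never changes inside the loop
            out.append(v * 2)
        else:
            out.append(v + 1)
    return out


def scale_unswitched(values, use_double):
    """After loop unswitching: the branch is tested once, outside the loop."""
    out = []
    if use_double:
        for v in values:
            out.append(v * 2)
    else:
        for v in values:
            out.append(v + 1)
    return out


def sums_separate(values):
    """Two separate loops over the same data."""
    total = 0
    for v in values:
        total += v
    total_sq = 0
    for v in values:
        total_sq += v * v
    return total, total_sq


def sums_fused(values):
    """After loop fusion: one pass, each element is read once and reused."""
    total = 0
    total_sq = 0
    for v in values:
        total += v
        total_sq += v * v
    return total, total_sq
```

In both pairs the transformed version returns the same result as the original; only the amount of branching or the number of passes over the data changes.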
Memory layout optimization
The compiler converts an Array of Structures (AoS) into a Structure of Arrays (SoA) and rearranges arrays. Both techniques raise the cache hit rate and thereby improve program performance.
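The AoS-to-SoA idea can be sketched in a few lines of Python. This is a conceptual illustration only: the compiler performs the conversion on compiled data structures, and the record fields and the helper `aos_to_soa` are hypothetical.

```python
from array import array

# Array of Structures: each record keeps its fields together.
particles_aos = [
    {"x": 1.0, "y": 10.0, "mass": 0.5},
    {"x": 2.0, "y": 20.0, "mass": 1.5},
    {"x": 3.0, "y": 30.0, "mass": 2.5},
]


def aos_to_soa(records, fields):
    """Regroup records into one contiguous array of doubles per field."""
    return {name: array("d", (rec[name] for rec in records)) for name in fields}


particles_soa = aos_to_soa(particles_aos, ("x", "y", "mass"))

# A loop that only needs `x` now walks one dense array instead of
# stepping over the unused `y` and `mass` fields of every record.
sum_x = sum(particles_soa["x"])
```

With the SoA layout, values of the same field sit next to each other in memory, so a loop over a single field touches fewer cache lines.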
Working closely with the Kunpeng processor, the Bisheng compiler accurately models hardware-related characteristics. Its prefetch analysis can therefore simulate the memory access behavior of the Kunpeng processor and insert accurate prefetch instructions into the code, raising the cache hit rate and improving program performance.
Combined with the Kunpeng NEON / SVE instruction sets, Bisheng Compiler enhances automatic vectorization, converting scalar programs that perform similar operations into vectorized programs so that one instruction processes multiple data items and program performance improves.
Using ML-based automatic search, the compiler iterates repeatedly over the space of tunable options to find the best combination, then compiles a target program with better performance.
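The search loop behind this kind of auto-tuning can be sketched as follows. This is a toy under stated assumptions: the option names are hypothetical and are not real Bisheng flags, the cost function is a made-up formula standing in for "compile, run, and time the program", and a plain exhaustive search stands in for the ML-guided search the article describes.

```python
import itertools

# Hypothetical tuning space; these are not real Bisheng options.
SEARCH_SPACE = {
    "unroll_factor": [1, 2, 4, 8],
    "vector_width": [1, 2, 4],
    "prefetch_distance": [0, 16, 64],
}


def measure_runtime(config):
    """Stand-in for building and benchmarking the program with `config`.

    A toy formula keeps the example deterministic; a real tuner would
    compile and time the program here.
    """
    return (
        100.0 / config["unroll_factor"]
        + 50.0 / config["vector_width"]
        + abs(config["prefetch_distance"] - 16) * 0.5
        + config["unroll_factor"] * 2.0  # penalty for heavy unrolling
    )


def search_best(space, cost):
    """Evaluate every combination and keep the cheapest one."""
    names = list(space)
    best_config, best_cost = None, float("inf")
    for values in itertools.product(*(space[n] for n in names)):
        config = dict(zip(names, values))
        current = cost(config)
        if current < best_cost:
            best_config, best_cost = config, current
    return best_config, best_cost


best_config, best_cost = search_best(SEARCH_SPACE, measure_runtime)
```

A real option space is far too large to enumerate, which is why a guided search is used to decide which configurations are worth measuring.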
Performance test environment:
- OS: OpenEuler 20.03
- CPU: Kunpeng 920
The Bisheng compiler development team evaluated performance using Bisheng compiler 2.1.0. In the SPEC CPU 2017 test report, Bisheng compiler 2.1.0 achieved a comprehensive score of 399 points, while GCC 9.3.0 scored 321 points. Under the same hardware and software environment, the performance of the Bisheng compiler is 24.3% higher than that of GCC.
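The 24.3% figure follows directly from the two reported scores, as this short check shows (the variable names are illustrative):

```python
bisheng_score = 399  # SPEC CPU 2017 score reported for Bisheng 2.1.0
gcc_score = 321      # SPEC CPU 2017 score reported for GCC 9.3.0

# Relative improvement of Bisheng over GCC, in percent.
improvement_pct = (bisheng_score - gcc_score) / gcc_score * 100
print(f"{improvement_pct:.1f}%")  # prints 24.3%
```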
(via – ithome)
Huawei opens China’s first EulerOS ecosystem innovation center
On January 3, 2022, Huawei and the government of Binhu District in China unveiled the Wuxi Ascend and Euler Ecosystem innovation center. This new Huawei EulerOS innovation center is the first of its kind in China.
According to Huawei, the Wuxi Shengteng (Ascend) & Euler Eco-Innovation Center focuses on ecosystem cultivation and industrial development around two root technologies: the Shengteng AI basic software and hardware platform and the EulerOS open-source operating system.
It aims to build four platforms: a public computing power service platform, an application innovation incubation platform, an industrial aggregation development platform, and a scientific research and talent training platform.
The overall goal is to provide full-stack independent innovation spanning the Shengteng AI software and hardware platform, operating system, development framework, toolchain, and deep learning platform.
EulerOS is an open-source operating system for digital infrastructure that can be deployed on servers, cloud computing, edge computing, embedded devices, and other devices. It can be used across IT (Information Technology), CT (Communication Technology), and OT (Operational Technology).
Euler and HarmonyOS:
During the 2021 China 5G + Industrial Internet Conference, Huawei revealed that the OpenHarmony and OpenEuler operating systems are being built side by side and that both are important foundations of digital infrastructure.
Huawei chairman Hu Houkun said that 5G + Industrial Internet is empowering the digital transformation of thousands of industries, and that being data-driven is the key. It is now necessary to acquire data, compute on it quickly, and put it to good use. Connecting more devices, lowering the threshold of AI, and developing industrial software are the key investment areas of this phase.
(via – ithome)
CTyunOS launched based on Huawei OpenEuler operating system
At OpenEuler Summit 2021, Huawei officially donated the OpenEuler open source operating system to the OpenAtom Foundation. At the same event, CTyunOS, a cloud operating system based on the Huawei OpenEuler operating system, was launched.
According to the information, CTyunOS signals that Tianyi Cloud is gradually starting to deploy underlying core technology. Following new trends in the operating system field and its own business requirements, China Telecom launched the OpenEuler-based CTyunOS as part of its cloud network integration strategy.
With CTyunOS, China Telecom is the first in the industry to adopt the Huawei OpenEuler technology route for all services. It is also the first operator to launch independently developed OpenEuler-based versions for both x86 and ARM.
The telecom service provider has achieved large-scale commercial use and provides unified operating system services across cloud, network, and edge.
China Telecom has been participating in the work of the OpenStack SIG since joining the Euler open source community. It has participated in the software migration of the OpenStack Q version throughout the entire process and has tested and verified the functions and compatibility.
After the release of openEuler 21.03, China Telecom cooperated with the Kernel SIG to explore the use scenarios of EtMem. China Telecom also contributed its experience in system resource pressure detection and submitted PSI tools to the community.
China Telecom’s Tianyi cloud operating system CTyunOS brings the following research and development optimizations. Optimization of the system kernel has greatly improved performance.
Through a number of unique innovative technologies, the virtualization components are deeply customized to provide high-performance, low-latency virtualization capabilities. With the self-developed cloud platform computing management and other key components, the overall performance of the cloud platform is improved in multiple dimensions.
Through adaptation and optimization for chips of different architectures, it supports both homogeneous and heterogeneous hardware. It also greatly enhances the security features of the system, making it a professional server operating system for cloud computing.
CTyunOS Top Features:
Optimizing kernel performance:
Innovative use of domain scheduling technology in the kernel improves the performance of process scheduling in a variety of scenarios. CPU, memory, IO, and network scheduling performance is 17% ahead of the industry benchmark CentOS. Big data, web, and database scenarios lead CentOS by 15%-22%.
The NMI mechanism based on SDEI and the PMU allows more accurate performance analysis. Limiting the proportion of memory occupied by the page cache makes the business run more smoothly. The Statistical Profiling Extension (SPE) enhances tuning capabilities under perf, and hot swap of SAS/NVMe disks is also supported.
Enhanced virtualization capabilities:
KVM is deeply customized with the CPU integration mechanism, CPU intelligent scheduling, and other technologies to provide high-performance, low-latency virtualization capabilities.
Smart network cards are supported, allowing flexible network and storage offloading that reduces host CPU and memory consumption and thereby greatly improves performance and virtual machine density.
Improve cloud platform capabilities:
The computing management component created by Tianyi Cloud provides a low-latency, high-performance cloud platform that supports ultra-large-scale clusters (10k+ host clusters).
The customized authentication component, the GoStone project, greatly improves authentication performance compared with OpenStack Keystone by caching, upgrading the token generation method, and optimizing the password encryption method. It improves security performance by up to 100 times under the same resource consumption.
The steel bare metal management component uses lightweight, small squashfs images, which are easier to store and transfer, and shortens the time to bring a machine online to minutes. Its fully asynchronous system architecture provides flexible and scalable clustering capabilities.
Adapt to ARM and X86, support diverse computing power:
CTyunOS adapts to heterogeneous computing power, supporting X86, ARM, and other architectures, with adaptation and optimization for Kunpeng, Feiteng, Zhaoxin, and Haiguang. For multi-core scenarios, it improves CPU multi-core parallelism in scheduling and locking and reduces conflicts over shared CPU resources to accelerate tasks.
It uses ktask to spread single-core serial tasks across multiple CPUs, making full use of multi-core hardware. The Kunpeng acceleration engine KAE provides hardware acceleration of encryption algorithms. Through ARM64 kernel hot patching, features of the ARM64 instruction set are used to improve the performance of the basic library and of CRC verification.
Enhance system security:
It provides the IMA integrity measurement framework, which can judge whether the operating environment is safe and reliable, and the secGear confidential computing framework, which shields the differences between confidential computing SDKs on different architectures, making calls more efficient and easier to use.
It also provides the security architecture tool security-tool, which makes security settings more convenient and automated.
Memory hierarchical expansion:
Memory hierarchical expansion combines DRAM with lower-speed memory media such as SCM and AEP, managed according to different strategies. Through hierarchical memory scheduling, hot data runs in the high-speed DRAM area while cold data is swapped to the low-speed memory area, improving the efficiency of physical memory usage.
This feature suits memory capacity-sensitive applications such as the MySQL database and Spark. It is a joint innovation with Tianyi Cloud in the virtual machine scenario. When the expansion medium is AEP, business performance with etmem turned on is about 30% higher than with etmem turned off, which improves the cost-effectiveness of physical memory.
Tianyi Cloud has used Huawei OpenEuler memory grading feature
As revealed by the reports, Tianyi Cloud and Huawei OpenEuler open-source teams have jointly innovated the memory grading expansion feature and conducted internal prototype verification in Tianyi Cloud virtualization scenarios.
The result of this test shows that memory grading technology greatly improves memory cost performance. Let’s take a look at the memory grading technology in a bit of detail.
Constrained by the bottleneck of memory technology, memory costs are high. With the development of CPU computing power, especially the reduction of ARM core costs, memory has become a key issue that constrains business costs and performance. How to save memory cost and expand memory capacity has become an urgent problem to be solved. Therefore, Tianyi and Huawei OpenEuler introduced the memory grading expansion function.
The memory hierarchical expansion function does not affect business functions. It combines DRAM with low-speed memory media such as SCM and AEP to form multi-level memory. Automatic memory scheduling keeps hot data in the high-speed DRAM area and swaps cold data to the low-speed memory area, thereby increasing memory capacity while ensuring the efficient and stable operation of the core business.
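The hot/cold placement idea can be sketched with a toy two-tier model. This is a simplified illustration, not how etmem is implemented: the class `TieredMemory`, its access-count heuristic, and the page names are all hypothetical.

```python
from collections import Counter


class TieredMemory:
    """Toy two-tier memory: a small fast tier (DRAM) and a large slow tier."""

    def __init__(self, dram_capacity):
        self.dram_capacity = dram_capacity
        self.dram = set()   # pages currently in the fast tier
        self.slow = set()   # pages currently in the slow tier
        self.access_counts = Counter()

    def access(self, page):
        """Record an access; pages seen for the first time start in the slow tier."""
        self.access_counts[page] += 1
        if page not in self.dram:
            self.slow.add(page)

    def rebalance(self):
        """Move the most-accessed pages to DRAM and everything else to the slow tier."""
        ranked = [page for page, _ in self.access_counts.most_common()]
        hot = set(ranked[: self.dram_capacity])
        all_pages = self.dram | self.slow
        self.dram = hot
        self.slow = all_pages - hot
        self.access_counts.clear()  # start a fresh observation window


mem = TieredMemory(dram_capacity=2)
for page in ["a", "a", "a", "b", "b", "c", "d", "a", "b"]:
    mem.access(page)
mem.rebalance()
# Pages "a" and "b" were accessed most often, so they end up in DRAM.
```

The total number of pages held is larger than the DRAM capacity, which is the point of the technique: capacity grows while the frequently used data stays on the fast tier.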
This feature is suitable for applications that use a large amount of memory but access it relatively infrequently. In these scenarios, the effect is better and the benefits are greater.
In the virtualization scenario, expanding memory capacity while reducing memory cost and raising the memory overcommit ratio is a business pain point facing Tianyi Cloud.
In response to this pain point, Tianyi Cloud and Huawei’s openEuler open source team decided to verify the memory grading technology in scenarios where virtual machine internal business access is not frequent, and try to increase virtual machine density while keeping business performance flat or slightly degraded.
Through joint innovation, the combined AEP and DDR scenario was prototyped and verified. With the memory expansion function enabled, the Redis performance of the virtual machine on AEP increased by about 30% compared with when the function is not enabled, basically matching the Redis performance of a virtual machine on DDR.
At the same time, when the memory capacity is equal, the use of DDR with AEP reduces the memory cost by about 35% compared to the pure DDR scenario, which significantly improves the cost-effectiveness of memory usage.