NCCL Release 2.26.2
These are the release notes for NCCL 2.26.2. For previous NCCL release notes, refer to the NCCL Archives.
Compatibility
- Deep learning framework containers. Refer to the Support Matrix for the supported container versions.
- This NCCL release supports CUDA 12.2, CUDA 12.4, and CUDA 12.8.
Key Features and Enhancements
This NCCL release includes the following key features and enhancements.
- Added support for profiling the start and end of CUDA kernels as well as network plugin activity. Improved profiling granularity and added support for graph capturing.
- Added implicit launch ordering, which prevents deadlocks when using multiple NCCL communicators per device by implicitly ordering NCCL operations according to the host program order. Disabled by default; set NCCL_LAUNCH_ORDER_IMPLICIT=1 to enable.
- Added a complementary mechanism to detect host threads racing to launch operations to the same device. Enabled by default; set NCCL_LAUNCH_RACE_FATAL=0 to disable.
- Significantly accelerated the PAT algorithm by separating the computation and execution of PAT steps onto different warps.
- Added support for setting QoS per communicator, using a new traffic class config that allows the application to select a particular traffic class for a given communicator (see the sketch after this list). For the IB/RoCE plugin, existing config variables such as NCCL_IB_SL and NCCL_IB_TC take precedence.
- Added support for enabling GPU Direct RDMA specifically on C2C platforms. Disabled by default; set NCCL_NET_GDR_C2C=1 to enable.
- Keep user buffer registration enabled as much as possible, disabling it only when a communicator has more than one rank per node on any node.
- Report operation counts in RAS separately for each collective type and provide details about missing communicator ranks.
- Added support for timestamps in NCCL diagnostic messages. Timestamps are enabled by default for WARN messages; use NCCL_DEBUG_TIMESTAMP_LEVELS to specify the levels that should include a timestamp and NCCL_DEBUG_TIMESTAMP_FORMAT to adjust the timestamp format.
- Reduced the memory usage with NVLink SHARP (NVLS).
- Improved algorithm/protocol selection on recent Intel CPUs such as Emerald Rapids and Sapphire Rapids.
- Improved channel scheduling when mixing LL and Simple operations.
- Added support for comment lines (starting with #) in the nccl.conf file (a sample configuration appears after this list).
- Make user buffer registration problems print an INFO message instead of a WARN.
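As a minimal sketch of the per-communicator traffic class mentioned above, the snippet below passes a traffic class through the communicator config at initialization time. It assumes NCCL 2.26 headers; the config field name (trafficClass) and the way the unique ID is bootstrapped are assumptions to verify against your nccl.h and application setup.

/* Minimal sketch (not an authoritative reference): create a communicator
 * that requests a specific traffic class for its network traffic.
 * Assumptions: NCCL >= 2.26 headers, and that the config field is named
 * trafficClass -- verify against your nccl.h. Error handling is omitted. */
#include <nccl.h>

ncclComm_t createCommWithTrafficClass(int nranks, int rank, ncclUniqueId id,
                                      int trafficClass)
{
    ncclComm_t comm;
    ncclConfig_t config = NCCL_CONFIG_INITIALIZER;

    /* Request a traffic class for this communicator. For the IB/RoCE plugin,
     * NCCL_IB_SL and NCCL_IB_TC still take precedence if they are set. */
    config.trafficClass = trafficClass;

    ncclCommInitRankConfig(&comm, nranks, id, rank, &config);
    return comm;
}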
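Several of the new controls above (NCCL_LAUNCH_ORDER_IMPLICIT, NCCL_LAUNCH_RACE_FATAL, NCCL_NET_GDR_C2C, NCCL_DEBUG_TIMESTAMP_LEVELS) are ordinary environment variables and can also be placed in the nccl.conf file, which now accepts comment lines. The sample below is illustrative only; the values are not recommendations, and the exact value syntax for NCCL_DEBUG_TIMESTAMP_LEVELS should be checked against the environment variable documentation.

# Sample nccl.conf -- lines starting with # are now treated as comments.
# Order NCCL operations across communicators by host program order.
NCCL_LAUNCH_ORDER_IMPLICIT=1
# Launch-race detection is enabled by default; set to 0 to disable it.
NCCL_LAUNCH_RACE_FATAL=1
# Enable GPU Direct RDMA on C2C platforms.
NCCL_NET_GDR_C2C=1
# Add timestamps to WARN and INFO messages (value syntax shown is illustrative).
NCCL_DEBUG_TIMESTAMP_LEVELS=WARN,INFO
# NCCL_DEBUG_TIMESTAMP_FORMAT can be set to adjust the timestamp format.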
Fixed Issues
The following issues have been resolved in NCCL 2.26.2:
- Fixed a potential hang during connection setup when multiple communicators share resources.
- Fixed a performance regression when using NCCL_CROSS_NIC=1.
- Made the GID index detection code more resilient to avoid issues with containers.
- Fixed a potential crash when creating a non-blocking communicator after a non-blocking collective operation on another communicator.
- Fixed shared memory usage on recent Blackwell GPUs.
- Fixed an issue where reloading the IB SHARP plugin could result in an error.
- Made failures to auto-merge NICs (for example, when mixing IB and RoCE devices) non-fatal.
- Fixed a potential hang in ncclCommAbort and reduced the abort time by up to two orders of magnitude.
- Fixed a crash when libnccl.so was dynamically unloaded.
- Fixed a hang when the network plugin returned an error.
- Fixed a hang on heterogeneous architectures by harmonizing tuning choices.
- Fixed a potential crash in case of a failed communicator initialization or termination.
- Fixed a potential bug during a group launch of multiple communicators.
- Fixed a bug where, under rare circumstances, certain variables specified in the config file could be ignored.
Updating the GPG Repository Key
To best ensure the security and reliability of our RPM and Debian package repositories, NVIDIA is updating and rotating the signing keys used by apt, dnf/yum, and zypper package managers beginning on April 27, 2022. Failure to update your repository signing keys will result in package management errors when attempting to access or install NCCL packages. To ensure continued access to the latest NCCL release, please follow the updated NCCL installation guide.