Contents
- 1. Introduction
- 2. Programming Model
- 3. PTX Machine Model
- 4. Syntax
-
5. State Spaces, Types, and Variables
- 5.1. State Spaces
- 5.2. Types
- 5.3. Texture Sampler and Surface Types
- 5.4. Variables
- 5.5. Tensors
- 6. Instruction Operands
- 7. Abstracting the ABI
- 8. Memory Consistency Model
-
9. Instruction Set
- 9.1. Format and Semantics of Instruction Descriptions
- 9.2. PTX Instructions
- 9.3. Predicated Execution
- 9.4. Type Information for Instructions and Operands
- 9.5. Divergence of Threads in Control Constructs
- 9.6. Semantics
-
9.7. Instructions
-
9.7.1. Integer Arithmetic Instructions
- 9.7.1.1. Integer Arithmetic Instructions:
add
- 9.7.1.2. Integer Arithmetic Instructions:
sub
- 9.7.1.3. Integer Arithmetic Instructions:
mul
- 9.7.1.4. Integer Arithmetic Instructions:
mad
- 9.7.1.5. Integer Arithmetic Instructions:
mul24
- 9.7.1.6. Integer Arithmetic Instructions:
mad24
- 9.7.1.7. Integer Arithmetic Instructions:
sad
- 9.7.1.8. Integer Arithmetic Instructions:
div
- 9.7.1.9. Integer Arithmetic Instructions:
rem
- 9.7.1.10. Integer Arithmetic Instructions:
abs
- 9.7.1.11. Integer Arithmetic Instructions:
neg
- 9.7.1.12. Integer Arithmetic Instructions:
min
- 9.7.1.13. Integer Arithmetic Instructions:
max
- 9.7.1.14. Integer Arithmetic Instructions:
popc
- 9.7.1.15. Integer Arithmetic Instructions:
clz
- 9.7.1.16. Integer Arithmetic Instructions:
bfind
- 9.7.1.17. Integer Arithmetic Instructions:
fns
- 9.7.1.18. Integer Arithmetic Instructions:
brev
- 9.7.1.19. Integer Arithmetic Instructions:
bfe
- 9.7.1.20. Integer Arithmetic Instructions:
bfi
- 9.7.1.21. Integer Arithmetic Instructions:
szext
- 9.7.1.22. Integer Arithmetic Instructions:
bmsk
- 9.7.1.23. Integer Arithmetic Instructions:
dp4a
- 9.7.1.24. Integer Arithmetic Instructions:
dp2a
- 9.7.1.1. Integer Arithmetic Instructions:
-
9.7.2. Extended-Precision Integer Arithmetic Instructions
- 9.7.2.1. Extended-Precision Arithmetic Instructions:
add.cc
- 9.7.2.2. Extended-Precision Arithmetic Instructions:
addc
- 9.7.2.3. Extended-Precision Arithmetic Instructions:
sub.cc
- 9.7.2.4. Extended-Precision Arithmetic Instructions:
subc
- 9.7.2.5. Extended-Precision Arithmetic Instructions:
mad.cc
- 9.7.2.6. Extended-Precision Arithmetic Instructions:
madc
- 9.7.2.1. Extended-Precision Arithmetic Instructions:
-
9.7.3. Floating-Point Instructions
- 9.7.3.1. Floating Point Instructions:
testp
- 9.7.3.2. Floating Point Instructions:
copysign
- 9.7.3.3. Floating Point Instructions:
add
- 9.7.3.4. Floating Point Instructions:
sub
- 9.7.3.5. Floating Point Instructions:
mul
- 9.7.3.6. Floating Point Instructions:
fma
- 9.7.3.7. Floating Point Instructions:
mad
- 9.7.3.8. Floating Point Instructions:
div
- 9.7.3.9. Floating Point Instructions:
abs
- 9.7.3.10. Floating Point Instructions:
neg
- 9.7.3.11. Floating Point Instructions:
min
- 9.7.3.12. Floating Point Instructions:
max
- 9.7.3.13. Floating Point Instructions:
rcp
- 9.7.3.14. Floating Point Instructions:
rcp.approx.ftz.f64
- 9.7.3.15. Floating Point Instructions:
sqrt
- 9.7.3.16. Floating Point Instructions:
rsqrt
- 9.7.3.17. Floating Point Instructions:
rsqrt.approx.ftz.f64
- 9.7.3.18. Floating Point Instructions:
sin
- 9.7.3.19. Floating Point Instructions:
cos
- 9.7.3.20. Floating Point Instructions:
lg2
- 9.7.3.21. Floating Point Instructions:
ex2
- 9.7.3.22. Floating Point Instructions:
tanh
- 9.7.3.1. Floating Point Instructions:
-
9.7.4. Half Precision Floating-Point Instructions
- 9.7.4.1. Half Precision Floating Point Instructions:
add
- 9.7.4.2. Half Precision Floating Point Instructions:
sub
- 9.7.4.3. Half Precision Floating Point Instructions:
mul
- 9.7.4.4. Half Precision Floating Point Instructions:
fma
- 9.7.4.5. Half Precision Floating Point Instructions:
neg
- 9.7.4.6. Half Precision Floating Point Instructions:
abs
- 9.7.4.7. Half Precision Floating Point Instructions:
min
- 9.7.4.8. Half Precision Floating Point Instructions:
max
- 9.7.4.9. Half Precision Floating Point Instructions:
tanh
- 9.7.4.10. Half Precision Floating Point Instructions:
ex2
- 9.7.4.1. Half Precision Floating Point Instructions:
- 9.7.5. Mixed Precision Floating-Point Instructions
- 9.7.6. Comparison and Selection Instructions
- 9.7.7. Half Precision Comparison Instructions
-
9.7.8. Logic and Shift Instructions
- 9.7.8.1. Logic and Shift Instructions:
and
- 9.7.8.2. Logic and Shift Instructions:
or
- 9.7.8.3. Logic and Shift Instructions:
xor
- 9.7.8.4. Logic and Shift Instructions:
not
- 9.7.8.5. Logic and Shift Instructions:
cnot
- 9.7.8.6. Logic and Shift Instructions:
lop3
- 9.7.8.7. Logic and Shift Instructions:
shf
- 9.7.8.8. Logic and Shift Instructions:
shl
- 9.7.8.9. Logic and Shift Instructions:
shr
- 9.7.8.1. Logic and Shift Instructions:
-
9.7.9. Data Movement and Conversion Instructions
- 9.7.9.1. Cache Operators
- 9.7.9.2. Cache Eviction Priority Hints
- 9.7.9.3. Data Movement and Conversion Instructions:
mov
- 9.7.9.4. Data Movement and Conversion Instructions:
mov
- 9.7.9.5. Data Movement and Conversion Instructions:
shfl
(deprecated) - 9.7.9.6. Data Movement and Conversion Instructions:
shfl.sync
- 9.7.9.7. Data Movement and Conversion Instructions:
prmt
- 9.7.9.8. Data Movement and Conversion Instructions:
ld
- 9.7.9.9. Data Movement and Conversion Instructions:
ld.global.nc
- 9.7.9.10. Data Movement and Conversion Instructions:
ldu
- 9.7.9.11. Data Movement and Conversion Instructions:
st
- 9.7.9.12. Data Movement and Conversion Instructions:
st.async
- 9.7.9.13. Data Movement and Conversion Instructions:
st.bulk
- 9.7.9.14. Data Movement and Conversion Instructions:
multimem.ld_reduce
,multimem.st
,multimem.red
- 9.7.9.15. Data Movement and Conversion Instructions:
prefetch
,prefetchu
- 9.7.9.16. Data Movement and Conversion Instructions:
applypriority
- 9.7.9.17. Data Movement and Conversion Instructions:
discard
- 9.7.9.18. Data Movement and Conversion Instructions:
createpolicy
- 9.7.9.19. Data Movement and Conversion Instructions:
isspacep
- 9.7.9.20. Data Movement and Conversion Instructions:
cvta
- 9.7.9.21. Data Movement and Conversion Instructions:
cvt
- 9.7.9.22. Data Movement and Conversion Instructions:
cvt.pack
- 9.7.9.23. Data Movement and Conversion Instructions:
mapa
- 9.7.9.24. Data Movement and Conversion Instructions:
getctarank
-
9.7.9.25. Data Movement and Conversion Instructions: Asynchronous copy
- 9.7.9.25.1. Completion Mechanisms for Asynchronous Copy Operations
- 9.7.9.25.2. Async Proxy
- 9.7.9.25.3. Data Movement and Conversion Instructions: Non-bulk copy
- 9.7.9.25.4. Data Movement and Conversion Instructions: Bulk copy
-
9.7.9.25.5. Data Movement and Conversion Instructions: Tensor copy
- 9.7.9.25.5.1. Restriction on Tensor Copy instructions
- 9.7.9.25.5.2. Data Movement and Conversion Instructions:
cp.async.bulk.tensor
- 9.7.9.25.5.3. Data Movement and Conversion Instructions:
cp.reduce.async.bulk.tensor
- 9.7.9.25.5.4. Data Movement and Conversion Instructions:
cp.async.bulk.prefetch.tensor
- 9.7.9.25.6. Data Movement and Conversion Instructions: Bulk and Tensor copy completion instructions
- 9.7.9.26. Data Movement and Conversion Instructions:
tensormap.replace
- 9.7.10. Texture Instructions
- 9.7.11. Surface Instructions
- 9.7.12. Control Flow Instructions
-
9.7.13. Parallel Synchronization and Communication Instructions
- 9.7.13.1. Parallel Synchronization and Communication Instructions:
bar
,barrier
- 9.7.13.2. Parallel Synchronization and Communication Instructions:
bar.warp.sync
- 9.7.13.3. Parallel Synchronization and Communication Instructions:
barrier.cluster
- 9.7.13.4. Parallel Synchronization and Communication Instructions:
membar
/fence
- 9.7.13.5. Parallel Synchronization and Communication Instructions:
atom
- 9.7.13.6. Parallel Synchronization and Communication Instructions:
red
- 9.7.13.7. Parallel Synchronization and Communication Instructions:
red.async
- 9.7.13.8. Parallel Synchronization and Communication Instructions:
vote
(deprecated) - 9.7.13.9. Parallel Synchronization and Communication Instructions:
vote.sync
- 9.7.13.10. Parallel Synchronization and Communication Instructions:
match.sync
- 9.7.13.11. Parallel Synchronization and Communication Instructions:
activemask
- 9.7.13.12. Parallel Synchronization and Communication Instructions:
redux.sync
- 9.7.13.13. Parallel Synchronization and Communication Instructions:
griddepcontrol
- 9.7.13.14. Parallel Synchronization and Communication Instructions:
elect.sync
-
9.7.13.15. Parallel Synchronization and Communication Instructions:
mbarrier
- 9.7.13.15.1. Size and alignment of mbarrier object
- 9.7.13.15.2. Contents of the mbarrier object
- 9.7.13.15.3. Lifecycle of the mbarrier object
- 9.7.13.15.4. Phase of the mbarrier object
- 9.7.13.15.5. Tracking asynchronous operations by the mbarrier object
- 9.7.13.15.6. Phase Completion of the mbarrier object
- 9.7.13.15.7. Arrive-on operation on mbarrier object
- 9.7.13.15.8. mbarrier support with shared memory
- 9.7.13.15.9. Parallel Synchronization and Communication Instructions:
mbarrier.init
- 9.7.13.15.10. Parallel Synchronization and Communication Instructions:
mbarrier.inval
- 9.7.13.15.11. Parallel Synchronization and Communication Instructions:
mbarrier.expect_tx
- 9.7.13.15.12. Parallel Synchronization and Communication Instructions:
mbarrier.complete_tx
- 9.7.13.15.13. Parallel Synchronization and Communication Instructions:
mbarrier.arrive
- 9.7.13.15.14. Parallel Synchronization and Communication Instructions:
mbarrier.arrive_drop
- 9.7.13.15.15. Parallel Synchronization and Communication Instructions:
cp.async.mbarrier.arrive
- 9.7.13.15.16. Parallel Synchronization and Communication Instructions:
mbarrier.test_wait
/mbarrier.try_wait
- 9.7.13.15.17. Parallel Synchronization and Communication Instructions:
mbarrier.pending_count
- 9.7.13.16. Parallel Synchronization and Communication Instructions:
tensormap.cp_fenceproxy
- 9.7.13.17. Parallel Synchronization and Communication Instructions:
clusterlaunchcontrol.try_cancel
- 9.7.13.18. Parallel Synchronization and Communication Instructions:
clusterlaunchcontrol.query_cancel
- 9.7.13.1. Parallel Synchronization and Communication Instructions:
-
9.7.14. Warp Level Matrix Multiply-Accumulate Instructions
- 9.7.14.1. Matrix Shape
- 9.7.14.2. Matrix Data-types
- 9.7.14.3. Block Scaling
-
9.7.14.4. Matrix multiply-accumulate operation using
wmma
instructions -
9.7.14.5. Matrix multiply-accumulate operation using
mma
instruction- 9.7.14.5.1. Matrix Fragments for
mma.m8n8k4
with.f16
floating point type - 9.7.14.5.2. Matrix Fragments for
mma.m8n8k4
with.f64
floating point type - 9.7.14.5.3. Matrix Fragments for
mma.m8n8k16
- 9.7.14.5.4. Matrix Fragments for
mma.m8n8k32
- 9.7.14.5.5. Matrix Fragments for
mma.m8n8k128
- 9.7.14.5.6. Matrix Fragments for
mma.m16n8k4
- 9.7.14.5.7. Matrix Fragments for
mma.m16n8k8
- 9.7.14.5.8. Matrix Fragments for
mma.m16n8k16
with floating point type - 9.7.14.5.9. Matrix Fragments for
mma.m16n8k16
with integer type - 9.7.14.5.10. Matrix Fragments for
mma.m16n8k32
- 9.7.14.5.11. Matrix Fragments for
mma.m16n8k64
- 9.7.14.5.12. Matrix Fragments for
mma.m16n8k128
- 9.7.14.5.13. Matrix Fragments for
mma.m16n8k256
- 9.7.14.5.14. Multiply-and-Accumulate Instruction:
mma
- 9.7.14.5.15. Warp-level matrix load instruction:
ldmatrix
- 9.7.14.5.16. Warp-level matrix store instruction:
stmatrix
- 9.7.14.5.17. Warp-level matrix transpose instruction:
movmatrix
- 9.7.14.5.1. Matrix Fragments for
-
9.7.14.6. Matrix multiply-accumulate operation using
mma.sp
instruction with sparse matrix A- 9.7.14.6.1. Sparse matrix storage
-
9.7.14.6.2. Matrix fragments for multiply-accumulate operation with sparse matrix A
- 9.7.14.6.2.1. Matrix Fragments for sparse
mma.m16n8k16
with.f16
and.bf16
types - 9.7.14.6.2.2. Matrix Fragments for sparse
mma.m16n8k32
with.f16
and.bf16
types - 9.7.14.6.2.3. Matrix Fragments for sparse
mma.m16n8k16
with.tf32
floating point type - 9.7.14.6.2.4. Matrix Fragments for sparse
mma.m16n8k8
with.tf32
floating point type - 9.7.14.6.2.5. Matrix Fragments for sparse
mma.m16n8k32
with.u8
/.s8
integer type - 9.7.14.6.2.6. Matrix Fragments for sparse
mma.m16n8k64
with.u8
/.s8
/.e4m3
/.e5m2
type - 9.7.14.6.2.7. Matrix Fragments for sparse
mma.m16n8k64
with.u4
/.s4
integer type - 9.7.14.6.2.8. Matrix Fragments for sparse
mma.m16n8k128
with.u4
/.s4
integer type
- 9.7.14.6.2.1. Matrix Fragments for sparse
- 9.7.14.6.3. Multiply-and-Accumulate Instruction:
mma.sp
/mma.sp::ordered_metadata
-
9.7.15. Asynchronous Warpgroup Level Matrix Multiply-Accumulate Instructions
- 9.7.15.1. Warpgroup
- 9.7.15.2. Matrix Shape
- 9.7.15.3. Matrix Data-types
- 9.7.15.4. Async Proxy
-
9.7.15.5. Asynchronous Warpgroup Level Matrix Multiply-Accumulate Operation using
wgmma.mma_async
instruction -
9.7.15.6. Asynchronous Warpgroup Level Multiply-and-Accumulate Operation using
wgmma.mma_async.sp
instruction -
9.7.15.7. Asynchronous
wgmma
Proxy Operations
-
9.7.16. TensorCore 5th Generation Family Instructions
- 9.7.16.1. Tensor Memory
- 9.7.16.2. Matrix and Data Movement Shape
- 9.7.16.3. Major-ness supported by Strides
- 9.7.16.4. Matrix Descriptors
- 9.7.16.5. Issue Granularity
- 9.7.16.6. Memory Consistency Model for 5th generation of TensorCore operations
- 9.7.16.7. Tensor Memory Allocation and Management Instructions
- 9.7.16.8. Tensor Memory and Register Load/Store Instructions
- 9.7.16.9. Tensor Memory Data Movement Instructions
-
9.7.16.10. TensorCore 5th Generation Matrix Multiply and accumulate Operations
- 9.7.16.10.1. Transpose and Negate operations
- 9.7.16.10.2. Matrix Layout Organization
- 9.7.16.10.3. Valid Combinations of Type-Size, Major-ness and Swizzling
-
9.7.16.10.4. Packing formats of elements in Tensor and Shared memory
- 9.7.16.10.4.1. Packing format for matrix D in Tensor Memory
- 9.7.16.10.4.2. Packing format for matrix A and B
- 9.7.16.10.4.3. Packing format used for matrix A by
.kind::mxf8f6f4
in Tensor Memory - 9.7.16.10.4.4. Packing format used for matrix A and B by
.kind::mxf8f6f4
in Shared Memory - 9.7.16.10.4.5. Packing format used for matrix A by
.kind::mxf4
and.kind::mxf4nvf4
in Tensor Memory - 9.7.16.10.4.6. Packing format used for matrix A and B by
.kind::mxf4
and.kind::mxf4nvf4
in Shared Memory
-
9.7.16.10.5. Data Path Layout Organization
- 9.7.16.10.5.1. Layout A (M = 256)
- 9.7.16.10.5.2. Layout B (M = 128 + cta-group::2 + Dense A matrix)
- 9.7.16.10.5.3. Layout C (M = 128 + cta-group::2 + Sparse A matrix)
- 9.7.16.10.5.4. Layout D (M = 128 + cta-group::1)
- 9.7.16.10.5.5. Layout E (M = 64 + .ws mode)
- 9.7.16.10.5.6. Layout F (M = 64 + non .ws mode)
- 9.7.16.10.5.7. Layout G (M = 32)
- 9.7.16.10.6. Shared Memory Layout and Swizzling
-
9.7.16.10.7. Block Scaling
- 9.7.16.10.7.1. Valid combinations of scale_vectorsize with types and MMA-Kind
-
9.7.16.10.7.2. Scale Factor A ID
- 9.7.16.10.7.2.1. Layout of the Scale Factor A Matrix for scale_vec::1X/block32 with K=32/K=64
- 9.7.16.10.7.2.2. Layout of the Scale Factor A Matrix for scale_vec::2X/block32 with K=64/K=128
- 9.7.16.10.7.2.3. Layout of the Scale Factor A Matrix for scale_vec::4X/block16 with K=64/K=128
- 9.7.16.10.7.2.4. Layout of the Scale Factor A Matrix for block32 with K=96 (Semantically equivalent to scale_vec::3X)
- 9.7.16.10.7.2.5. Layout of the Scale Factor A Matrix for block16 with K=96 (Semantically equivalent to scale_vec::6X)
-
9.7.16.10.7.3. Scale Factor B ID
- 9.7.16.10.7.3.1. Layout of the Scale Factor B Matrix for scale_vec::1X/block32 with K=32/K=64
- 9.7.16.10.7.3.2. Layout of the Scale Factor B Matrix for scale_vec::2X/block32 with K=64/K=128
- 9.7.16.10.7.3.3. Layout of the Scale Factor B Matrix for scale_vec::4X/block16 with K=64/K=128
- 9.7.16.10.7.3.4. Layout of the Scale Factor B Matrix for block32 with K=96 (Semantically equivalent to scale_vec::3X)
- 9.7.16.10.7.3.5. Layout of the Scale Factor B Matrix for block16 with K=96 (Semantically equivalent to scale_vec::6X)
-
9.7.16.10.8. Sparse Matrices
- 9.7.16.10.8.1. Sparse
tcgen05.mma.sp
with.kind::tf32
- 9.7.16.10.8.2. Sparse
tcgen05.mma.sp
with.kind::f16
,.kind::f8f6f4
,.kind::mxf8f6f4
,.kind::i8
- 9.7.16.10.8.3. Sparse
tcgen05.mma.sp
with.kind::mxf4
and.kind::mxf4nvf4
-
9.7.16.10.8.4. Sparsity selector
- 9.7.16.10.8.4.1. Layout of the Sparsity Metadata Matrix for M = 64 for
.kind::f16
- 9.7.16.10.8.4.2. Layout of the Sparsity Metadata Matrix for M = 128 / M = 256 for
.kind::f16
- 9.7.16.10.8.4.3. Layout of the Sparsity Metadata Matrix for M = 64 for
.kind::tf32
- 9.7.16.10.8.4.4. Layout of the Sparsity Metadata Matrix for M = 128 / M = 256 for
.kind::tf32
- 9.7.16.10.8.4.5. Layout of the Sparsity Metadata Matrix for M = 64 for
.kind::f8f6f4
,.kind::mxf8f6f4
,.kind::i8
,.kind::mxf4
,.kind::mxf4nvf4
- 9.7.16.10.8.4.6. Layout of the Sparsity Metadata Matrix for M = 128 / M = 256 for
.kind::f8f6f4
,.kind::mxf8f6f4
,.kind::i8
,.kind::mxf4
,.kind::mxf4nvf4
- 9.7.16.10.8.4.1. Layout of the Sparsity Metadata Matrix for M = 64 for
- 9.7.16.10.8.5. Alignment restriction
- 9.7.16.10.8.1. Sparse
- 9.7.16.10.9. TensorCore 5th Generation of MMA Instructions
- 9.7.16.11. TensorCore 5th Generation Specialized Synchronization Operations
- 9.7.16.12. TensorCore 5th Generation Async Synchronization Operations
- 9.7.17. Stack Manipulation Instructions
- 9.7.18. Video Instructions
- 9.7.19. Miscellaneous Instructions
-
9.7.1. Integer Arithmetic Instructions
-
10. Special Registers
- 10.1. Special Registers:
%tid
- 10.2. Special Registers:
%ntid
- 10.3. Special Registers:
%laneid
- 10.4. Special Registers:
%warpid
- 10.5. Special Registers:
%nwarpid
- 10.6. Special Registers:
%ctaid
- 10.7. Special Registers:
%nctaid
- 10.8. Special Registers:
%smid
- 10.9. Special Registers:
%nsmid
- 10.10. Special Registers:
%gridid
- 10.11. Special Registers:
%is_explicit_cluster
- 10.12. Special Registers:
%clusterid
- 10.13. Special Registers:
%nclusterid
- 10.14. Special Registers:
%cluster_ctaid
- 10.15. Special Registers:
%cluster_nctaid
- 10.16. Special Registers:
%cluster_ctarank
- 10.17. Special Registers:
%cluster_nctarank
- 10.18. Special Registers:
%lanemask_eq
- 10.19. Special Registers:
%lanemask_le
- 10.20. Special Registers:
%lanemask_lt
- 10.21. Special Registers:
%lanemask_ge
- 10.22. Special Registers:
%lanemask_gt
- 10.23. Special Registers:
%clock
,%clock_hi
- 10.24. Special Registers:
%clock64
- 10.25. Special Registers:
%pm0
…%pm7
- 10.26. Special Registers:
%pm0_64
…%pm7_64
- 10.27. Special Registers:
%envreg<32>
- 10.28. Special Registers:
%globaltimer
,%globaltimer_lo
,%globaltimer_hi
- 10.29. Special Registers:
%reserved_smem_offset_begin
,%reserved_smem_offset_end
,%reserved_smem_offset_cap
,%reserved_smem_offset_<2>
- 10.30. Special Registers:
%total_smem_size
- 10.31. Special Registers:
%aggr_smem_size
- 10.32. Special Registers:
%dynamic_smem_size
- 10.33. Special Registers:
%current_graph_exec
- 10.1. Special Registers:
-
11. Directives
- 11.1. PTX Module Directives
- 11.2. Specifying Kernel Entry Points and Functions
- 11.3. Control Flow Directives
-
11.4. Performance-Tuning Directives
- 11.4.1. Performance-Tuning Directives:
.maxnreg
- 11.4.2. Performance-Tuning Directives:
.maxntid
- 11.4.3. Performance-Tuning Directives:
.reqntid
- 11.4.4. Performance-Tuning Directives:
.minnctapersm
- 11.4.5. Performance-Tuning Directives:
.maxnctapersm
(deprecated) - 11.4.6. Performance-Tuning Directives:
.noreturn
- 11.4.7. Performance-Tuning Directives:
.pragma
- 11.4.1. Performance-Tuning Directives:
- 11.5. Debugging Directives
- 11.6. Linking Directives
- 11.7. Cluster Dimension Directives
-
12. Descriptions of
.pragma
Strings -
13. Release Notes
- 13.1. Changes in PTX ISA Version 8.8
- 13.2. Changes in PTX ISA Version 8.7
- 13.3. Changes in PTX ISA Version 8.6
- 13.4. Changes in PTX ISA Version 8.5
- 13.5. Changes in PTX ISA Version 8.4
- 13.6. Changes in PTX ISA Version 8.3
- 13.7. Changes in PTX ISA Version 8.2
- 13.8. Changes in PTX ISA Version 8.1
- 13.9. Changes in PTX ISA Version 8.0
- 13.10. Changes in PTX ISA Version 7.8
- 13.11. Changes in PTX ISA Version 7.7
- 13.12. Changes in PTX ISA Version 7.6
- 13.13. Changes in PTX ISA Version 7.5
- 13.14. Changes in PTX ISA Version 7.4
- 13.15. Changes in PTX ISA Version 7.3
- 13.16. Changes in PTX ISA Version 7.2
- 13.17. Changes in PTX ISA Version 7.1
- 13.18. Changes in PTX ISA Version 7.0
- 13.19. Changes in PTX ISA Version 6.5
- 13.20. Changes in PTX ISA Version 6.4
- 13.21. Changes in PTX ISA Version 6.3
- 13.22. Changes in PTX ISA Version 6.2
- 13.23. Changes in PTX ISA Version 6.1
- 13.24. Changes in PTX ISA Version 6.0
- 13.25. Changes in PTX ISA Version 5.0
- 13.26. Changes in PTX ISA Version 4.3
- 13.27. Changes in PTX ISA Version 4.2
- 13.28. Changes in PTX ISA Version 4.1
- 13.29. Changes in PTX ISA Version 4.0
- 13.30. Changes in PTX ISA Version 3.2
- 13.31. Changes in PTX ISA Version 3.1
- 13.32. Changes in PTX ISA Version 3.0
- 13.33. Changes in PTX ISA Version 2.3
- 13.34. Changes in PTX ISA Version 2.2
- 13.35. Changes in PTX ISA Version 2.1
- 13.36. Changes in PTX ISA Version 2.0
- 14. Notices