TensorRT 10.11.0
An engine for executing inference on a built network, with functionally unsafe features.
#include <NvInferRuntime.h>
Public Member Functions

virtual | ~ICudaEngine () noexcept=default
Dims | getTensorShape (char const *tensorName) const noexcept
Get shape of an input or output tensor.
DataType | getTensorDataType (char const *tensorName) const noexcept
Determine the required data type for a buffer from its tensor name.
int32_t | getNbLayers () const noexcept
Get the number of layers in the network.
IHostMemory * | serialize () const noexcept
Serialize the network to a stream.
IExecutionContext * | createExecutionContext (ExecutionContextAllocationStrategy strategy=ExecutionContextAllocationStrategy::kSTATIC) noexcept
Create an execution context and specify the strategy for allocating internal activation memory.
TensorLocation | getTensorLocation (char const *tensorName) const noexcept
Get whether an input or output tensor must be on GPU or CPU.
bool | isShapeInferenceIO (char const *tensorName) const noexcept
True if the tensor is required as an input for shape calculations or is an output from shape calculations.
TensorIOMode | getTensorIOMode (char const *tensorName) const noexcept
Determine whether a tensor is an input or output tensor.
TRT_DEPRECATED IExecutionContext * | createExecutionContextWithoutDeviceMemory () noexcept
Create an execution context without any device memory allocated.
IExecutionContext * | createExecutionContext (IRuntimeConfig *runtimeConfig) noexcept
Create an execution context with a TensorRT JIT runtime config.
IRuntimeConfig * | createRuntimeConfig () noexcept
Create a runtime config for TensorRT JIT. The caller is responsible for ownership of the returned IRuntimeConfig object.
TRT_DEPRECATED size_t | getDeviceMemorySize () const noexcept
Return the maximum device memory required by the context over all profiles.
TRT_DEPRECATED size_t | getDeviceMemorySizeForProfile (int32_t profileIndex) const noexcept
Return the maximum device memory required by the context for a profile.
int64_t | getDeviceMemorySizeV2 () const noexcept
Return the maximum device memory required by the context over all profiles.
int64_t | getDeviceMemorySizeForProfileV2 (int32_t profileIndex) const noexcept
Return the maximum device memory required by the context for a profile.
bool | isRefittable () const noexcept
Return true if an engine can be refit.
int32_t | getTensorBytesPerComponent (char const *tensorName) const noexcept
Return the number of bytes per component of an element, or -1 if the tensor is not vectorized or the provided name does not map to an input or output tensor.
int32_t | getTensorBytesPerComponent (char const *tensorName, int32_t profileIndex) const noexcept
Return the number of bytes per component of an element for a given profile, or -1 if the tensor is not vectorized or the provided name does not map to an input or output tensor.
int32_t | getTensorComponentsPerElement (char const *tensorName) const noexcept
Return the number of components included in one element, or -1 if the tensor is not vectorized or the provided name does not map to an input or output tensor.
int32_t | getTensorComponentsPerElement (char const *tensorName, int32_t profileIndex) const noexcept
Return the number of components included in one element for a given profile, or -1 if the tensor is not vectorized or the provided name does not map to an input or output tensor.
TensorFormat | getTensorFormat (char const *tensorName) const noexcept
Return the tensor format, or TensorFormat::kLINEAR if the provided name does not map to an input or output tensor.
TensorFormat | getTensorFormat (char const *tensorName, int32_t profileIndex) const noexcept
Return the tensor format for a given profile, or TensorFormat::kLINEAR if the provided name does not map to an input or output tensor.
char const * | getTensorFormatDesc (char const *tensorName) const noexcept
Return a human-readable description of the tensor format, or an empty string if the provided name does not map to an input or output tensor.
char const * | getTensorFormatDesc (char const *tensorName, int32_t profileIndex) const noexcept
Return a human-readable description of the tensor format for a given profile, or an empty string if the provided name does not map to an input or output tensor.
int32_t | getTensorVectorizedDim (char const *tensorName) const noexcept
Return the index of the dimension along which the buffer is vectorized, or -1 if the provided name does not map to an input or output tensor.
int32_t | getTensorVectorizedDim (char const *tensorName, int32_t profileIndex) const noexcept
Return the index of the dimension along which the buffer is vectorized for a given profile, or -1 if the provided name does not map to an input or output tensor.
char const * | getName () const noexcept
Returns the name of the network associated with the engine.
int32_t | getNbOptimizationProfiles () const noexcept
Get the number of optimization profiles defined for this engine.
Dims | getProfileShape (char const *tensorName, int32_t profileIndex, OptProfileSelector select) const noexcept
Get the minimum / optimum / maximum dimensions for an input tensor given its name under an optimization profile.
TRT_DEPRECATED int32_t const * | getProfileTensorValues (char const *tensorName, int32_t profileIndex, OptProfileSelector select) const noexcept
Get the minimum / optimum / maximum values (not dimensions) for an input tensor given its name under an optimization profile. These correspond to the values set using IOptimizationProfile::setShapeValues when the engine was built.
EngineCapability | getEngineCapability () const noexcept
Determine what execution capability this engine has.
void | setErrorRecorder (IErrorRecorder *recorder) noexcept
Set the ErrorRecorder for this interface.
IErrorRecorder * | getErrorRecorder () const noexcept
Get the ErrorRecorder assigned to this interface.
TRT_DEPRECATED bool | hasImplicitBatchDimension () const noexcept
Query whether the engine was built with an implicit batch dimension.
TacticSources | getTacticSources () const noexcept
Return the tactic sources required by this engine.
ProfilingVerbosity | getProfilingVerbosity () const noexcept
Return the ProfilingVerbosity the builder config was set to when the engine was built.
IEngineInspector * | createEngineInspector () const noexcept
Create a new engine inspector which prints the layer information in an engine or an execution context.
int32_t | getNbIOTensors () const noexcept
Return number of IO tensors.
char const * | getIOTensorName (int32_t index) const noexcept
Return name of an IO tensor.
HardwareCompatibilityLevel | getHardwareCompatibilityLevel () const noexcept
Return the hardware compatibility level of this engine.
int32_t | getNbAuxStreams () const noexcept
Return the number of auxiliary streams used by this engine.
ISerializationConfig * | createSerializationConfig () noexcept
Create a serialization configuration object.
IHostMemory * | serializeWithConfig (ISerializationConfig &config) const noexcept
Serialize the network to a stream with the provided SerializationConfig.
TRT_DEPRECATED bool | setWeightStreamingBudget (int64_t gpuMemoryBudget) noexcept
Limit the maximum amount of GPU memory usable for network weights, in bytes.
TRT_DEPRECATED int64_t | getWeightStreamingBudget () const noexcept
Returns the current weight streaming device memory budget in bytes.
TRT_DEPRECATED int64_t | getMinimumWeightStreamingBudget () const noexcept
The minimum number of bytes of GPU memory required by network weights for successful weight streaming.
int64_t | getStreamableWeightsSize () const noexcept
Get the total size in bytes of all streamable weights.
bool | setWeightStreamingBudgetV2 (int64_t gpuMemoryBudget) noexcept
Limit the maximum amount of GPU memory usable for network weights, in bytes.
int64_t | getWeightStreamingBudgetV2 () const noexcept
Returns the current weight streaming device memory budget in bytes.
int64_t | getWeightStreamingAutomaticBudget () const noexcept
TensorRT automatically determines a device memory budget for the model to run.
int64_t | getWeightStreamingScratchMemorySize () const noexcept
Returns the size of the scratch memory required by the current weight streaming budget.
bool | isDebugTensor (char const *name) const noexcept
Check if a tensor is marked as a debug tensor.
int64_t const * | getProfileTensorValuesV2 (char const *tensorName, int32_t profileIndex, OptProfileSelector select) const noexcept
Get the minimum / optimum / maximum values (not dimensions) for an input tensor given its name under an optimization profile. These correspond to the values set using IOptimizationProfile::setShapeValuesV2 when the engine was built.

Protected Attributes

apiv::VCudaEngine * | mImpl

Additional Inherited Members

Public Member Functions inherited from nvinfer1::INoCopy
INoCopy ()=default
virtual ~INoCopy ()=default
INoCopy (INoCopy const &other)=delete
INoCopy & operator= (INoCopy const &other)=delete
INoCopy (INoCopy &&other)=delete
INoCopy & operator= (INoCopy &&other)=delete
An engine for executing inference on a built network, with functionally unsafe features.
virtual ~ICudaEngine () noexcept=default
IEngineInspector * createEngineInspector () const noexcept
Create a new engine inspector which prints the layer information in an engine or an execution context.
IExecutionContext * createExecutionContext (ExecutionContextAllocationStrategy strategy=ExecutionContextAllocationStrategy::kSTATIC) noexcept
Create an execution context and specify the strategy for allocating internal activation memory.
The default value for the allocation strategy is ExecutionContextAllocationStrategy::kSTATIC, which means the context will pre-allocate a block of device memory that is sufficient for all profiles. The newly created execution context will be assigned optimization profile 0. If an error recorder has been set for the engine, it will also be passed to the execution context.
IExecutionContext * createExecutionContext (IRuntimeConfig *runtimeConfig) noexcept
Create an execution context with TensorRT JIT runtime config.
runtimeConfig | The runtime config for TensorRT JIT. |
TRT_DEPRECATED IExecutionContext * createExecutionContextWithoutDeviceMemory () noexcept

Create an execution context without any device memory allocated.
The memory for execution of this device context must be supplied by the application.
IRuntimeConfig * createRuntimeConfig () noexcept
Create a runtime config for TensorRT JIT. The caller is responsible for ownership of the returned IRuntimeConfig object.
ISerializationConfig * createSerializationConfig () noexcept
Create a serialization configuration object.
TRT_DEPRECATED size_t getDeviceMemorySize () const noexcept
Return the maximum device memory required by the context over all profiles.
TRT_DEPRECATED size_t getDeviceMemorySizeForProfile (int32_t profileIndex) const noexcept
Return the maximum device memory required by the context for a profile.
int64_t getDeviceMemorySizeForProfileV2 (int32_t profileIndex) const noexcept
Return the maximum device memory required by the context for a profile.
This API is stateful: the value it returns can change depending on calls made since the engine was created, such as changes to the weight streaming budget via setWeightStreamingBudgetV2().
int64_t getDeviceMemorySizeV2 () const noexcept
Return the maximum device memory required by the context over all profiles.
This API is stateful: the value it returns can change depending on calls made since the engine was created, such as changes to the weight streaming budget via setWeightStreamingBudgetV2().
EngineCapability getEngineCapability () const noexcept
Determine what execution capability this engine has.
If the engine has EngineCapability::kSTANDARD, then all engine functionality is valid. If the engine has EngineCapability::kSAFETY, then only the functionality of a safe engine is valid. If the engine has EngineCapability::kDLA_STANDALONE, then only serialize, destroy, and const-accessor functions are valid.
IErrorRecorder * getErrorRecorder () const noexcept
Get the ErrorRecorder assigned to this interface.
Retrieves the assigned error recorder object for the given class. A nullptr will be returned if an error handler has not been set.
HardwareCompatibilityLevel getHardwareCompatibilityLevel () const noexcept
Return the hardware compatibility level of this engine.
char const * getIOTensorName (int32_t index) const noexcept
Return name of an IO tensor.
index | value between 0 and getNbIOTensors()-1 |
TRT_DEPRECATED int64_t getMinimumWeightStreamingBudget () const noexcept
The minimum number of bytes of GPU memory required by network weights for successful weight streaming.
This is a positive integer for engines with streamable weights because a staging buffer on the GPU is required to temporarily hold the streamed weights. The size of the staging buffer is determined by TensorRT and must be at least as large as the size of the largest streamable weight in the network.
char const * getName () const noexcept
Returns the name of the network associated with the engine.
The name is set during network creation and is retrieved after building or deserialization.
int32_t getNbAuxStreams () const noexcept
Return the number of auxiliary streams used by this engine.
This number will be less than or equal to the maximum allowed number of auxiliary streams set by IBuilderConfig::setMaxAuxStreams() API call when the engine was built.
int32_t getNbIOTensors () const noexcept
Return number of IO tensors.
It is the number of input and output tensors for the network from which the engine was built. The names of the IO tensors can be discovered by calling getIOTensorName(i) for i in 0 to getNbIOTensors()-1.
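The enumeration pattern described above can be sketched without a real engine. In the sketch below, FakeEngine is a hypothetical stand-in used only so the loop is runnable; the real calls on an ICudaEngine are engine->getNbIOTensors() and engine->getIOTensorName(i):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the engine, exposing only the two calls the
// loop needs. With TensorRT, these come from nvinfer1::ICudaEngine.
struct FakeEngine
{
    std::vector<std::string> names{"input0", "output0"};
    int32_t getNbIOTensors() const { return static_cast<int32_t>(names.size()); }
    char const* getIOTensorName(int32_t i) const { return names[i].c_str(); }
};

// Collect all IO tensor names, exactly as the documentation suggests:
// getIOTensorName(i) for i in 0 to getNbIOTensors()-1.
std::vector<std::string> ioTensorNames(FakeEngine const& engine)
{
    std::vector<std::string> out;
    for (int32_t i = 0; i < engine.getNbIOTensors(); ++i)
    {
        out.emplace_back(engine.getIOTensorName(i));
    }
    return out;
}
```

The same loop body works unchanged against a deserialized engine, since only the two documented calls are used.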
int32_t getNbLayers () const noexcept
Get the number of layers in the network.
The number of layers in the network is not necessarily the number in the original network definition, as layers may be combined or eliminated as the engine is optimized. This value can be useful when building per-layer tables, such as when aggregating profiling data over a number of executions.
int32_t getNbOptimizationProfiles () const noexcept
Get the number of optimization profiles defined for this engine.
Dims getProfileShape (char const *tensorName, int32_t profileIndex, OptProfileSelector select) const noexcept
Get the minimum / optimum / maximum dimensions for an input tensor given its name under an optimization profile.
tensorName | The name of an input tensor. |
profileIndex | The profile index, which must be between 0 and getNbOptimizationProfiles()-1. |
select | Whether to query the minimum, optimum, or maximum dimensions for this input tensor. |
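A typical use of the kMIN/kMAX results is validating a concrete runtime shape against the profile bounds. The helper below is illustrative, not part of TensorRT; it assumes the min and max dimensions have already been read out of getProfileShape() into plain vectors:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative: true iff every dimension of `shape` lies within the
// per-profile bounds queried via OptProfileSelector::kMIN and kMAX.
bool shapeWithinProfile(std::vector<int64_t> const& shape,
                        std::vector<int64_t> const& mins,
                        std::vector<int64_t> const& maxs)
{
    if (shape.size() != mins.size() || shape.size() != maxs.size())
    {
        return false; // rank mismatch: cannot belong to this profile
    }
    for (std::size_t i = 0; i < shape.size(); ++i)
    {
        if (shape[i] < mins[i] || shape[i] > maxs[i])
        {
            return false;
        }
    }
    return true;
}
```

For a profile built with batch bounds [1, 8] on an NCHW input, a batch of 4 passes and a batch of 16 fails.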
TRT_DEPRECATED int32_t const * getProfileTensorValues (char const *tensorName, int32_t profileIndex, OptProfileSelector select) const noexcept
Get the minimum / optimum / maximum values (not dimensions) for an input tensor given its name under an optimization profile. These correspond to the values set using IOptimizationProfile::setShapeValues when the engine was built.
tensorName | The name of an input tensor. |
profileIndex | The profile index, which must be between 0 and getNbOptimizationProfiles()-1. |
select | Whether to query the minimum, optimum, or maximum values for this input tensor. |
int64_t const * getProfileTensorValuesV2 (char const *tensorName, int32_t profileIndex, OptProfileSelector select) const noexcept
Get the minimum / optimum / maximum values (not dimensions) for an input tensor given its name under an optimization profile. These correspond to the values set using IOptimizationProfile::setShapeValuesV2 when the engine was built.
tensorName | The name of an input tensor. |
profileIndex | The profile index, which must be between 0 and getNbOptimizationProfiles()-1. |
select | Whether to query the minimum, optimum, or maximum values for this input tensor. |
ProfilingVerbosity getProfilingVerbosity () const noexcept
Return the ProfilingVerbosity the builder config was set to when the engine was built.
int64_t getStreamableWeightsSize () const noexcept
Get the total size in bytes of all streamable weights.
The set of streamable weights is a subset of all network weights. The total size may exceed free GPU memory.
TacticSources getTacticSources () const noexcept

Return the tactic sources required by this engine.

The value returned is equal to zero or more tactic sources set at build time via setTacticSources() in IBuilderConfig. Sources set by the latter but not returned by ICudaEngine::getTacticSources do not reduce overall engine execution time, and can be removed from future builds to reduce build time.
int32_t getTensorBytesPerComponent (char const *tensorName) const noexcept

Return the number of bytes per component of an element, or -1 if the tensor is not vectorized or the provided name does not map to an input or output tensor.
The vector component size is returned if getTensorVectorizedDim(tensorName) != -1.
tensorName | The name of an input or output tensor. |
int32_t getTensorBytesPerComponent (char const *tensorName, int32_t profileIndex) const noexcept

Return the number of bytes per component of an element for a given profile, or -1 if the tensor is not vectorized or the provided name does not map to an input or output tensor.
The vector component size is returned if getTensorVectorizedDim(tensorName, profileIndex) != -1.
tensorName | The name of an input or output tensor. |
profileIndex | The profile index to query |
int32_t getTensorComponentsPerElement (char const *tensorName) const noexcept

Return the number of components included in one element, or -1 if the tensor is not vectorized or the provided name does not map to an input or output tensor.
The number of elements in the vectors is returned if getTensorVectorizedDim(tensorName) != -1.
tensorName | The name of an input or output tensor. |
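Taken together with getTensorBytesPerComponent() and getTensorVectorizedDim(), these queries determine how large a vectorized buffer must be. The standalone sketch below shows that arithmetic; the function and sample values are illustrative, not TensorRT API:

```cpp
#include <cstdint>
#include <vector>

// Illustrative: byte size of a buffer whose dimension `vecDim` is packed in
// groups of `compsPerElem` components of `bytesPerComp` bytes each.
// vecDim == -1 means the tensor is not vectorized (e.g. kLINEAR).
int64_t vectorizedBufferBytes(std::vector<int64_t> dims, int32_t vecDim,
                              int32_t compsPerElem, int32_t bytesPerComp)
{
    if (vecDim >= 0)
    {
        // Pad the vectorized dimension up to a multiple of the vector width.
        dims[vecDim] = (dims[vecDim] + compsPerElem - 1) / compsPerElem * compsPerElem;
    }
    int64_t volume = 1;
    for (int64_t d : dims)
    {
        volume *= d;
    }
    return volume * bytesPerComp;
}
```

For kCHW2 + FP16 (2 bytes per component, 2 components per element, channel dimension vectorized), a 1x3x4x4 tensor pads C from 3 to 4 and occupies 128 bytes, whereas the linear FP32 layout needs 1*3*4*4*4 = 192 bytes.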
int32_t getTensorComponentsPerElement (char const *tensorName, int32_t profileIndex) const noexcept

Return the number of components included in one element for a given profile, or -1 if the tensor is not vectorized or the provided name does not map to an input or output tensor.
The number of elements in the vectors is returned if getTensorVectorizedDim(tensorName, profileIndex) != -1.
tensorName | The name of an input or output tensor. |
profileIndex | The profile index to query |
DataType getTensorDataType (char const *tensorName) const noexcept
Determine the required data type for a buffer from its tensor name.
tensorName | The name of an input or output tensor. |
TensorFormat getTensorFormat (char const *tensorName) const noexcept
Return the tensor format, or TensorFormat::kLINEAR if the provided name does not map to an input or output tensor.
TensorFormat getTensorFormat (char const *tensorName, int32_t profileIndex) const noexcept

Return the tensor format for a given profile, or TensorFormat::kLINEAR if the provided name does not map to an input or output tensor.
tensorName | The name of an input or output tensor. |
profileIndex | The profile index to query the format for. |
char const * getTensorFormatDesc (char const *tensorName) const noexcept

Return a human-readable description of the tensor format, or an empty string if the provided name does not map to an input or output tensor.

The description includes the order, vectorization, data type, and strides. For example:
kCHW + FP32: "Row-major linear FP32 format"
kCHW2 + FP16: "Two-wide channel vectorized row-major FP16 format"
kHWC8 + FP16 + Line Stride = 32: "Channel major FP16 format where C % 8 == 0 and H Stride % 32 == 0"
tensorName | The name of an input or output tensor. |
char const * getTensorFormatDesc (char const *tensorName, int32_t profileIndex) const noexcept

Return a human-readable description of the tensor format for a given profile, or an empty string if the provided name does not map to an input or output tensor.

The description includes the order, vectorization, data type, and strides. For example:
kCHW + FP32: "Row-major linear FP32 format"
kCHW2 + FP16: "Two-wide channel vectorized row-major FP16 format"
kHWC8 + FP16 + Line Stride = 32: "Channel major FP16 format where C % 8 == 0 and H Stride % 32 == 0"
tensorName | The name of an input or output tensor. |
profileIndex | The profile index to query the format for. |
TensorIOMode getTensorIOMode (char const *tensorName) const noexcept
Determine whether a tensor is an input or output tensor.
tensorName | The name of an input or output tensor. |
TensorLocation getTensorLocation (char const *tensorName) const noexcept
Get whether an input or output tensor must be on GPU or CPU.
tensorName | The name of an input or output tensor. |
The location is established at build time. For example, shape tensor inputs are typically required to be on the CPU.
Dims getTensorShape (char const *tensorName) const noexcept
Get shape of an input or output tensor.
tensorName | The name of an input or output tensor. |
int32_t getTensorVectorizedDim (char const *tensorName) const noexcept

Return the index of the dimension along which the buffer is vectorized, or -1 if the provided name does not map to an input or output tensor.

Specifically, -1 is returned if the number of scalars per vector is 1.
tensorName | The name of an input or output tensor. |
int32_t getTensorVectorizedDim (char const *tensorName, int32_t profileIndex) const noexcept

Return the index of the dimension along which the buffer is vectorized for a given profile, or -1 if the provided name does not map to an input or output tensor.

Specifically, -1 is returned if the number of scalars per vector is 1.
tensorName | The name of an input. |
profileIndex | The profile index to query the format for. |
int64_t getWeightStreamingAutomaticBudget () const noexcept
TensorRT automatically determines a device memory budget for the model to run. The budget is close to the current free memory size, leaving some space for other memory needs in the user's application. If the budget exceeds the size obtained from getStreamableWeightsSize(), it is capped to that size, effectively disabling weight streaming. Since TensorRT lacks information about the user's allocations, the remaining memory size might be larger than required, leading to wasted memory, or smaller than required, causing an out-of-memory error. For optimal memory allocation, it is recommended to manually calculate and set the budget.
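The capping rule in that paragraph can be stated in a few lines. This is an illustration only; the real budget is computed inside TensorRT from the current free memory:

```cpp
#include <cstdint>

// Illustrative: an automatic budget is derived from a free-memory estimate
// but never exceeds the total streamable weights size; hitting the cap
// effectively disables weight streaming.
int64_t cappedAutomaticBudget(int64_t freeMemoryEstimate, int64_t streamableWeightsSize)
{
    return freeMemoryEstimate < streamableWeightsSize ? freeMemoryEstimate
                                                      : streamableWeightsSize;
}
```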
TRT_DEPRECATED int64_t getWeightStreamingBudget () const noexcept
Returns the current weight streaming device memory budget in bytes.
int64_t getWeightStreamingBudgetV2 () const noexcept
Returns the current weight streaming device memory budget in bytes.
int64_t getWeightStreamingScratchMemorySize () const noexcept
Returns the size of the scratch memory required by the current weight streaming budget.
Weight streaming requires small amounts of scratch memory on the GPU to stage CPU weights right before execution. This value is typically much smaller than the total streamable weights size. Each IExecutionContext will then allocate this additional memory or the user can provide the additional memory through getDeviceMemorySizeV2() and IExecutionContext::setDeviceMemoryV2().
The return value of this call depends on the current weight streaming budget, set via setWeightStreamingBudgetV2().
TRT_DEPRECATED bool hasImplicitBatchDimension () const noexcept
Query whether the engine was built with an implicit batch dimension.
bool isDebugTensor (char const *name) const noexcept
Check if a tensor is marked as a debug tensor.
Determine whether the given name corresponds to a debug tensor.
bool isRefittable () const noexcept
Return true if an engine can be refit.
bool isShapeInferenceIO (char const *tensorName) const noexcept

True if the tensor is required as an input for shape calculations or is an output from shape calculations.
Return true for either of the following conditions: the tensor is required as an input for shape calculations, or the tensor is an output of shape calculations.
For example, if a network uses an input tensor "foo" as an addend to an IElementWiseLayer that computes the "reshape dimensions" for IShuffleLayer, then isShapeInferenceIO("foo") == true. If the network copies said input tensor "foo" to an output "bar", then isShapeInferenceIO("bar") == true and IExecutionContext::inferShapes() will write to "bar".
IHostMemory * serialize () const noexcept
Serialize the network to a stream.
The network may be deserialized with IRuntime::deserializeCudaEngine().
IHostMemory * serializeWithConfig (ISerializationConfig &config) const noexcept
Serialize the network to a stream with the provided SerializationConfig.
The network may be deserialized with IRuntime::deserializeCudaEngine(). Serializing a plan file with SerializationFlag::kEXCLUDE_WEIGHTS requires that the engine be built with kREFIT, kREFIT_IDENTICAL, or kREFIT_INDIVIDUAL.
void setErrorRecorder (IErrorRecorder *recorder) noexcept
Set the ErrorRecorder for this interface.
Assigns the ErrorRecorder to this interface. The ErrorRecorder will track all errors during execution. This function will call incRefCount of the registered ErrorRecorder at least once. Setting recorder to nullptr unregisters the recorder with the interface, resulting in a call to decRefCount if a recorder has been registered.
If an error recorder is not set, messages will be sent to the global log stream.
recorder | The error recorder to register with this interface. |
TRT_DEPRECATED bool setWeightStreamingBudget (int64_t gpuMemoryBudget) noexcept
Limit the maximum amount of GPU memory usable for network weights in bytes.
gpuMemoryBudget | This parameter may take one of three kinds of value. -1: allows TensorRT to choose the budget according to the streamable weights size; free CUDA memory will be queried at createExecutionContext() and the budget set accordingly.

By setting a weight limit, users can expect a GPU memory usage reduction of (total bytes for network weights) - gpuMemoryBudget bytes. Maximum memory savings occur when gpuMemoryBudget is set to getMinimumWeightStreamingBudget(). Creating additional IExecutionContexts will increase memory usage by O(getMinimumWeightStreamingBudget()).
Streaming larger amounts of memory will likely result in lower performance except in some boundary cases where streaming weights allows the user to run larger batch sizes. The higher throughput offsets the increased latency in these cases. Tuning the value of the memory limit is recommended for best performance.
bool setWeightStreamingBudgetV2 (int64_t gpuMemoryBudget) noexcept
Limit the maximum amount of GPU memory usable for network weights in bytes.
gpuMemoryBudget | This parameter must be a non-negative value. 0: only small amounts of scratch memory will be required to run the model. >= getStreamableWeightsSize() (the default): disables weight streaming; execution may fail if the network is too large for GPU memory.
By setting a weight limit, users can expect a GPU memory usage reduction on the order of (total bytes for network weights) - gpuMemoryBudget bytes. Maximum memory savings occur when gpuMemoryBudget is set to 0. Each IExecutionContext will require getWeightStreamingScratchMemorySize() bytes of additional device memory if the engine is streaming its weights (budget < getStreamableWeightsSize()).
Streaming larger amounts of memory will likely result in lower performance except in some boundary cases where streaming weights allows the user to run larger batch sizes. The higher throughput offsets the increased latency in these cases. Tuning the value of the memory limit is recommended for best performance.
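The per-context accounting described above can be sketched as follows. This is a simplification under stated assumptions: scratch memory is only needed while streaming, and a budget at or above getStreamableWeightsSize() disables streaming entirely:

```cpp
#include <cstdint>

// Illustrative: approximate device memory a context devotes to weights.
// budget >= streamableSize: streaming disabled, all weights stay resident.
// budget <  streamableSize: resident budget plus staging scratch memory.
int64_t weightMemoryPerContext(int64_t streamableSize, int64_t scratchSize, int64_t budget)
{
    if (budget >= streamableSize)
    {
        return streamableSize;
    }
    return budget + scratchSize;
}
```

With 1000 bytes of streamable weights and 16 bytes of scratch, a budget of 400 yields roughly 416 bytes per context, a saving of about 584 bytes versus keeping all weights resident.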
apiv::VCudaEngine * mImpl [protected]
Copyright © 2024 NVIDIA Corporation