ONNX (Open Neural Network Exchange) is an open-source standard that plays a pivotal role in Machine Learning Operations (MLOps) by providing a common format for representing and sharing deep learning models across different frameworks, libraries, and platforms. It acts as a bridge between deep learning frameworks, making it easier to build, train, and deploy models. ONNX is designed to enhance interoperability, speed up development, and promote innovation in AI and machine learning.
Key Features of ONNX:
Interoperability: ONNX allows models to be easily transferred between popular deep learning frameworks like TensorFlow, PyTorch, and more. This interoperability is crucial for MLOps as it enables teams to choose the best tools for specific tasks.
Optimized Inference: ONNX Runtime, the engine that executes ONNX models, is highly optimized for inference, ensuring that models run efficiently in production environments.
Hardware Acceleration: ONNX models can take advantage of hardware accelerators such as GPUs and other dedicated AI chips, making them suitable for deployment in high-performance settings.
Cross-Platform Compatibility: ONNX models can be deployed on various platforms, including edge devices, cloud servers, and even web browsers, making it versatile for different use cases.
ONNX in the Browser with WebGL
Now, let's focus on the specific topic for this week's session: model runtime in the browser with WebGL using ONNX. This is an exciting development in the world of AI and MLOps because it allows machine learning models to be executed directly within web browsers, offering several advantages:
1. Low Latency Inference: ONNX models can be run in the browser, reducing the need for constant communication with remote servers. This results in lower latency and faster model execution, crucial for real-time applications like gaming, interactive websites, and more.
2. Privacy and Data Security: By running models locally in the browser, sensitive data can be kept on the client-side, enhancing privacy and data security. This is particularly important for applications involving personal or confidential information.
3. Offline Availability: ONNX models deployed in the browser remain accessible even without an internet connection. This is beneficial for applications that need to work in offline or intermittent connectivity scenarios.
4. Cross-Platform Compatibility: WebGL, a JavaScript API for rendering interactive 2D and 3D graphics, allows for the execution of ONNX models on a wide range of devices and platforms, including desktops, mobile phones, and VR headsets.
5. Web-Based AI: The combination of ONNX and WebGL enables the development of web-based AI applications, such as interactive demos, educational tools, and games that incorporate machine learning capabilities.
Exporting PyTorch Models to ONNX
Internally, [torch.onnx.export()](https://pytorch.org/docs/stable/onnx.html#torch.onnx.export) requires a [torch.jit.ScriptModule](https://pytorch.org/docs/stable/generated/torch.jit.ScriptModule.html#torch.jit.ScriptModule) rather than a [torch.nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module). If the passed-in model is not already a ScriptModule, export() will use tracing to convert it to one:
Tracing: If torch.onnx.export() is called with a Module that is not already a ScriptModule, it first does the equivalent of [torch.jit.trace()](https://pytorch.org/docs/stable/generated/torch.jit.trace.html#torch.jit.trace), which executes the model once with the given args and records all operations that happen during that execution. This means that if your model is dynamic, e.g., changes behavior depending on input data, the exported model will not capture this dynamic behavior. We recommend examining the exported model and making sure the operators look reasonable. Tracing will unroll loops and if statements, exporting a static graph that is exactly the same as the traced run. If you want to export your model with dynamic control flow, you will need to use scripting.
Scripting: Compiling a model via scripting preserves dynamic control flow and is valid for inputs of different sizes. To use scripting, call [torch.jit.script()](https://pytorch.org/docs/stable/generated/torch.jit.script.html#torch.jit.script) to produce a ScriptModule, then pass that ScriptModule to torch.onnx.export() as the model. The args are still required, but they will be used internally only to produce example outputs, so that the types and shapes of the outputs can be captured. No tracing will be performed.
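To make the two paths concrete, here is a minimal sketch; the TinyNet model, file names, and input shape are illustrative, not from the original post:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def forward(self, x):
        # No data-dependent control flow, so tracing is safe here.
        return torch.relu(x) * 2

model = TinyNet().eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Tracing: export() runs the model once with dummy_input and records the ops.
torch.onnx.export(model, dummy_input, "tinynet_traced.onnx",
                  input_names=["input"], output_names=["output"])

# Scripting: compile first to preserve dynamic control flow, then export.
scripted = torch.jit.script(model)
torch.onnx.export(scripted, dummy_input, "tinynet_scripted.onnx",
                  input_names=["input"], output_names=["output"])
```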
An OpSet is a versioned collection of operators defined by a specific version of the ONNX specification: it determines which operators are available to a model and how they behave.
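In practice you pin the opset at export time via the opset_version argument; the version number below is just an example:

```python
# Pin the operator set so downstream runtimes know exactly which
# operator definitions the exported graph relies on.
torch.onnx.export(model, dummy_input, "tinynet.onnx", opset_version=17)
```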
Verifying ONNX
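A quick structural check can be done with the onnx package's checker; a sketch, reusing the file name from the export example above:

```python
import onnx

onnx_model = onnx.load("tinynet_traced.onnx")
onnx.checker.check_model(onnx_model)  # raises ValidationError if malformed
print(onnx.helper.printable_graph(onnx_model.graph))  # human-readable graph
```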
Test with Random Input
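A common pattern is to feed the same random tensor to both the original PyTorch model and an ONNX Runtime session and compare the outputs; the tolerances and names below are illustrative:

```python
import numpy as np
import onnxruntime as ort
import torch

session = ort.InferenceSession("tinynet_traced.onnx",
                               providers=["CPUExecutionProvider"])
x = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    torch_out = model(x).numpy()

# "input" matches the input_names passed at export time.
ort_out = session.run(None, {"input": x.numpy()})[0]
np.testing.assert_allclose(torch_out, ort_out, rtol=1e-3, atol=1e-5)
print("PyTorch and ONNX Runtime outputs match")
```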
Image Input
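For an image model, the input must get the same preprocessing the network was trained with. A minimal sketch; the file name, size, and normalization are placeholders for your model's actual pipeline:

```python
import numpy as np
import onnxruntime as ort
from PIL import Image

img = Image.open("sample.jpg").convert("RGB").resize((224, 224))
# HWC uint8 -> CHW float32 in [0, 1], plus a leading batch dimension.
arr = np.asarray(img, dtype=np.float32) / 255.0
arr = arr.transpose(2, 0, 1)[np.newaxis, :]

session = ort.InferenceSession("tinynet_traced.onnx",
                               providers=["CPUExecutionProvider"])
pred = session.run(None, {"input": arr})[0]
print(pred.shape)
```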
ONNX in the Browser
page.tsx
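The original listing isn't reproduced here, but a minimal Next.js client component using onnxruntime-web might look like this; the model path, tensor shape, and the "input"/"output" names are assumptions that must match your exported model:

```tsx
"use client";

import { useEffect, useState } from "react";
import * as ort from "onnxruntime-web";

export default function Page() {
  const [output, setOutput] = useState<string>("loading...");

  useEffect(() => {
    async function run() {
      // Try the WebGL backend first, then fall back to WebAssembly.
      const session = await ort.InferenceSession.create("/model.onnx", {
        executionProviders: ["webgl", "wasm"],
      });

      // Hypothetical input: a 1x3x224x224 tensor of random floats.
      const data = Float32Array.from(
        { length: 1 * 3 * 224 * 224 },
        () => Math.random()
      );
      const input = new ort.Tensor("float32", data, [1, 3, 224, 224]);

      // Feed names must match the names used at export time.
      const results = await session.run({ input });
      setOutput(`output dims: ${results.output.dims.join("x")}`);
    }
    run().catch((e) => setOutput(String(e)));
  }, []);

  return <pre>{output}</pre>;
}
```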
Let’s break it down to understand what is happening.
Loading the ONNX Model
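The session is created once, typically from a URL under which your app serves the .onnx file (the path below is a placeholder); create() also accepts the raw model bytes as a Uint8Array:

```tsx
import * as ort from "onnxruntime-web";

// Create the session from the served model file.
const session = await ort.InferenceSession.create("/model.onnx", {
  executionProviders: ["webgl"],
});
```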
executionProviders
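executionProviders is an ordered preference list: ONNX Runtime Web tries each named backend in turn and uses the first one it can initialize. A sketch:

```tsx
const options: ort.InferenceSession.SessionOptions = {
  // Prefer GPU-accelerated WebGL; fall back to WebAssembly on the CPU.
  executionProviders: ["webgl", "wasm"],
};
const session = await ort.InferenceSession.create("/model.onnx", options);
```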
WebAssembly backend
With the WebAssembly backend, ONNX Runtime Web currently supports all operators in ai.onnx and ai.onnx.ml.
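To pin the WebAssembly backend explicitly, something like the following should work; the numThreads knob on ort.env.wasm is optional and only takes effect where the browser supports multi-threaded WASM:

```tsx
// Run everything on the CPU via WebAssembly.
ort.env.wasm.numThreads = 4; // optional tuning
const wasmSession = await ort.InferenceSession.create("/model.onnx", {
  executionProviders: ["wasm"],
});
```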
WebGL backend
ONNX Runtime Web currently supports a subset of operators in the ai.onnx operator set. See operators.md for a complete, detailed list of which ONNX operators are supported by the WebGL backend.
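Because of that operator gap, a defensive pattern is to attempt WebGL first and fall back to WASM if session creation fails (a sketch):

```tsx
let session: ort.InferenceSession;
try {
  session = await ort.InferenceSession.create("/model.onnx", {
    executionProviders: ["webgl"],
  });
} catch {
  // The model likely uses an operator the WebGL backend doesn't
  // implement; the WASM backend covers the full ai.onnx set.
  session = await ort.InferenceSession.create("/model.onnx", {
    executionProviders: ["wasm"],
  });
}
```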