IBM Z and the Open Neural Network Exchange

Nearly everyone recognizes the profound opportunity to bring new insights and better decisions to business workloads using AI and analytics. Enabling AI on IBM Z and LinuxONE is a key focus for IBM, giving clients a reliable, secure, and high-performing environment for delivering critical business insights through machine learning and deep learning applications.

However, this opportunity also brings challenges, especially around deploying AI in a production environment. The use of AI in critical business workloads is a growing space, and as with other new technologies, the path to production can be difficult. A key challenge is deploying data science assets in a consistent, repeatable manner without sacrificing production qualities of service (e.g., meeting response-time goals).

That is where the Open Neural Network Exchange (ONNX) comes in. ONNX is an open-source format for representing machine learning models and is one of the key ecosystem technologies enabling a “Build and Train Anywhere, Deploy on IBM Z” strategy. ONNX helps establish a streamlined path from project inception to production. A model represented in the standard ONNX format can then be executed by an ONNX backend (i.e., a runtime or model compiler), such as the ones available on IBM Z.

This journey to production starts with the data scientist, who may use a preferred set of tools to understand a business problem and analyze data. When that data scientist creates and trains a model, they build assets that ultimately need to be deployed in production. Often, however, the deployment platform and production requirements receive little attention in these early stages. This is where using ONNX in a deployment strategy really shines. Many of the most popular libraries and frameworks, including PyTorch and TensorFlow, can export or convert a trained model to the ONNX format, as the sketch below illustrates.
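For instance, a minimal PyTorch sketch of the export step might look like the following. The model, file name, and tensor names here are illustrative assumptions, standing in for whatever the data scientist actually built:

```python
import torch
import torch.nn as nn

# A small stand-in model for whatever the data scientist actually trained.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
model.eval()

# torch.onnx.export traces the model with a sample input and writes the
# resulting graph to the standard ONNX format.
dummy_input = torch.randn(1, 10)
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["features"],
    output_names=["score"],
    dynamic_axes={"features": {0: "batch"}},  # accept any batch size
)
```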

Once a model has an ONNX representation, it can be deployed on any platform with an ONNX runtime. This provides two key benefits. First, the model is portable, with no runtime dependencies on the libraries or framework it was trained with; for example, an ONNX model originally created and trained in TensorFlow can be served without the TensorFlow runtime. Second, ONNX allows vendors to create high-performing backends that optimize and accelerate the model for a specific architecture.
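Continuing the sketch above, a hypothetical serving step using the open-source onnxruntime package could look like this; note that neither PyTorch nor TensorFlow is imported:

```python
import numpy as np
import onnxruntime as ort

# Load the exported model; no training-framework runtime is needed here.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Feed a NumPy array, using the input/output names chosen at export time.
features = np.random.randn(4, 10).astype(np.float32)
(score,) = session.run(["score"], {"features": features})
print(score.shape)  # (4, 1)
```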

For the mission-critical workloads that IBM Z typically hosts, this combination of portability and optimization makes it an optimal environment for deploying models. One key example of ONNX in action is Watson Machine Learning for z/OS (WMLz), which incorporates an ONNX model compiler based on the ONNX-MLIR project. The ONNX model compiler feature of WMLz focuses on deep learning models and produces an executable optimized to run on IBM Z. WMLz then lets the user easily deploy this compiled model for serving.
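WMLz wraps this flow in its own deployment tooling, but the underlying compile-and-run idea can be sketched with the open-source onnx-mlir project. The compile command and the PyRuntime module and class names below follow the onnx-mlir documentation; treat the exact invocation as an assumption for illustration, not the WMLz interface:

```python
# Compile step (run once, at the shell), per the onnx-mlir documentation:
#   onnx-mlir -O3 --EmitLib model.onnx
# This emits model.so, a shared library optimized for the host architecture.

import numpy as np
from PyRuntime import OMExecutionSession  # onnx-mlir's Python driver

# Load the compiled library and run it; run() takes and returns lists of
# NumPy arrays, one per model input/output.
session = OMExecutionSession("model.so")
features = np.random.randn(4, 10).astype(np.float32)
outputs = session.run([features])
print(outputs[0].shape)
```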

As IBM Z continues to innovate in enterprise AI, ONNX is a key part of IBM’s AI strategy. It allows IBM to build a deployment strategy optimized for the IBM Z architecture, while staying closely aligned with the broader ecosystem.

In August, you may have read that IBM previewed Telum, the next-generation IBM Z processor. IBM is now examining opportunities to exploit Telum's on-chip AI accelerator with the ONNX model compiler.

ONNX is part of the Linux Foundation and has widespread support from numerous key vendors that recognize the value it delivers. IBM is an early adopter of ONNX and contributes upstream to the ONNX project.

Be on the lookout for additional updates on how you can leverage ONNX as part of your IBM Z AI story!

