Friday, June 20, 2025

A Guide to Using ST Edge AI Developer Cloud

In a previous post, A Guide to STMicroelectronics' Edge AI Suite Tools, I provided an overview of the tools in STMicroelectronics' Edge AI Suite. In this post, we'll focus on one of those tools, ST Edge AI Developer Cloud, and walk through how to use it to test machine learning models.

ST Edge AI Developer Cloud is a web-based tool that allows you to test machine learning models by remotely accessing ST boards. It lets you evaluate deployment and benchmark performance on real hardware without the need to purchase any physical boards.

This guide outlines each step required to use the Developer Cloud, with screenshots provided to show the exact process. A similar walkthrough is also available in a video on the STMicroelectronics YouTube Channel.

Walkthrough

1. Accessing Edge AI Developer Cloud

Visit the ST Edge AI Developer Cloud and click "Start Now" on the landing page.

ST Edge AI Developer Cloud Start Page

2. Sign in or Create an Account

To use the tool, log into your myST account or create one if you haven't already.

Create myST Account

3. Import a Model

Import a model from your device or from the ST Model Zoo. For this example, I will use the first "Hand Posture" model that appears in the Model Zoo. Once selected, click "Start" next to the imported model.

Select Hand Posture Model

4. Choose Platform & Version

Select a version and platform to use. For this demonstration, I will use the default version, ST Edge AI Core 1.0.0, and select STM32 MPUs as the target platform.

Platform and Version Selection

5. Quantize the Model

Click "Launch Quantization". You may also upload a .npz file for accuracy evaluation. After quantization, click "Go Next" to move on to the Optimization stage.

Quantization Step
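
As general background (the Developer Cloud performs this step for you), 8-bit post-training quantization maps each floating-point value x to an integer q using a scale s and a zero point z:

q = round(x / s) + z,  so that  x ≈ s * (q - z)

Storing the weights as 8-bit integers instead of 32-bit floats makes the quantized model roughly a quarter of the size of the original, at the cost of a small amount of accuracy, which is what the optional .npz evaluation set helps you measure.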

6. Optimize the Model

In the Optimization section, select a .tflite file from the model selector at the top, then click "Optimize". Once the model has been optimized, click "Continue", which will appear next to the "Optimize" button.

Optimize Model

7. Benchmark Across Boards

Click "Start Benchmark" for all available boards. This will remotely run inference on a physical board and display the inference time once complete. Afterwards, click "Go Next" above the boards.

Benchmark Across Boards

8. View Results

Under the "Results" section, you can view metrics such as weight size, performance, and resource utilization.

The "Show details per layer" option shows the resource utilization on the selected board, and the "Show performance summary" option compares inference times across all tested boards. After reviewing the results, click "Go Next".

View Inference Results

9. Generate Code

Based on the benchmark results, generate code tailored to the optimal board. In this example, we will select the STM32MP135F-DK board, as it showed the fastest inference time. To view the timings, refer to the "Show performance summary" graph from the "Results" section.

Code Generation Summary

Conclusion

The ST Edge AI Developer Cloud is a powerful testing environment for optimizing AI models on ST hardware. By allowing developers to evaluate boards remotely, it streamlines the deployment process and speeds up decision-making when selecting the best platform for your machine learning applications.

Tuesday, May 13, 2025

Fix for KiCad Causing Windows Shutdown

Recently, while working with KiCad 9.0.2, my Windows 11 machine kept shutting down unexpectedly. To fix this issue, I had to uninstall all versions of KiCad on my computer, then reinstall the latest version. In this instance, I had 7.0, 8.0, and 9.0 installed.

Once all versions have been removed, double-check your Start menu for an app called "KiCad Command Prompt". If it appears, you may need to uninstall it as well, and you may also need to restart your computer. Afterwards, proceed with reinstalling 9.0.2.

Now that 9.0.2 runs smoothly, I will check future releases to see if similar issues arise.

Friday, March 14, 2025

How to Build the TensorFlow Lite C API from Source Inside WSL

TensorFlow Lite is a lightweight, efficient runtime for deploying machine learning models on edge devices. It's ideal for low-power, performance-critical environments such as embedded systems, mobile devices, and microcontrollers.

Building the TensorFlow Lite C API from source inside Windows Subsystem for Linux (WSL) allows you to integrate AI inference into native C applications. This is useful when working on constrained devices, building low-level systems, or working with existing C/C++ codebases.

Step 1: Set Up WSL and Create a .wslconfig File (Optional)

To prevent Bazel crashes caused by memory exhaustion, increase the memory limit for WSL. On Windows (not inside WSL), create C:\Users\<yourname>\.wslconfig with the following content:

[wsl2]
memory=6GB
processors=4

To do this with Notepad:

  • Open the Start menu and type Notepad
  • Paste the above configuration text into the new file
  • Click File > Save As...
  • Set File name: .wslconfig
  • Set Save as type: to All Files
  • Save it to C:\Users\<yourname>\

Then, from PowerShell, shut down WSL so the new settings take effect on the next launch:

wsl --shutdown

Step 2: Install Prerequisites

sudo apt update
sudo apt install -y build-essential clang git wget python3-pip

Step 3: Install NumPy

pip install numpy

Step 4: Install Bazelisk

wget https://github.com/bazelbuild/bazelisk/releases/download/v1.17.0/bazelisk-linux-amd64 -O bazelisk
chmod +x bazelisk
sudo mv bazelisk /usr/local/bin/bazelisk

Step 5: Set the Required Bazel Version

export USE_BAZEL_VERSION=5.3.0
echo 'export USE_BAZEL_VERSION=5.3.0' >> ~/.bashrc
source ~/.bashrc

Step 6: Clone TensorFlow and Check Out the Version

git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git checkout v2.12.0

Step 7: Build TensorFlow Lite C API

Optional but recommended: limit RAM usage to avoid crashes.

export BAZEL_BUILD_OPTS="--local_ram_resources=2048"
cd ~/tensorflow
bazelisk build -c opt $BAZEL_BUILD_OPTS --define=flatbuffer_op_resolver=false //tensorflow/lite/c:libtensorflowlite_c.so

Step 8: Install the Library and Headers

cd ~/tensorflow
sudo cp bazel-bin/tensorflow/lite/c/libtensorflowlite_c.so /usr/local/lib/
sudo ldconfig

# Copy required top-level headers
sudo mkdir -p /usr/local/include/tflite
sudo cp tensorflow/lite/c/c_api.h /usr/local/include/tflite/

# Copy all internal TensorFlow Lite C API dependencies
sudo mkdir -p /usr/local/include/tensorflow/lite/core/c
sudo cp tensorflow/lite/core/c/c_api.h /usr/local/include/tensorflow/lite/core/c/
sudo cp tensorflow/lite/core/c/c_api_types.h /usr/local/include/tensorflow/lite/core/c/

# Copy additional headers required by the C API
sudo mkdir -p /usr/local/include/tensorflow/lite
sudo cp tensorflow/lite/builtin_ops.h /usr/local/include/tensorflow/lite/

Step 9: Verify With a Simple C Program

#include "tflite/c_api.h"
#include <stdio.h>

int main() {
    TfLiteModel* model = TfLiteModelCreateFromFile("model.tflite");
    if (!model) {
        printf("Failed to load TensorFlow Lite model\n");
        return 1;
    }
    printf("TensorFlow Lite model loaded successfully!\n");
    TfLiteModelDelete(model);
    return 0;
}

Compile it with:

gcc -o tflite_test tflite_test.c -I/usr/local/include -L/usr/local/lib -ltensorflowlite_c
./tflite_test
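
Once the library links correctly, you can extend the test program to run an actual inference. The following is a minimal sketch of the usual TensorFlow Lite C API flow (create an interpreter, copy in input data, invoke, copy out the results). The model file name and the tensor sizes here are placeholders, so adjust them to match your own model.

#include "tflite/c_api.h"
#include <stdio.h>

int main() {
    // Load the model and configure an interpreter with 2 threads
    TfLiteModel* model = TfLiteModelCreateFromFile("model.tflite");
    if (!model) {
        printf("Failed to load model\n");
        return 1;
    }
    TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();
    TfLiteInterpreterOptionsSetNumThreads(options, 2);
    TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);

    // Allocate tensor buffers before running inference
    TfLiteInterpreterAllocateTensors(interpreter);

    // Copy input data into the first input tensor
    // (assumes a single float32 input with 4 elements -- adjust for your model)
    float input[4] = {0.1f, 0.2f, 0.3f, 0.4f};
    TfLiteTensor* input_tensor = TfLiteInterpreterGetInputTensor(interpreter, 0);
    TfLiteTensorCopyFromBuffer(input_tensor, input, sizeof(input));

    // Run inference
    if (TfLiteInterpreterInvoke(interpreter) != kTfLiteOk) {
        printf("Inference failed\n");
    }

    // Read back the first output tensor
    // (assumes a single float32 output with 2 elements -- adjust for your model)
    float output[2];
    const TfLiteTensor* output_tensor = TfLiteInterpreterGetOutputTensor(interpreter, 0);
    TfLiteTensorCopyToBuffer(output_tensor, output, sizeof(output));
    printf("First output value: %f\n", output[0]);

    // Clean up in reverse order of creation
    TfLiteInterpreterDelete(interpreter);
    TfLiteInterpreterOptionsDelete(options);
    TfLiteModelDelete(model);
    return 0;
}

It compiles with the same gcc command as the verification program above.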

Conclusion

Now that you've built the TensorFlow Lite C API from source inside WSL, you're ready to run AI inference directly in your C applications. This setup is ideal for embedded AI applications such as digital signal processing. Building from source also gives you full control when integrating with systems where Python or other heavy dependencies aren't an option.

Wednesday, January 15, 2025

Previewing NVIDIA's Cosmos, a Generative AI Worldbuilding Tool

NVIDIA Cosmos is a new platform from NVIDIA that uses generative AI for digital worldbuilding. In this post, I will demonstrate some possible outputs from Cosmos using NVIDIA's simulation tools. Afterward, I will discuss some of the functions and libraries Cosmos uses.

To create your own digital worlds with Cosmos, visit the Simulation Tools section of NVIDIA Explore (NVIDIA Explore - Simulation Tools). To start, we will select the cosmos-1.0-diffusion-7b model.

The Preview Image for cosmos-1.0-diffusion-7b

Once we've selected cosmos-1.0-diffusion-7b, we are presented with an option to input text, an image, or a video. The default example is a robot rolling through a chemical plant, with a video as the output.

An AI-Generated Image of a Robot Traversing a Chemical Plant

For this demonstration, I'm going to begin by entering the following text into the input box: "A crane at a dock lifting a large cargo crate from a ship onto the dock. photorealistic". After about 60 seconds, Cosmos produces a short 5-second video as output. Here is a frame from the video it generated from my first prompt:

An AI-Generated Image of a Dock with a Crane

In this case, we used the Cosmos-1.0-diffusion-7b-Text2World model, which takes an input of up to 300 words and produces an output video of 121 frames. As described in the linked documentation, it uses self-attention, cross-attention, and feedforward layers, along with adaptive layer normalization for denoising between each layer. Each layer serves a unique purpose.

The self-attention layer determines which words in the input text are most relevant to the output. For example, the word "crane" in our prompt is weighted more heavily than the word "at"; while both contribute to the output, the crane is the object at the center of the video. Next, the cross-attention layer relates the information contained in each word to the relevant part of the image. In our case, this is shown by the word "crate" and the image of a brown crate. Here, the word "crate" is referred to as the source, and the image is referred to as the target.
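
For readers who want the underlying math, both attention layers follow the standard scaled dot-product attention used in transformer models (this is general transformer background rather than anything unique to Cosmos):

Attention(Q, K, V) = softmax(Q * K^T / sqrt(d_k)) * V

where Q, K, and V are the query, key, and value matrices and d_k is the dimension of the key vectors. In self-attention, all three are derived from the same sequence, so each token is weighted against every other token. In cross-attention, the queries come from the video representation being denoised while the keys and values come from the text embedding, which is how a word like "crate" gets tied to the pixels that depict the crate.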

The third layer, the feedforward layer, refines the representation of each word after the cross-attention layer has determined its relevance. For example, the crate in our video is placed on the dock because the feedforward layer related it to the phrase "onto the dock". Lastly, adaptive layer normalization stabilizes the output, which in this case could mean making the crane move smoothly rather than jittering.

In addition to the cosmos-1.0-diffusion-7b Text2World model, there is also the cosmos-1.0-autoregressive-5b model.

The Preview Image for cosmos-1.0-autoregressive-5b

This model takes a picture as input and produces a 5-second video as output. The first frame of the output video is the input picture itself, and the model predicts what happens over the next 5 seconds of the scene to create the video. For this model, there is a series of nine preselected images to choose from.

Sample Images for Video Generation

Similar to the text2world model, the autoregressive video2world model employs self-attention, cross-attention, and feedforward layers. It should be noted that while this model is referred to as video2world, it can accept text, images, or videos as input, and outputs a video from whichever input was given.

Overall, NVIDIA Cosmos is a powerful worldbuilding tool for a variety of applications, including simulation software and game development. To learn more about the development tools NVIDIA has to offer, check out the following post: An Overview of NVIDIA's AI Tools for Developers

Wednesday, January 8, 2025

NVIDIA's Jetson Orin Nano Super Developer Kit

NVIDIA recently unveiled their new Jetson Orin Nano Super Developer Kit, a powerful computer designed for running AI on edge devices. Key features of the kit include performance of up to 67 TOPS (trillion operations per second), 102 GB/s of memory bandwidth, and a CPU frequency of up to 1.7 GHz. Other technical specifications found on the datasheet include a 6-core ARM Cortex-A78AE v8.2 64-bit CPU (arm-cortex-a78ae-product-brief), an NVIDIA Ampere GPU (NVIDIA Ampere Architecture), and 8GB of memory. The kit costs $249 and sold out quickly; it is currently on backorder from retailers such as SparkFun Electronics and Seeed Studio.

NVIDIA Jetson Orin Nano Developer Kit
This new Super Developer Kit is an upgraded version of the previous Jetson Orin Nano Developer Kit. NVIDIA has provided the JetPack 6.1 SDK, which can be used on existing Jetson Orin Nano Developer Kits to access features from the Super Developer Kit. To unlock the super performance, users can select the latest release of JetPack, JetPack 6.1 (rev. 1), when installing the latest SD card image for the kit. Detailed installation instructions can be found on the following page: JetPack SDK

The following table from NVIDIA highlights the main improvements of the Super Developer Kit, including the improved operating speed of up to 67 TOPS and memory bandwidth of 102 GB/s.
In this video, NVIDIA CEO Jensen Huang presents the Jetson Orin Nano Super Developer Kit.


If you are interested in learning about software compatible with the Jetson Orin Nano, check out this post in which I summarize the key features of NVIDIA's AI tools:  An Overview of NVIDIA's AI Tools For Developers