Embedded Internship

Graduation Thesis

1. Integrate Glow IR into TOSA and MLIR ecosystems

Glow is an open-source, graph-based compiler for neural network workloads, designed to optimize machine learning models across diverse hardware backends. TOSA (Tensor Operator Set Architecture) is an increasingly popular MLIR dialect that standardizes tensor-based operations to facilitate portability and compiler optimizations. By lowering Glow IR into the TOSA MLIR dialect, you will learn how to bridge two different compiler frameworks while reaping the benefits of the MLIR ecosystem, such as modular design, extensible passes, and a rich set of transformation tools. The work shows how this bridge expands compiler flexibility: modern ML deployment pipelines can share a common, robust infrastructure that combines TOSA’s standardized tensor operations with MLIR’s modular passes and Glow’s existing optimizations.

2. Add support for Rule A25-4-1 to AutoCheck by using Large Language Model

AutoCheck is a static analysis tool that checks C++ code in critical software for compliance with the AUTOSAR standard for C++ code. This standard plays a pivotal role in the automotive industry, where software safety and reliability are paramount. It specifies numerous rules, including A25-4-1, which requires that predicates used in associative containers, STL sorting, and related algorithms fulfil the conditions of a strict weak ordering (irreflexivity, asymmetry, and transitivity). Because rules of this kind are commonly categorized as “non-automatable”, traditional static analysis tools often struggle to fully address them. In this thesis, you will combine the Clang library for static analysis with an LLM to partially automate the verification of strict weak ordering conditions. The project showcases how combining static analysis with large language models can address “non-automatable” programming rules. The approach can potentially be extended to other complex rules defined by AUTOSAR and beyond, thereby strengthening software safety and reliability within the automotive industry and more broadly.
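To make the three conditions concrete, the following is a minimal sketch that tests them exhaustively on a small set of sample values. The helper isStrictWeakOrderOn and the sample set are hypothetical names introduced only for illustration and are not part of AutoCheck. Passing such a check on samples does not prove that a predicate is a strict weak ordering in general, which is part of why the rule is hard to automate.

```cpp
#include <array>
#include <cstddef>

// Two candidate ordering predicates for int keys.
struct Less {
  constexpr bool operator()(int a, int b) const { return a < b; }
};
struct LessEqual {  // looks harmless, but is not a strict weak ordering
  constexpr bool operator()(int a, int b) const { return a <= b; }
};

// Hypothetical helper: exhaustively checks the three conditions named in
// the text for `pred` on a finite set of sample values.
template <typename Pred, std::size_t N>
constexpr bool isStrictWeakOrderOn(const std::array<int, N>& s, Pred pred) {
  for (std::size_t i = 0; i < N; ++i) {
    // Irreflexivity: pred(a, a) must be false.
    if (pred(s[i], s[i])) return false;
    for (std::size_t j = 0; j < N; ++j) {
      // Asymmetry: pred(a, b) implies !pred(b, a).
      if (pred(s[i], s[j]) && pred(s[j], s[i])) return false;
      for (std::size_t k = 0; k < N; ++k) {
        // Transitivity: pred(a, b) and pred(b, c) imply pred(a, c).
        if (pred(s[i], s[j]) && pred(s[j], s[k]) && !pred(s[i], s[k]))
          return false;
      }
    }
  }
  return true;
}

constexpr std::array<int, 4> kSamples{{1, 2, 2, 3}};
static_assert(isStrictWeakOrderOn(kSamples, Less{}),
              "a < b satisfies all three conditions on the samples");
static_assert(!isStrictWeakOrderOn(kSamples, LessEqual{}),
              "a <= b is reflexive, so it violates irreflexivity");
```

A comparator such as LessEqual compiles without complaint when passed to std::sort or std::set, which is why violations of this rule tend to go unnoticed without dedicated analysis.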

3. Add support for Rule A27-0-2 to AutoCheck using the Clang Library

AutoCheck is a static analysis tool that checks C++ code in critical software for compliance with the AUTOSAR standard for C++ code. This standard is widely employed in the automotive industry, where software safety and reliability are critical. It defines numerous rules that must be followed when writing C++ code, and static analysis tools can help detect violations early in the development cycle. Rule A27-0-2 states:

“A C-style string shall guarantee sufficient space for data and the null terminator.”

This rule ensures that whenever a C-style string is initialized or copied, enough memory is allocated to hold the data together with the terminating null character. Failing to do so could result in unexpected or undefined behaviour. These situations can be identified by analysing the program’s abstract syntax tree (AST), which the Clang library conveniently provides.
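The size arithmetic behind the rule can be sketched as follows for the simple case where both the buffer capacity and the source string are known at compile time. The helpers cstrLength and fitsWithTerminator are hypothetical and only illustrate the comparison; they do not show how AutoCheck or Clang implement the check.

```cpp
#include <cstddef>

// Length of a C-style string, not counting the null terminator.
constexpr std::size_t cstrLength(const char* s) {
  std::size_t n = 0;
  while (s[n] != '\0') ++n;
  return n;
}

// Hypothetical helper: true if a buffer of `capacity` bytes can hold
// the characters of `s` plus the terminating '\0'.
constexpr bool fitsWithTerminator(std::size_t capacity, const char* s) {
  return cstrLength(s) + 1 <= capacity;
}

// Non-compliant pattern: char buf[5]; std::strcpy(buf, "hello");
// "hello" needs 5 characters + 1 terminator = 6 bytes.
static_assert(!fitsWithTerminator(5, "hello"), "5 bytes are too few");

// Compliant pattern: char buf[6]; std::strcpy(buf, "hello");
static_assert(fitsWithTerminator(6, "hello"), "6 bytes suffice");
```

In a real checker the two sizes would be read from the AST rather than passed in by hand, and cases where either size is unknown at compile time would need separate handling.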

By implementing this static analysis check with the Clang library and integrating it into AutoCheck, you will provide early detection of potential memory issues related to C-style strings in C++ codebases. This helps reinforce AUTOSAR compliance, particularly in safety-critical automotive software, and illustrates how compilers and static analysis can automate the enforcement of strict coding standards.

4. Implementation of the RISC-V Q extension in the LLVM project

The RISC-V architecture is one of the most popular RISC architectures today. A key feature of RISC-V is the ability to expand its capabilities through extensions. One such extension is the Q extension, which adds 128-bit quadruple-precision floating-point instructions. These instructions are compliant with the IEEE 754-2008 standard. At the time of writing, the LLVM infrastructure has no support for assembling Q extension instructions. The main task of this thesis is to implement this support while following the established conventions for implementing RISC-V extensions. Furthermore, LLVM’s test suite must be updated to verify the assembled instructions.
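As an illustration of what assembling a Q instruction means, the sketch below packs the 32-bit word for fadd.q using the R-type floating-point layout from the RISC-V unprivileged specification, where the two-bit fmt field value 0b11 selects quadruple precision. The helper names encodeFpRType and encodeFaddQ are hypothetical; LLVM describes instruction encodings in its own target description files rather than in code like this.

```cpp
#include <cstdint>

// Field values from the RISC-V unprivileged ISA specification.
constexpr std::uint32_t kOpcodeOpFp = 0x53;  // OP-FP major opcode (0b1010011)
constexpr std::uint32_t kFmtD = 0x1;         // fmt = 0b01: double precision
constexpr std::uint32_t kFmtQ = 0x3;         // fmt = 0b11: quad precision
constexpr std::uint32_t kFunct5Fadd = 0x00;  // FADD
constexpr std::uint32_t kRmDyn = 0x7;        // dynamic rounding mode

// Hypothetical helper: packs an R-type floating-point instruction word.
// Layout: funct5[31:27] fmt[26:25] rs2[24:20] rs1[19:15] rm[14:12]
//         rd[11:7] opcode[6:0]
constexpr std::uint32_t encodeFpRType(std::uint32_t funct5, std::uint32_t fmt,
                                      std::uint32_t rs2, std::uint32_t rs1,
                                      std::uint32_t rm, std::uint32_t rd) {
  return (funct5 << 27) | (fmt << 25) | (rs2 << 20) | (rs1 << 15) |
         (rm << 12) | (rd << 7) | kOpcodeOpFp;
}

// Hypothetical helper: fadd.q rd, rs1, rs2 with the given rounding mode.
constexpr std::uint32_t encodeFaddQ(std::uint32_t rd, std::uint32_t rs1,
                                    std::uint32_t rs2,
                                    std::uint32_t rm = kRmDyn) {
  return encodeFpRType(kFunct5Fadd, kFmtQ, rs2, rs1, rm, rd);
}

// fadd.q f26, f27, f28, dyn
static_assert(encodeFaddQ(26, 27, 28) == 0x07CDFD53,
              "only the fmt bits differ from the fadd.d encoding");
```

The same operands with fmt set to kFmtD give 0x03CDFD53, so the Q variant differs from the existing double-precision instruction only in bits 26:25. An assembler test would compare the emitted bytes against expected words of this kind.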

5. Define MLIR Dialect for COBOL

COBOL remains a pervasive language in enterprise computing, yet its aging toolchains and monolithic compilers make analysis, transformation, and optimization challenging. MLIR (Multi-Level Intermediate Representation) offers a flexible framework for defining custom dialects, enabling modular compilation passes and tooling. By creating an MLIR dialect for COBOL, you will gain hands-on experience with compiler infrastructure design and unlock new possibilities for COBOL analysis, optimization, and interoperation with modern toolchains. Right now, tools like GnuCOBOL translate COBOL → C → native code. That C detour works, but it inherits C’s quirks and adds an extra high-level language to the pipeline. With a dedicated COBOL dialect in MLIR, we can go straight from COBOL semantics to LLVM IR (or any backend), staying in the MLIR ecosystem the whole way.

By defining COBOL-specific operations and data types, you will learn how to model high-level COBOL constructs within MLIR. You will also implement a basic transformation pass (for example, constant folding) and demonstrate how to lower COBOL dialect ops into standard MLIR dialects and ultimately to executable code. Testing with sample COBOL programs will showcase the dialect’s capabilities and the entire pipeline, highlighting MLIR’s benefits in modular, extensible compiler design.
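The idea behind a constant folding pass can be sketched on a toy, flat list of operations. Everything here (OpKind, Op, Program, foldConstants) is a hypothetical stand-in written in plain C++ and does not use the MLIR API; in MLIR the same transformation would be expressed through the framework’s own folding and rewriting mechanisms.

```cpp
#include <cstddef>

// Toy stand-ins for dialect operations (hypothetical, not MLIR classes).
enum class OpKind { Constant, DataItem, Add, Mul };

struct Op {
  OpKind kind;
  long value;  // meaningful only for Constant
  int lhs;     // index of an earlier op, or -1 if unused
  int rhs;     // index of an earlier op, or -1 if unused
};

template <std::size_t N>
struct Program {
  Op ops[N];  // operands always refer to earlier entries
};

// Replaces every Add/Mul whose operands are both constants by a Constant.
template <std::size_t N>
constexpr Program<N> foldConstants(Program<N> p) {
  for (std::size_t i = 0; i < N; ++i) {
    Op& op = p.ops[i];
    if (op.kind != OpKind::Add && op.kind != OpKind::Mul) continue;
    const Op& l = p.ops[op.lhs];
    const Op& r = p.ops[op.rhs];
    if (l.kind != OpKind::Constant || r.kind != OpKind::Constant) continue;
    const long folded =
        (op.kind == OpKind::Add) ? l.value + r.value : l.value * r.value;
    op.kind = OpKind::Constant;
    op.value = folded;
    op.lhs = -1;
    op.rhs = -1;
  }
  return p;
}

// COMPUTE TOTAL = (2 + 3) * RATE
constexpr Program<5> kCompute{{
    {OpKind::Constant, 2, -1, -1},  // %0 = 2
    {OpKind::Constant, 3, -1, -1},  // %1 = 3
    {OpKind::Add, 0, 0, 1},         // %2 = %0 + %1
    {OpKind::DataItem, 0, -1, -1},  // %3 = RATE (known only at run time)
    {OpKind::Mul, 0, 2, 3},         // %4 = %2 * %3
}};
constexpr Program<5> kFolded = foldConstants(kCompute);
static_assert(kFolded.ops[2].kind == OpKind::Constant &&
                  kFolded.ops[2].value == 5,
              "2 + 3 folds to the constant 5");
static_assert(kFolded.ops[4].kind == OpKind::Mul,
              "the multiplication by RATE cannot be folded");
```

Because operands always refer to earlier entries, a single forward pass also folds chains: if RATE were replaced by a literal, the multiplication would fold as well once its left operand has become a constant.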