Microsoft's AICI

A team at Microsoft unveiled AICI:

The Artificial Intelligence Controller Interface (AICI) lets you build Controllers that constrain and direct output of a Large Language Model (LLM) in real time. Controllers are flexible programs capable of implementing constrained decoding, dynamic editing of prompts and generated text, and coordinating execution across multiple, parallel generations. Controllers incorporate custom logic during the token-by-token decoding and maintain state during an LLM request. This allows diverse Controller strategies, from programmatic or query-based decoding to multi-agent conversations to execute efficiently in tight integration with the LLM itself.
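To make "constrained decoding" concrete, here is a minimal sketch of the general technique the quote describes: masking the model's next-token logits so only valid continuations can be sampled. This is illustrative only; the function and variable names are my own invention, not AICI's API.

```python
# Illustrative sketch of constrained decoding via logit masking.
# All names here are hypothetical, not AICI's actual interface.

def constrain_logits(logits, vocab, allowed_prefixes):
    """Mask out every token that cannot start a valid output.

    logits: list of floats, one score per vocabulary token
    vocab:  list of token strings, parallel to logits
    allowed_prefixes: set of strings the next token must be
                      consistent with (token extends a prefix,
                      or is itself a prefix of one)
    """
    NEG_INF = float("-inf")
    masked = []
    for score, token in zip(logits, vocab):
        ok = any(p.startswith(token) or token.startswith(p)
                 for p in allowed_prefixes)
        masked.append(score if ok else NEG_INF)
    return masked

# Tiny example: force the model to answer "yes" or "no".
vocab = ["yes", "no", "maybe", "the"]
logits = [1.0, 0.5, 3.0, 2.0]
masked = constrain_logits(logits, vocab, {"yes", "no"})
best = vocab[masked.index(max(masked))]
```

Even though "maybe" had the highest raw score, the mask removes it, so greedy sampling picks "yes". A real controller would apply this kind of mask at every decoding step, with state carried between steps.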

This is such a clever idea. A couple of months ago I talked about wanting to control the output of an LLM in real time. AICI does exactly that, but better, because you can write your own custom logic to steer the output.


What is most interesting to me is not only the what but also the how:

AICI is designed for both local and cloud execution, including (eventually) multi-tenant LLM deployments. Controllers are implemented as light-weight WebAssembly (Wasm) modules which run on the same machine as the LLM inference engine, utilizing the CPU while the GPU is busy with token generation. AICI is one layer in the inference stack, and is designed to allow control libraries such as Guidance, LMQL, and others to run on top of it and gain both efficiency and performance improvements, as well as portability across LLM inference and serving engines.
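The architectural claim here is that the controller runs on the CPU between GPU sampling steps. A toy decode loop makes the shape of that interleaving visible. Everything below is a hypothetical sketch (in real AICI the controller would be a Wasm module and the model a real inference engine); `ToyModel`, `EvenOnlyController`, and the hook names are all invented for illustration.

```python
# Hypothetical sketch of a decode loop that interleaves CPU-side
# controller hooks with (stand-in) GPU token generation.

class ToyModel:
    """Stand-in for the inference engine: scores the next token id."""
    def forward(self, tokens):
        # Favor token id (last + 2) mod 4, purely for demonstration.
        last = tokens[-1] if tokens else 0
        return [1.0 if i == (last + 2) % 4 else 0.0 for i in range(4)]

class EvenOnlyController:
    """Stand-in controller: allows only even token ids, stops at id 2."""
    def pre_sample(self, logits):
        # Runs on the CPU before sampling: mask or bias the logits.
        return [s if i % 2 == 0 else float("-inf")
                for i, s in enumerate(logits)]

    def post_sample(self, token):
        # Runs after sampling: may inspect state and end generation.
        return "stop" if token == 2 else "continue"

def generate(model, controller, prompt, max_tokens=8):
    tokens = list(prompt)
    for _ in range(max_tokens):
        logits = model.forward(tokens)            # "GPU" work
        logits = controller.pre_sample(logits)    # CPU: constrain
        token = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(token)
        if controller.post_sample(token) == "stop":
            break
    return tokens

result = generate(ToyModel(), EvenOnlyController(), [0])
```

The point is not the toy logic but the structure: the controller's hooks sit inside the per-token loop, so custom constraints cost CPU time that would otherwise be idle while the GPU computes the next batch of logits.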

WebAssembly is a perfect fit for this. It's standard, fast, and embeddable. More importantly, it makes it relatively easy to run arbitrary untrusted code in a sandboxed, memory-safe execution environment, which is exactly what this use case demands.

Once again, the Web Platform is the future of software. I'm glad to see Microsoft embracing it. I hope OpenAI and other LLM developers follow suit and implement something like this as well.