Kioxia AiSAQ Improves AI Inference With Lower DRAM Costs

Posted by Thomas Coughlin, Contributor | 3 hours ago


In April this year, Kioxia’s Rory Bolt gave me a briefing on Kioxia’s AiSAQ, an open-source project intended to promote the expanded use of SSDs in RAG AI solutions. The focus of AI work is moving from generating foundational models with massive and expensive training to cost-effective and scalable ways to create inference solutions that can solve real-world problems.

Retrieval-Augmented Generation is an approach to AI that combines traditional information retrieval systems with large language models. RAG enhances the performance of LLMs by allowing them to access and incorporate information from external knowledge sources, such as databases, websites, and internal documents, before generating a response. This approach helps LLMs produce more accurate, contextually relevant, and up-to-date information, especially when dealing with specific domains or real-time data.
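To make that retrieve-then-generate flow concrete, here is a minimal sketch of it in Python. The embed() and generate() functions are stand-ins of my own for an embedding model and an LLM API, and the documents are invented; a production RAG system would use real models and a real vector database.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    # Toy deterministic "embedding": hash the text into a fixed-size unit vector.
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

documents = [
    "Internal policy: refunds are processed within 14 days.",
    "Product spec: the controller supports PCIe Gen5.",
    "FAQ: firmware updates are released quarterly.",
]
doc_vectors = np.stack([embed(d) for d in documents])  # the "vector database"

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank documents by cosine similarity to the query embedding.
    scores = doc_vectors @ embed(query)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def generate(prompt: str) -> str:
    # Placeholder for a call to an LLM (e.g. a hosted chat-completion API).
    return f"[LLM answer based on a prompt of {len(prompt)} characters]"

query = "How fast is the interface on the controller?"
context = "\n".join(retrieve(query))
answer = generate(f"Context:\n{context}\n\nQuestion: {query}")
print(answer)
```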

Kioxia has used AI to improve the output of its NAND fabs since 2017, mostly using machine vision to monitor trends and defect rates. In 2020, Kioxia used AI to generate the world’s first AI-designed manga, Phaedo, drawing on manga drawings and stories based on Osamu Tezuka’s work.

I was told that although larger data centers feed data to their AI models using hard drives, many in-house solutions train using data on SSDs. These solutions often work with foundational LLMs created from very large data sets and use RAG with in-house, and perhaps more up-to-date, data to tune the foundational model for a particular application and to avoid hallucinations. The image below illustrates how a database can be used to tune the original LLM.

Here the customer query is answered using the LLM as well as domain-specific and up-to-date information in a vector database. Such RAG solutions can keep the database index and vectors entirely in DRAM, but that approach can use a lot of memory, making it very expensive, particularly for large databases.
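A back-of-the-envelope calculation shows why the all-in-DRAM approach gets expensive. The vector count, embedding dimension, and graph overhead below are illustrative assumptions of mine, not figures from Kioxia.

```python
num_vectors = 1_000_000_000            # 1 billion embeddings (illustrative)
dim = 768                              # embedding dimension (illustrative)
bytes_per_float = 4                    # float32
neighbors_per_vector = 64              # graph links per vector (illustrative)

vectors_gb = num_vectors * dim * bytes_per_float / 1e9
graph_gb = num_vectors * neighbors_per_vector * 4 / 1e9   # 4-byte neighbor IDs

print(f"Vectors alone: {vectors_gb:,.0f} GB of DRAM")     # ~3,072 GB
print(f"Graph index:   {graph_gb:,.0f} GB of DRAM")       # ~256 GB
```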

Microsoft developed DiskANN, which moved the bulk of the vector DB content to SSDs. This reduced the required DRAM footprint for the DB, enabling greater scaling of vector DBs. The approach is used in products such as Azure Vector DB and Cosmos DB.

Kioxia’s All-in-Storage ANNS with Product Quantization, or AiSAQ, completes the move of database vectors into storage, further reducing the DRAM requirements. These three approaches are represented in the drawing below.
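Product quantization is the compression technique named in AiSAQ: each vector is split into sub-vectors, and each sub-vector is replaced by the one-byte ID of its nearest codebook centroid. The sketch below shows only that basic encoding step, with random codebooks standing in for ones that would normally be trained (for example with k-means); it is an illustration, not Kioxia's implementation.

```python
import numpy as np

dim, num_subspaces, codebook_size = 128, 16, 256
sub_dim = dim // num_subspaces
rng = np.random.default_rng(0)

# Random codebooks for illustration; real ones are learned from the data.
codebooks = rng.standard_normal((num_subspaces, codebook_size, sub_dim))

def pq_encode(vector: np.ndarray) -> np.ndarray:
    # Replace each sub-vector with the index of its closest centroid (1 byte each).
    subs = vector.reshape(num_subspaces, sub_dim)
    codes = [np.argmin(np.linalg.norm(codebooks[m] - subs[m], axis=1))
             for m in range(num_subspaces)]
    return np.array(codes, dtype=np.uint8)

vector = rng.standard_normal(dim)
codes = pq_encode(vector)
print(f"Original: {dim * 4} bytes (float32) -> PQ code: {codes.nbytes} bytes")
```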

Kioxia says that this approach enables greater scalability for RAG workflows and thus better accuracy in the models. The image below shows the significant reduction in DRAM required for large databases compared to the DRAM-based and DiskANN approaches, along with the improved query accuracy.

In early July, Kioxia announced further improvements to AiSAQ. This new open-source release provides flexible controls that let system architects define the balance point between search performance and the number of vectors, which are opposing factors given a fixed amount of SSD storage in the system. This lets architects of RAG systems fine-tune that balance for specific workloads and requirements without any hardware modifications.
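As a rough illustration of that trade-off, the sketch below assumes a simple per-vector footprint model (full-precision vector plus graph neighbors plus a PQ code) and shows how, within a fixed SSD budget, a larger graph degree (generally better search quality) leaves room for fewer vectors. All parameters are hypothetical, not actual AiSAQ settings.

```python
ssd_capacity_bytes = 4 * 10**12          # 4 TB SSD budget for the index (illustrative)
dim, bytes_per_float, pq_code_bytes = 768, 4, 96

for graph_degree in (32, 64, 128):       # more neighbors -> better search, bigger footprint
    per_vector = dim * bytes_per_float + graph_degree * 4 + pq_code_bytes
    max_vectors = ssd_capacity_bytes // per_vector
    print(f"graph degree {graph_degree:3d}: ~{max_vectors / 1e9:.2f} billion vectors fit")
```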

Kioxia’s AiSAQ enables more scalable RAG AI inference systems by moving database vectors entirely into storage, avoiding DRAM growth as database sizes increase.


