Do You Really NEED That LLM?
Before gearing up for yet another AI project, ask if it's really the best approach.
I’ve talked to a number of clients lately who have been leaning towards putting together some kind of Llama - a specialized large language model or LLM - to manage everything from publishing specialized content to creating friendlier user interfaces for querying databases. Sometimes this makes perfect sense, especially when dealing with largely unstructured document text.
However, LLMs and Llamas are far from perfect for a number of reasons.
You need a reasonably large corpus of documents, often in the hundreds of thousands or more, before the query aspects of an LLM make any sense.
Quality control becomes a big issue. LLMs aren’t terribly transparent, which means that correcting a mistake usually requires retraining the model and hoping that the correction comes from the right source. In the meantime, the model can confidently produce plausible but wrong answers (the so-called hallucination problem).
There is relatively little standardization at present between different large-scale providers, and this means that integration, which LLMs should in theory be able to handle easily, becomes much more problematic as the number of sources rises.
LLMs can be costly in comparison to other forms of data access.
Good deep-learning specialists are rare, and thus making changes in models has an associated labor cost.
Sometimes you really do need to curate sources, which is difficult once content has been folded into a model.
Metadata processing is additional overhead, and LLMs generally do not handle metadata well. This has a major impact in areas such as master data management.
Given all of these, before you commit to an LLM solution, you should ask yourself a number of questions about what you’re actually trying to do. You might save yourself both money and headaches by exploring more traditional alternatives.
Consider, for instance, the following:
Vector Stores
In AI parlance, a vector is a way of representing the contents of a document as a set of labeled name/value pairs, where the name is a “token” that conceptually represents a word and the value is a number that indicates (among other things) how frequently that word is used in the document, typically normalized so that it falls between 0 and 1. By multiplying the values of each token shared by two documents and summing the results (normalized by the length of each vector), it becomes possible to establish whether the two documents are similar (the normalized sum is closer to 1) or dissimilar (the normalized sum is closer to 0).
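Written out, the comparison described above is the cosine similarity of the two vectors. In the formula below, w(1,t) and w(2,t) stand for the weights of token t in the first and second document (notation introduced here for illustration). When all weights are non-negative, as with frequency counts, the result falls between 0 and 1.

```latex
\operatorname{sim}(d_1, d_2) = \frac{\sum_{t} w_{1,t}\, w_{2,t}}{\sqrt{\sum_{t} w_{1,t}^{2}}\;\sqrt{\sum_{t} w_{2,t}^{2}}}
```

Many production vector stores use dense embeddings produced by a trained model rather than raw token frequencies, but the comparison step is typically the same cosine measure or a close relative such as the plain dot product.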
This works very well for quickly determining relevancy scores. If you encode a prompt as a document in the same manner (using the tokens of the prompt to calculate the vector), you can also use this to search the database based upon the prompt itself. Vector stores don’t in general tell you much about specific sequences within the documents (this is the role of frameworks such as LangChain), but they can retrieve similar documents very quickly.
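A minimal sketch of prompt-based retrieval follows, assuming a toy in-memory store (the `TinyVectorStore` class and the sample documents are hypothetical, not any product's API) and simple relative-frequency weights rather than learned embeddings. The prompt is vectorized exactly like a document, and the stored vectors are ranked by cosine similarity.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Map each token to its relative frequency in the text."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()} if total else {}


def cosine(a, b):
    """Cosine similarity between two sparse token vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store mapping document ids to token vectors."""

    def __init__(self):
        self.vectors = {}

    def add(self, doc_id, text):
        self.vectors[doc_id] = term_vector(text)

    def search(self, prompt, k=3):
        # Encode the prompt exactly as if it were a document, then rank by similarity.
        query = term_vector(prompt)
        scored = [(cosine(query, vec), doc_id) for doc_id, vec in self.vectors.items()]
        scored.sort(reverse=True)
        return [(doc_id, round(score, 3)) for score, doc_id in scored[:k]]


store = TinyVectorStore()
store.add("vehicles", "Cars, trucks and buses are vehicles with engines.")
store.add("staff", "Employees of the company report to a manager.")
store.add("fleet", "The company fleet includes trucks and vans.")
results = store.search("Which trucks does the company own?", k=2)
print(results)  # [('fleet', 0.463), ('staff', 0.289)]
```

The fleet document ranks first as expected. The staff document comes second only because it shares “the” and “company” with the prompt, which is why real systems usually drop common stop words or rely on embeddings that capture meaning rather than raw word overlap.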
The OpenAI API has introduced the concept of Assistants, which appear to bundle the capabilities of vector stores and retrieval-augmented generation (RAG), both of which are also used with LangChain. Meanwhile, vector stores have increasingly been finding their way into both knowledge graphs and NoSQL databases as a way to retrieve documents by relevancy quickly and without a lot of complexity.
Knowledge Graph Store
A knowledge graph store is a way of representing graphs of information, usually consisting of sets of assertions, that can be searched using one of several languages such as SPARQL, Cypher, or Gremlin. Unlike vector stores, which use text analytics routines to identify similarities between documents, most knowledge graph stores extract information by finding subgraphs within a graph that satisfy specific patterns. As such, they are closer to the templating approach that LangChain uses to retrieve conceptual values.
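To make the contrast concrete, here is a minimal sketch of subgraph pattern matching over a list of triples, roughly the operation behind a SPARQL query such as `SELECT ?person WHERE { ?person a ex:Employee ; ex:worksFor ex:Acme }` (prefix declarations omitted). The graph, the `ex:` names, and the `match` function are illustrative assumptions; a real triple store would use indexes and a query planner rather than this brute-force recursion.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# A tiny, hypothetical graph of assertions: (subject, predicate, object).
GRAPH: List[Triple] = [
    ("ex:alice", "rdf:type", "ex:Employee"),
    ("ex:alice", "ex:worksFor", "ex:Acme"),
    ("ex:bob", "rdf:type", "ex:Employee"),
    ("ex:bob", "ex:worksFor", "ex:Globex"),
    ("ex:truck1", "rdf:type", "ex:Vehicle"),
]


def match(patterns: List[Triple], graph: List[Triple],
          binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield each variable binding that satisfies every pattern ('?'-prefixed terms are variables)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        consistent = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable must bind to the same value everywhere it appears.
                if candidate.setdefault(term, value) != value:
                    consistent = False
                    break
            elif term != value:
                consistent = False
                break
        if consistent:
            yield from match(rest, graph, candidate)


query = [("?person", "rdf:type", "ex:Employee"), ("?person", "ex:worksFor", "ex:Acme")]
print([b["?person"] for b in match(query, GRAPH)])  # ['ex:alice']
```

Nothing here measures similarity: a subgraph either satisfies the pattern or it does not, which is what makes the results predictable.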
Combining a vector store with a knowledge graph is actually not all that hard to do if you work from the notion that a document is essentially a subgraph with a known identifier. Moreover, these subgraphs can be thought of as shapes that follow specific rules or patterns. This means it’s not simply a matter of creating a key/value pair between a given resource and its underlying text vector: by identifying the shape, you can create searches that are “pre-constrained”, such as searches over the shape of all vehicles or of all employees of a given company.
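A minimal sketch of a “pre-constrained” search follows, under simplifying assumptions: each document is a hypothetical subgraph reduced to an identifier, a type, and its text, and the “shape” is reduced to a required type value. (In RDF, shapes are usually described with the W3C SHACL standard, which can express far richer constraints.) Filtering by shape happens before any similarity scoring, so documents of the wrong kind are never ranked at all.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Relative token frequencies for a piece of text."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()} if total else {}


def cosine(a, b):
    """Cosine similarity between two sparse token vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical documents, each a subgraph with a known identifier and type.
DOCS = {
    "ex:doc1": {"type": "ex:Vehicle", "text": "Red delivery truck with a diesel engine."},
    "ex:doc2": {"type": "ex:Employee", "text": "Driver who operates the delivery truck."},
    "ex:doc3": {"type": "ex:Vehicle", "text": "Electric van used for city deliveries."},
}


def constrained_search(prompt, shape_type, k=5):
    """Keep only documents of the required shape, then rank them by similarity to the prompt."""
    query = term_vector(prompt)
    scored = [
        (cosine(query, term_vector(doc["text"])), doc_id)
        for doc_id, doc in DOCS.items()
        if doc["type"] == shape_type
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


print(constrained_search("delivery truck", "ex:Vehicle"))  # ['ex:doc1', 'ex:doc3']
```

Note that `ex:doc2` mentions a delivery truck but never appears in the results, because its subgraph has the wrong shape.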
Additionally, an increasing number of people are recognizing that you can create intermediate in-memory graphs that perform the generation part of RAG (in essence using SPARQL Update generatively). These graphs can in turn be transformed either through SPARQL INSERT statements or by serializing them to XML, JSON, YAML, or other formats and transforming from there via XSLT or other transformation languages. Wrap these up in a service call, and semantic RAG systems can behave very much like an LLM without requiring huge amounts of “training” data, while also avoiding the twin problems of hallucination and verification.
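As a rough illustration of generating an intermediate in-memory graph and then serializing it for a downstream transformation, the sketch below mimics in plain Python what a SPARQL Update such as `INSERT { ?owner ex:operatesVehicle ?v } WHERE { ?v a ex:Vehicle ; ex:ownedBy ?owner }` would add to a working graph (prefix declarations omitted). The `ex:operatesVehicle` predicate and the sample data are hypothetical; in practice you would run the update against a triple store or an RDF library, then hand the serialized output to XSLT or another transformation step.

```python
import json

# Hypothetical source graph of (subject, predicate, object) assertions.
graph = {
    ("ex:truck1", "rdf:type", "ex:Vehicle"),
    ("ex:truck1", "ex:ownedBy", "ex:Acme"),
    ("ex:van2", "rdf:type", "ex:Vehicle"),
    ("ex:van2", "ex:ownedBy", "ex:Acme"),
    ("ex:bus3", "rdf:type", "ex:Vehicle"),
    ("ex:bus3", "ex:ownedBy", "ex:Globex"),
}


def generate_fleet_triples(source):
    """Plain-Python stand-in for the SPARQL Update
    INSERT { ?owner ex:operatesVehicle ?v } WHERE { ?v a ex:Vehicle ; ex:ownedBy ?owner }"""
    vehicles = {s for (s, p, o) in source if p == "rdf:type" and o == "ex:Vehicle"}
    return {(o, "ex:operatesVehicle", s) for (s, p, o) in source if p == "ex:ownedBy" and s in vehicles}


def to_json(source, predicate):
    """Serialize one predicate as {subject: [objects]} so a later step can transform it."""
    out = {}
    for s, p, o in sorted(source):
        if p == predicate:
            out.setdefault(s, []).append(o)
    return json.dumps(out, indent=2)


# Build the intermediate in-memory graph; the source graph itself is left unchanged.
working = graph | generate_fleet_triples(graph)
report = to_json(working, "ex:operatesVehicle")
print(report)
```

The printed JSON groups `ex:truck1` and `ex:van2` under `ex:Acme` and `ex:bus3` under `ex:Globex`. Every generated statement is derived from an explicit rule over existing assertions, so each output can be traced back to its source triples.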
There’s an interesting precedent that makes me suspect that knowledge graph stores may very well be facing a major comeback. Hadoop built out its own infrastructure in the early 2010s, evolving from a way to perform map-reduce operations in parallel across multiple servers using a dedicated Hadoop pipeline to eventually trying to establish itself as THE database of that era. What emerged was slow and feature-poor, and it lacked real scalability, but in the process many companies developed their own cloud strategies that could do much the same thing without being tied to any one platform (this era saw the rise of both AWS and Azure). Generative knowledge graphs could very well be a better solution for small to medium-sized content systems than the current flavors of AI, and may even give large content systems a run for their money.
XML, XQuery, XSLT and JSON
Long before either knowledge graphs or language models graced the stage, XML dominated the landscape for digital content and asset management, with XSLT and XQuery providing ways to handle relevancy ranking, in-document querying, and retrieval across forests of documents.
Just as you can apply RAG principles to RDF, you can similarly apply them to XML, HTML, and XHTML (as well as other document structures such as JSON, of course). Many of the characteristics that you see in today’s LLM systems have been available in applications such as MarkLogic for years, but developers have tended to steer clear of these because of the stigma that XML and related markup have received from the JSON community (primarily web developers). Similarly, these document-centric stores have been championing the multimodal approach for the last decade or more, and LangChain’s analogs for attention prompts have direct counterparts in the XPath language, which also has the advantage of being able to walk hierarchies.
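A minimal sketch of XPath-style querying across a small forest of documents follows, using Python's standard `xml.etree.ElementTree` module. ElementTree supports only a limited subset of XPath (enough for the descendant step `.//` and the attribute predicate used here); full XPath, XQuery, or XSLT would need a dedicated processor or an XML database. The document structure and element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical "forest" of small XML documents with nested sections.
FOREST = [
    '<report id="r1"><section title="Fleet">'
    '<item type="vehicle">Truck</item><item type="vehicle">Van</item>'
    '</section></report>',
    '<report id="r2"><section title="Staff"><item type="employee">Alice</item>'
    '<section title="Drivers"><item type="vehicle">Bus</item></section>'
    '</section></report>',
]


def find_items(forest, item_type):
    """Collect (document id, text) for every matching item at any depth in every document."""
    hits = []
    for xml_text in forest:
        root = ET.fromstring(xml_text)
        # './/' walks the whole hierarchy; [@type='...'] filters on an attribute value.
        for item in root.findall(f".//item[@type='{item_type}']"):
            hits.append((root.get("id"), item.text))
    return hits


print(find_items(FOREST, "vehicle"))  # [('r1', 'Truck'), ('r1', 'Van'), ('r2', 'Bus')]
```

Because `.//` descends to any depth, the bus inside the nested `Drivers` section is found without the query having to name that section.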
It is worth noting that the process of transforming between different formats and different ontologies has become easier in recent years, to the extent that tool chain, location, and database constraints no longer dictate what the data pipeline looks like. As APIs “cool” and become more stable, this common architecture becomes similar enough that you can get LLM-like behavior without LLM costs and risks.
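As a small illustration of how mechanical format transformation can be, here is a sketch that maps a JSON record onto XML using only the Python standard library. The mapping convention (object keys become element names, list entries become `item` elements) is an assumption chosen for brevity. It is lossy and requires keys that are valid XML names; real pipelines usually follow a documented mapping or a schema.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(tag, value):
    """Map a parsed JSON value onto an XML element (a simple, lossy convention)."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():  # object keys become child element names
            elem.append(json_to_xml(key, child))
    elif isinstance(value, list):
        for child in value:  # list entries become repeated <item> elements
            elem.append(json_to_xml("item", child))
    else:
        elem.text = str(value)  # scalars become text content
    return elem


record = json.loads('{"vehicle": {"id": "truck1", "owner": "Acme", "tags": ["diesel", "delivery"]}}')
root = json_to_xml("record", record)
print(ET.tostring(root, encoding="unicode"))
# <record><vehicle><id>truck1</id><owner>Acme</owner><tags><item>diesel</item><item>delivery</item></tags></vehicle></record>
```

Once the record is XML, the same XPath, XQuery, or XSLT tooling discussed above can be applied to data that originally arrived as JSON.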
The following conditions suggest that existing technologies may serve you better than an LLM:
the corpus size is comparatively small (<100,000 documents)
specific information in the documents is privileged (ACLs are document-specific)
document structure is relatively regular
documents are already annotated with metadata
curation is not an issue (relatively static data set)
output is predictable and regular based upon type, shape or other relationship
accuracy is critical
you’re dealing with mixed namespaces (typically a sign of metadata enhanced content)
Summary
Put simply, an LLM is a powerful tool for document search and generation, but for many applications it’s overkill. For these applications, you may be better off working with knowledge graphs, which have many of the same characteristics but a more mature tool set. Additionally, it’s possible to use LLMs and knowledge graphs in tandem through retrieval-augmented generation (RAG), with the LLM acting primarily as the interpreter of the prompt and the knowledge-graph-based RAG supplying mixed, curatable content.
Kurt Cagle is the Editor of The Ontologist, and is a practicing ontologist, solutions architect, and AI aficionado. He lives in Bellevue, Washington with his wife and cats, and can be reached at 443-837-8725, or at kurt.cagle@gmail.com.
Sign up for free Ontology Office Hours at Calendly.