Sep 10, 2024
AI for Business Leaders: How to Build Smarter AI Products with RAG
By Justin Bowen
This post was originally published on LinkedIn.
In 2024, every business leader is searching for ways to integrate AI into their products to enhance their customer experience and gain an edge over competitors. But in most cases, integrating AI isn’t as simple as just plugging in ChatGPT’s API—and without proper implementation, it can lead to critical business mistakes.
Here’s an example: Say you have an e-commerce business. You’d like to add an AI assistant that helps customers find products more easily, increasing sales while making shopping feel less overwhelming. Instead of searching with filters and categories, users could ask for exactly what they want in natural language (e.g. “a men’s light blue jacket with pinstripes in size 40”).
While this sounds great in theory, the wrong implementation could turn it into a business nightmare. Imagine that upon launching your AI assistant, it showed your customers jackets from other retailers instead of results based on your inventory. Or even jackets that don’t exist! Now you’re losing sales, confusing customers, and hurting your brand’s reputation.
This would be an example of a Large Language Model (LLM) “hallucinating”: answering questions with incorrect or irrelevant information because it lacks contextual guidelines and guardrails.
One low-effort way to combat this is through prompt engineering. However, this alone is not always the most effective solution (I explore why under “Technical Considerations” below). The best way for businesses to ensure their AI accurately represents their brand, while providing a great user experience, is by enhancing its capabilities with Retrieval-Augmented Generation (RAG).
Business benefits of implementing RAG
As AI becomes more mainstream, businesses face a key challenge: how can you use it strategically to ensure that it aligns with your specific use cases and business needs? That’s where Retrieval-Augmented Generation (RAG) can help, allowing your AI to provide more relevant and consistent experiences within your product.
RAG works by providing your AI with additional context based on your business data or third-party data sources. It is essentially an extra layer that feeds relevant data into your LLM of choice (such as ChatGPT, Gemini, or Claude).
From a business perspective, implementing RAG with your AI can enhance:
User experience – RAG enables your AI to deliver more personalized, relevant responses using your data, generating unique content that standard LLMs can't produce on their own.
Operational efficiency – Rather than manually categorizing or indexing data, certain RAG implementations can automatically organize it and make it searchable through natural language queries, eliminating the need for SQL skills to retrieve information.
Business intelligence – RAG can identify patterns in your internal data and automatically surface insights to drive decision-making across teams like marketing, sales, and customer satisfaction—without the need to predefine indexes or query parameters.
In this article, we’ll explore the business value of RAG in customer-facing products, focusing on its impact on user experience and how businesses can leverage it to enhance existing offerings or build new AI-driven products.
In future articles, we’ll explore RAG’s capabilities in boosting operational efficiency and business intelligence. I'll also be creating more technical content to help engineers understand vector embeddings and how to build a RAG pipeline.
RAG’s value in B2B products
AI with RAG can improve B2B products by providing users with more precise and actionable insights based on internal data. It can enhance customer interactions, decision-making, and overall business outcomes. Here, I’ll share examples of possible applications in the healthcare and hospitality industries.
RAG in healthcare: Improving health recommendations and patient outcomes
If you’ve had a physical recently, your doctor may have asked for permission to record the conversation so AI could summarize it, eliminating the need for taking real-time notes. Implementing a RAG system would take this a step further.
With RAG, an AI could retrieve this information to help spot patterns and trends, identify health risks, and generate high-level recommendations for the healthcare provider to review. The data it leverages could include transcripts and notes from previous visits or even biometric data shared from personal health devices (like Apple Watches). The RAG pipeline could also incorporate internal hospital data, such as information about processes and best practices, to make the recommendations more actionable for providers, or match patients with the right specialists by tapping into doctor profiles.
This would assist healthcare providers in offering more comprehensive and timely guidance, with the goal of improving efficiency, patient care, and overall health outcomes.
RAG in hospitality: Enhancing guest experience and optimizing concierge efficiency
In 2017, I developed a concierge app for Tokyo's Ritz-Carlton and Grand Hyatt hotels. Our MVP allowed concierge teams to build custom travel itineraries for guests and store internal notes about local sights and restaurants. For example, they could note if a particular restaurant requires a reservation, serves the best ramen in town, or if a guest had a bad experience there.
All concierge teams at luxury hotels have some kind of database like this, filled with proprietary insights to enhance the guest experience. Incorporating an LLM with a RAG pipeline would allow concierges to instantly find the best recommendations by searching their data with natural language prompts based on guest needs. This would boost concierge efficiency and improve the quality of service. Additionally, new team members, especially those who have recently relocated (luxury hotels tend to employ a very international staff), could get up to speed faster with the accumulated knowledge of their peers readily available.
A RAG pipeline could also incorporate guest-specific data for more personalized service. For example, the Ritz-Carlton could create and maintain guest profiles that include concierge requests and preferences from previous stays, accessible by staff across all properties. Before the guest arrives at their next destination, the concierge could receive AI-generated recommendations. By anticipating guest needs before and during their stay, hotel groups can significantly enhance customer satisfaction and brand loyalty.
RAG’s value in B2C products
RAG also offers the ability to create more personalized and engaging experiences in consumer products. Instead of just integrating ChatGPT into your app, which will provide generalized and sometimes unpredictable content, implementing a RAG pipeline can tailor the user experience based on internal or individual user data.
RAG for travel: Streamlining trip planning with personalized itineraries
If you’ve ever planned a trip, you know how complicated and time-consuming it can be. My wife often plans trips for us or groups of friends, which involves hours of research, lots of back-and-forth, and multiple spreadsheets. A travel planning product with AI and RAG could streamline this process significantly.
For example, imagine an app that creates personalized trip itineraries. By tapping into APIs like Google Places and Hotels, LLMs can leverage real-world information about where to eat, where to stay, and what to do. Then, once a user has shared their preferences (such as trip pacing, activity types like nature experiences vs. city exploration, physical limitations, and dietary restrictions), a RAG-enhanced AI could generate custom recommendations. If a group of friends is traveling together, it could consider each individual’s preferences to create a plan that works for everyone. RAG can also refine itineraries based on the user’s activity within the app, such as places they’ve saved to a wishlist or their reviews of previously visited locations.
For a slightly different use case, consider a travel group with internal data on hotels, restaurants, and activities their members have enjoyed. They could create a tool that uses AI and RAG to aggregate this information and offer community-driven recommendations, providing a more unique trip planning experience.
With AI and RAG, you could leverage a variety of internal or external data sets to help users build their ideal travel itineraries. By creating a highly customized experience, you can increase user engagement, satisfaction, and, ultimately, sales (of app subscriptions and/or products within the app).
I’m currently building Active Agent, an AI agents-as-a-service platform to help businesses leverage their data through a RAG pipeline and more efficiently build applications like the ones mentioned above. The open-source V1 will be available in October 2024.
Implementing RAG in your product: Technical considerations
Now, we’ll explore technical considerations that the business and engineering teams will need to evaluate before building RAG into their product.
What about prompt engineering?
As mentioned earlier, prompt engineering is a low-effort but less flexible alternative to RAG. In prompt engineering, engineers or product managers manually design and structure prompts to improve the consistency and accuracy of AI responses.
The biggest drawback of relying solely on prompt engineering is that it requires your team to anticipate the specific questions users will ask and how the AI should respond. This can lead to errors when users ask unexpected questions or require information that isn't already built into the prompt.
For the best experience, you’ll want your AI model to retrieve information dynamically, in real time, which is where RAG comes in.
Ultimately, the best solution is probably a combination of both: using prompt engineering to guide the structure of user interactions and RAG to retrieve data for more accurate, dynamic responses.
Techniques for building RAG pipelines
Here, I’ll explore the three most common approaches to building RAG pipelines for AI applications (from least to most complex).
Dynamic prompting (less complex)
Dynamic prompting is a technique in RAG that retrieves relevant user data or business context to include in interactions between the user and AI.
Continuing the earlier example of a travel planning app, dynamic prompting would allow the AI to pull in a user’s travel history, preferences, and profile information to provide personalized suggestions without requiring the user to re-enter this data.
Unlike prompt engineering, where inputs are manually created for specific queries, dynamic prompting automatically injects relevant context for more personalized responses with less setup.
While dynamic prompting is a common and relatively low-effort approach to RAG, it is limited to pre-loaded data, making the system less adaptable. This can be addressed with tool calling.
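To make the idea concrete, here is a minimal sketch of dynamic prompting in Python. It assumes a hypothetical `TravelerProfile` record and a hypothetical `build_prompt` helper (neither comes from a real library), and it only assembles the text that would be sent to an LLM. The model call itself is left out.

```python
from dataclasses import dataclass, field


@dataclass
class TravelerProfile:
    """Hypothetical stored user record; the fields are illustrative only."""
    name: str
    pacing: str
    dietary_restrictions: list[str] = field(default_factory=list)
    past_trips: list[str] = field(default_factory=list)


def build_prompt(profile: TravelerProfile, question: str) -> str:
    """Inject pre-loaded user context ahead of the user's question."""
    context_lines = [
        f"Traveler: {profile.name}",
        f"Preferred pacing: {profile.pacing}",
        "Dietary restrictions: " + (", ".join(profile.dietary_restrictions) or "none"),
        "Past trips: " + (", ".join(profile.past_trips) or "none"),
    ]
    return (
        "Use the context below to personalize your answer.\n"
        "--- context ---\n"
        + "\n".join(context_lines)
        + "\n--- question ---\n"
        + question
    )


# Example data (made up for illustration).
profile = TravelerProfile(
    name="Sam",
    pacing="relaxed",
    dietary_restrictions=["vegetarian"],
    past_trips=["Kyoto", "Lisbon"],
)
prompt = build_prompt(profile, "Plan a three-day trip to Rome.")
print(prompt)
```

The user only types the question; everything above it is filled in from stored data. The limitation described above is also visible here: the prompt can only contain what was already loaded into the profile.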
Tool calling (more complex)
Tool calling allows for real-time retrieval of data from third-party services or internal databases.
For example, if a user asks for restaurant recommendations, the AI can use a tool call to access Google Places, pulling in live data such as current restaurant reviews, operating hours, or availability. This ensures the user receives the most accurate and relevant information. Tool calling also allows AI systems to trigger specific actions, such as booking a reservation.
Tool calling is often paired with dynamic prompting to create a more comprehensive and flexible experience, without the limitations of static or pre-loaded data.
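The mechanics can be sketched roughly as follows. This is an illustration under stated assumptions, not any provider's actual API: `search_restaurants` is a stand-in that returns canned data instead of calling a live service, and the JSON shape of the model's request (`name` plus `arguments`) is a simplified, hypothetical format.

```python
import json
from typing import Callable


def search_restaurants(city: str, cuisine: str) -> list[dict]:
    """Stand-in for a live lookup (e.g. a places API); returns canned data here."""
    catalog = [
        {"name": "Menya Hypothetical", "city": "Tokyo", "cuisine": "ramen", "open_now": True},
        {"name": "Example Trattoria", "city": "Rome", "cuisine": "italian", "open_now": False},
    ]
    return [r for r in catalog if r["city"] == city and r["cuisine"] == cuisine]


# Registry of tools the application is willing to run on the model's behalf.
TOOLS: dict[str, Callable[..., object]] = {"search_restaurants": search_restaurants}


def run_tool_call(tool_call_json: str) -> str:
    """Execute a tool request emitted by the model and return JSON for the next turn."""
    call = json.loads(tool_call_json)
    tool = TOOLS.get(call["name"])
    if tool is None:
        return json.dumps({"error": f"unknown tool: {call['name']}"})
    result = tool(**call["arguments"])
    return json.dumps({"result": result})


# In a real system this string would come from the LLM's response.
model_request = '{"name": "search_restaurants", "arguments": {"city": "Tokyo", "cuisine": "ramen"}}'
tool_output = run_tool_call(model_request)
print(tool_output)
```

The application, not the model, executes the function. The model only asks for it by name, and the result is passed back so the model can write its final answer. Keeping an explicit registry means the model can never trigger code you have not deliberately exposed.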
Similarity search (most complex)
Similarity search enhances your AI by leveraging language embeddings, allowing it to identify and group similar items in your dataset based on user queries, even when the data isn’t perfectly organized.
In the travel app example, if a user asks, "What activities are similar to my previous trips?" the AI can analyze their trip history using similarity search. By recognizing patterns in the types of activities they’ve enjoyed—such as visiting art galleries or going on food tours—the AI can suggest similar experiences for future trips. Even if past trips weren’t labeled with specific details, the AI can still recommend activities based on common themes.
This ability can also be integrated into an LLM as a tool, allowing the AI to autonomously query your business data. For example, the AI might ask, "What are some other activities with similar characteristics?"
This gives users more robust recommendations based on their own travel patterns, resulting in an even richer, more personalized experience.
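As a rough illustration of what similarity search does under the hood, the sketch below ranks activities by cosine similarity. The three-dimensional vectors are hand-written stand-ins for real embeddings (which come from an embedding model and usually have hundreds or thousands of dimensions), and the activity names are made up.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; values near 1.0 mean 'very similar'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy "embeddings" with axes (art, food, outdoors). Purely illustrative values.
ACTIVITIES = {
    "gallery walk": [0.9, 0.1, 0.0],
    "street food tour": [0.1, 0.9, 0.1],
    "mountain hike": [0.0, 0.1, 0.9],
    "museum night": [0.8, 0.2, 0.1],
}


def most_similar(query: list[float], k: int = 2) -> list[str]:
    """Return the names of the k activities closest to the query vector."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in ACTIVITIES.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:k]]


# A vector summarizing an art-heavy trip history (assumed, not computed).
past_trip_vector = [0.85, 0.2, 0.05]
print(most_similar(past_trip_vector))
```

None of the activities carries an "art" label, yet the art-related ones rank highest because their vectors point in a similar direction to the trip history. A vector database performs the same kind of comparison, with indexing that keeps it fast over large datasets.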
Technology requirements
Of course, implementing AI with RAG comes with specific technical requirements. This section will stay fairly high-level for business leaders, and I will be creating more technical content for engineers in the future.
Security considerations
Before implementing AI and RAG within your product, it's crucial to consider privacy and security. If your AI is leveraging sensitive data such as PHI (Protected Health Information) or PII (Personally Identifiable Information), you will need to take extra measures to avoid sharing this information with LLM providers like OpenAI.
One way to mitigate these risks is to self-host an open-source model like Meta’s Llama. Alternatively, you could use models hosted by your cloud provider, such as OpenAI’s models on Azure, Claude on AWS/GCP, or Gemini on GCP. Just keep in mind that having multi-cloud infrastructure within your organization can create extra complexity and burdens for your DevOps team.
Ultimately, you'll need to weigh compliance requirements and balance quality with complexity to determine which approach is right for your organization.
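Whichever hosting option you choose, some teams also mask obvious identifiers before any text leaves their systems. That technique is my illustration here, not something the options above require, and the sketch below is deliberately simplistic: the two regular expressions (emails and one US phone format) are assumptions, and pattern matching alone is not sufficient for compliance.

```python
import re

# Simplified patterns for illustration; real PII/PHI detection needs far more coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact(text: str) -> str:
    """Replace matched identifiers with placeholders before sending text to an LLM."""
    text = EMAIL.sub("[EMAIL]", text)
    text = US_PHONE.sub("[PHONE]", text)
    return text


note = "Patient Jane can be reached at jane@example.com or 555-123-4567."
print(redact(note))
```

Note that the patient's name passes through untouched, which shows why regex masking is at best one layer among several.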
Infrastructure, data storage & costs
One major technical consideration is your infrastructure. This is where your team may need to make significant changes, primarily by adopting new data storage solutions such as vector storage.
For many RAG techniques, your data must be processed and indexed as vector embeddings. This typically involves generating embeddings using an LLM or dedicated vector embedding model and then storing them in a specialized vector database that is designed to support high-dimensional search and retrieval.
This process is more complex than traditional data storage due to the need for efficient similarity search, vector indexing, and query handling across large-scale datasets. As such, it requires an additional investment in cloud computing resources. Expect the cost to be significant up front, when you backfill embeddings for all of your existing data. After that, ongoing costs should be only slightly higher than your current compute costs, since you will only be generating embeddings for new data created by your normal business operations.
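The difference between the one-time backfill and ongoing indexing can be sketched as below. Everything here is a placeholder: `toy_embed` produces deterministic but meaningless vectors in place of a real embedding model call, and `InMemoryVectorIndex` is a hypothetical stand-in for a vector database.

```python
import hashlib


def toy_embed(text: str, dims: int = 8) -> list[float]:
    """Placeholder for an embedding model call; NOT semantically meaningful."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dims]]


class InMemoryVectorIndex:
    """Hypothetical stand-in for a vector database."""

    def __init__(self) -> None:
        self.vectors: dict[str, list[float]] = {}
        self.embed_calls = 0  # proxy for embedding compute cost

    def upsert(self, record_id: str, text: str) -> None:
        self.vectors[record_id] = toy_embed(text)
        self.embed_calls += 1

    def backfill(self, records: dict[str, str]) -> int:
        """Embed only the records that are not indexed yet; return how many were added."""
        missing = [rid for rid in records if rid not in self.vectors]
        for rid in missing:
            self.upsert(rid, records[rid])
        return len(missing)


# Existing business data (made-up examples).
records = {
    "note-1": "Restaurant requires a reservation.",
    "note-2": "Best ramen in town.",
    "note-3": "Guest had a bad experience here.",
}
index = InMemoryVectorIndex()
initial_count = index.backfill(records)  # one-time pass over everything

records["note-4"] = "Great view at sunset."
incremental_count = index.backfill(records)  # later runs touch only new data
print(initial_count, incremental_count, index.embed_calls)
```

The first run pays for every existing record, while later runs pay only for what is new, which is the cost pattern described above.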
If you're using OpenAI’s services, their API offers integrated vector storage features, which can streamline implementation and reduce the overhead of maintaining your own infrastructure.
Technical skills
You must also ensure that your technical team fully understands how to architect for RAG. This includes tasks like generating vector embeddings, storing them in a vector database, and executing similarity searches within the vector space. This type of work used to be delegated to data scientists, but as your system scales, engineers will need to offer support by leveraging one of the many new vector databases or extending your existing databases with vector embedding support. This is still a relatively niche skill set.
Finally, it’s important to note that generative AI and RAG are rapidly evolving technologies. Like any new technology, you will need to continuously evolve your implementation to remain competitive.
Professional services
Hi, I’m Justin Bowen. 👋
I’m an engineer (15+ years) and CTO consultant. I’ve spent most of the last 10 years building AI products with real-world impact, including:
Platform for improving patient care in operating rooms (in partnership with Stanford Healthcare) 🏥
Systems for monitoring the health of cows in dairy barns (in partnership with Cargill Animal Nutrition, acquired by Ever.ag) 🐄
Drone analysis platform for field crops 🌱
Currently, I’m focused on:
Building an open-source framework for AI agents-as-a-service 🤖
Helping clients build better AI systems (as a fractional team member or managing end-to-end product design and development) 🔧
Conducting tech due diligence for VCs during high-value fundraising rounds and acquisitions 🔎
If you’re interested in learning more about me, RAG, or receiving guidance tailored to your business, you can schedule a free 30-minute consultation: https://cal.com/tonsoffun/free-consultation
You can also follow me on LinkedIn for more AI insights for business and engineering leaders.