At NVIDIA, the Sales Operations team equips the Sales team with the tools and resources needed to bring cutting-edge hardware and software to market. Managing this across NVIDIA’s diverse technology is a complex challenge shared by many enterprises.
Through collaboration with our Sales team, we found that they rely on both internal and external documentation, often navigating multiple repositories to find the information they need. Now imagine an AI sales tool that can do all of this for you.
This post is part of the NVIDIA Chat Labs series, which shares insights and best practices developed from the internal generative AI projects created to help others navigate AI adoption.
This post explores how NVIDIA built an AI sales assistant using large language models (LLMs) and retrieval-augmented generation (RAG) to streamline sales workflows. It covers the challenges we addressed, the core solution components, and key lessons learned. For more information, see Explore Retrieval Models.
Key learnings
Here’s how to build a great AI sales assistant.
Start with a user-friendly chat interface
Begin with an intuitive, multi-turn chat platform powered by a capable LLM such as Llama 3.1 70B. Layer enhancements like RAG and web search through Perplexity API for advanced functionality without compromising accessibility.
Optimize document ingestion
Implement extensive preprocessing combining rule-based deterministic string processing with LLM-based logic for translation and editing. This approach maximizes the value of retrieved documents, significantly improving performance.
Implement wide RAG for comprehensive coverage
Use documents retrieved from internal document and media databases, as well as public-facing content from the company website, to accommodate diverse workflows and ensure comprehensive information delivery.
Balance latency and quality
Optimize response speed and relevance by using strategies like showing early search results during long-running tasks and providing visual feedback on the progress of the answer generation.
Prioritize data freshness and diversity
Perform daily updates by ingesting items from an internal sales document and media database, and implement real-time connections to structured data.
Address integration challenges by preparing for diverse data formats, such as PDFs, slide decks, audio recordings, and video files, using NVIDIA Multimodal PDF Ingestion for efficient parsing and NVIDIA Riva automatic speech recognition for transcription.
Developing an AI sales assistant
NVIDIA’s diverse portfolio, spanning LLMs, physics simulations, 3D rendering, and data science, challenges the Sales team to stay informed in the fast-paced AI market.
To address this challenge, we developed an AI sales assistant that integrates into workflows, providing instant access to proprietary and external data. Powered by advanced LLMs such as Llama 3.1 70B combined with RAG, it offers a unified chat interface enriched with internal insights and external data.
Sales teams use the assistant to quickly answer queries such as, “What are the key benefits of NVIDIA RTX for data science?” or “Summarize recent CRM updates.” It also generates tailored responses for customer-specific inquiries, such as, “How does NVIDIA optimize AI training pipelines in healthcare?”
The assistant also supports document summarization, editing, and proofreading. Early users quickly adopted its conversational interface, appreciating how it improved prospecting, reporting, and customer engagement compared to traditional retrieval systems.
Key benefits
Unified access to information: Combines internal NVIDIA data with broader insights through the Perplexity API and web search.
Enterprise-grade chat: Handles diverse queries such as spell-checking, summarization, coding, and analysis with models like Llama-3.1-405B-instruct.
Streamlined CRM integration: Automates SQL query generation and enhances reporting by summarizing sales data directly within customer relationship management (CRM) systems using a Text2SQL approach.
Architecture and workflows
The AI sales assistant is designed for scalability, flexibility, and responsiveness, with the following core architectural components:
LLM-assisted document ingestion pipeline
Wide RAG integration
Event-driven chat architecture
Early progress indicators
LLM-assisted document ingestion pipeline
The document ingestion process (Figure 1) addresses challenges such as translating documents from other languages, parsing PDFs, and handling inconsistent formatting.
To ensure uniformity, all text is processed with an LLM and converted into a standardized Markdown format for ingestion. Steps include parsing PDFs with the NVIDIA Multimodal PDF Ingestion Blueprint, transcribing audio files with NVIDIA Parakeet NIM, editing and translating with Llama 3.1 70B, and storing the results in a Milvus database.
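The flow above can be sketched as a simple routing function. This is a minimal illustration only: the helpers here are hypothetical stand-ins for the NVIDIA Multimodal PDF Ingestion Blueprint, Parakeet NIM transcription, Llama 3.1 70B editing, and Milvus storage.

```python
from pathlib import Path

def parse_pdf(path: Path) -> str:
    # Stand-in: the real pipeline uses the Multimodal PDF Ingestion Blueprint.
    return path.read_text(errors="ignore")

def transcribe_audio(path: Path) -> str:
    # Stand-in: the real pipeline uses Parakeet NIM speech recognition.
    return ""

def llm_translate_and_edit(text: str) -> str:
    # Stand-in: the real pipeline calls Llama 3.1 70B to translate and
    # normalize the text into standardized Markdown.
    return text.strip()

def ingest(path: Path) -> str:
    """Route a file through the right parser, then standardize it."""
    if path.suffix == ".pdf":
        raw = parse_pdf(path)
    elif path.suffix in {".wav", ".mp3", ".mp4"}:
        raw = transcribe_audio(path)
    else:
        raw = path.read_text()
    markdown = llm_translate_and_edit(raw)
    # In production, the Markdown would be embedded and stored in Milvus.
    return markdown
```

The key design point is the single normalization step at the end: regardless of source format, everything downstream sees uniform Markdown.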
NVIDIA-specific product names, such as NVIDIA RTX or NVIDIA NeMo, are also automatically enriched with short explanations obtained from a lookup table, enhancing the document’s clarity and usability for downstream processes.
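The enrichment step can be as simple as a first-mention substitution against the lookup table. The glossary entries below are illustrative placeholders, not the actual internal table.

```python
import re

# Hypothetical glossary; real entries come from an internal lookup table.
PRODUCT_GLOSSARY = {
    "NVIDIA NeMo": "a platform for building generative AI models",
    "NVIDIA RTX": "a platform for AI-accelerated graphics and computing",
}

def enrich_product_names(text: str) -> str:
    """Append a short explanation after the first mention of each
    known product name, leaving later mentions untouched."""
    for name, blurb in PRODUCT_GLOSSARY.items():
        text = re.sub(re.escape(name), f"{name} ({blurb})", text, count=1)
    return text
```

Enriching only the first mention keeps documents readable while still giving the retriever and the LLM the context they need for unfamiliar product names.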
Wide RAG integration
The AI sales assistant answers user queries by combining search results from vector retrieval on Milvus, web search restricted to the NVIDIA website, and the Perplexity API (Figure 2). These responses often include a dozen or more inline citations, which pose challenges for an LLM when citations include lengthy URLs or detailed authorship information.
To ensure accuracy, we use prompts that replace citations with concise alphanumeric keys during text generation. In a subsequent postprocessing step, these keys are replaced with full citation details, resulting in significantly more reliable and accurate inline citations.
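A minimal sketch of that postprocessing step, assuming keys of the form [S1], [S2], and so on (the actual key format and citation store are internal details):

```python
import re

def expand_citations(answer: str, sources: dict) -> str:
    """Replace short citation keys such as [S1] with full citation
    details, so the LLM never has to reproduce long URLs verbatim."""
    def repl(match):
        key = match.group(1)
        # Leave unknown keys untouched rather than inventing a citation.
        return f"[{sources.get(key, key)}]"
    return re.sub(r"\[(S\d+)\]", repl, answer)
```

Because the model only ever emits two- or three-character keys, it cannot garble a URL or author name mid-generation; all citation text comes verbatim from the retrieval layer.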

Event-driven chat architecture
Using LlamaIndex Workflows (Figure 2), the AI sales assistant efficiently manages the response generation through event-driven processes. Events capture the local state required for each step, ensuring smooth progression.
Each workflow step is supported by a Chainlit context manager, which enhances the user experience by providing visual progress indicators directly within the UI, simplifying error identification and debugging.
For tasks requiring complex reasoning, structured generation with chain-of-thought reasoning is used to significantly improve the quality of queries generated for CRM data.
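One way to picture structured generation here: constrain the model to fill a schema in which the reasoning field comes before the SQL, so the chain of thought is produced first and conditions the final query. The schema and field names below are illustrative assumptions, not the production schema.

```python
import json
from dataclasses import dataclass
from typing import List

@dataclass
class SQLGeneration:
    # Reasoning is emitted before the SQL, so the chain of thought
    # precedes (and informs) the final query.
    reasoning: str
    tables_used: List[str]
    sql: str

def parse_generation(raw_json: str) -> SQLGeneration:
    """Parse a structured LLM response into the schema above."""
    return SQLGeneration(**json.loads(raw_json))

# A response shaped like this would come back from the constrained call.
example = (
    '{"reasoning": "Quarterly revenue needs a close-date filter.", '
    '"tables_used": ["opportunities"], '
    '"sql": "SELECT SUM(amount) FROM opportunities WHERE close_date >= \'2024-01-01\'"}'
)
```

Field ordering matters with autoregressive models: putting the reasoning first forces the model to work through the problem before committing to a query.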
Figure 3 shows the logical flow of the AI sales assistant, beginning with events for query routing and tagging before splitting into distinct paths that use either document-based RAG or a Text2SQL approach over CRM data to answer user questions. The architectural diagram highlights how the solution efficiently handles diverse data inputs, including CRM data, call transcripts, and proprietary documentation.

The workflow offers multiple paths based on how the user’s query is routed, making it challenging to understand how an answer was generated without proper tracking of the data used and the steps executed by the system.
The following example code shows how LlamaIndex workflow steps are integrated with Chainlit for visual progress tracking and structured generation.
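This dependency-free sketch illustrates the pattern: in the production code, the steps are LlamaIndex Workflow `@step` methods and the progress context manager is Chainlit's `cl.Step`; the event shape and step names here are illustrative assumptions.

```python
import asyncio
from contextlib import asynccontextmanager
from dataclasses import dataclass, field

@dataclass
class QueryEvent:
    """Carries the local state for each step, as workflow events do."""
    query: str
    steps_run: list = field(default_factory=list)

@asynccontextmanager
async def progress_step(name: str, log: list):
    # Stand-in for Chainlit's cl.Step: records step boundaries where
    # the real UI would render a live progress indicator.
    log.append(f"start:{name}")
    try:
        yield
    finally:
        log.append(f"done:{name}")

async def route(ev: QueryEvent) -> str:
    async with progress_step("route", ev.steps_run):
        return "rag" if "document" in ev.query else "text2sql"

async def answer(ev: QueryEvent) -> str:
    path = await route(ev)
    async with progress_step(path, ev.steps_run):
        return f"answered via {path}"

ev = QueryEvent("summarize this document")
result = asyncio.run(answer(ev))
```

Wrapping every step in the same context manager is what makes error identification straightforward: any failure surfaces inside a named, visible step rather than in an opaque end-to-end call.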
For more information, see Creating RAG-Based Question-and-Answer LLM Workflows, a technical deep-dive on implementing RAG with multiple data sources using Chainlit and LlamaIndex. The post also includes accompanying GitHub code to demonstrate key functionalities.
Early progress indicators
Citation cards (Figure 4) deliver real-time feedback during lengthy third-party API calls, keeping users informed and engaged while responses are being generated.

The entire AI sales assistant system is visually represented in Figure 5, showcasing the integration of its core architectural components into a cohesive framework. It shows the main groupings of resources for document ingestion, retrieval-augmented generation for sales documents, and answering questions about structured data using CRM Text2SQL.

Pitfalls and trade-offs: Balancing innovation with usability
Developing the AI sales assistant presented several challenges that required thoughtful trade-offs to balance innovation and user experience:
Latency and relevance
Data recency
Integration complexity
Distributed workloads
Latency and relevance
Delivering fast responses is crucial for user experience, but generating accurate, relevant answers can be time-consuming.
To address this, we implemented strict time limits: a maximum of 8 seconds for web page retrieval and parsing, and 15 seconds for results from the Perplexity API.
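Per-source deadlines like these are straightforward with `asyncio.wait_for`; the fetch coroutines below are hypothetical stand-ins for the real retrieval calls.

```python
import asyncio

# Time budgets from the text: 8 s for web retrieval, 15 s for Perplexity.
WEB_TIMEOUT_S = 8
PERPLEXITY_TIMEOUT_S = 15

async def fetch_with_budget(coro, timeout_s: float):
    """Return a source's results, or an empty list if it misses its
    deadline, so one slow source never stalls the whole answer."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        return []

async def fast_source():
    return ["web result"]

async def slow_source():
    await asyncio.sleep(60)   # simulates a source that blows its budget
    return ["too late"]
```

Degrading to an empty result set, rather than raising, lets the answer synthesis proceed with whatever sources did respond in time.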
We also introduced UI elements that provide real-time summaries of RAG sources while answers are being generated, keeping users informed and engaged.
Data recency
Maintaining an up-to-date knowledge base is resource-intensive. We currently employ a one-year lookback period and are exploring strategies to better identify and prune outdated content.
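The lookback check itself reduces to a date comparison; this sketch assumes each document carries a timezone-aware last-modified timestamp.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

LOOKBACK = timedelta(days=365)   # the one-year window from the text

def is_fresh(last_modified: datetime, now: Optional[datetime] = None) -> bool:
    """Keep only documents modified inside the one-year lookback window."""
    now = now or datetime.now(timezone.utc)
    return now - last_modified <= LOOKBACK
```

The harder, still-open problem noted above is semantic staleness: a document can be recent by timestamp yet describe a superseded product or process.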
Integration complexity
Integrating diverse data sources and formats, including PDFs, presentations, audio, and video, required custom extraction and processing workflows. These demanding efforts were critical to ensuring comprehensive and accurate information coverage.
Distributed workloads
Long-running tasks, such as SQL queries, are handled through a partially distributed approach with a message queue. This ensures real-time interactions without compromising performance.
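A standard-library stand-in for that pattern: the request path enqueues a long-running job (such as a CRM SQL query) and returns immediately, while a background worker drains the queue. The production system uses a real message queue; the job shape and placeholder execution here are illustrative.

```python
import queue
import threading

jobs = queue.Queue()
results = {}

def worker():
    """Drain the queue in the background so the chat loop stays responsive."""
    while True:
        job_id, sql = jobs.get()
        if job_id is None:      # sentinel: shut the worker down
            break
        # Placeholder for real SQL execution against the CRM database.
        results[job_id] = f"rows for: {sql}"
        jobs.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()
jobs.put(("q1", "SELECT * FROM opportunities"))
jobs.put((None, None))
t.join()
```

Decoupling submission from execution is what keeps the chat interactive: the UI can poll or be notified when a result lands, instead of blocking on the query.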
Summary
Building the AI sales assistant for the NVIDIA Sales team was a rewarding technical challenge, offering valuable insights into designing scalable, AI-driven solutions. Using a RAG-based architecture, we integrated diverse knowledge sources, optimized query handling, and ensured high performance and accuracy to meet the demands of a dynamic, data-intensive environment.
By combining advanced LLMs, structured workflows, and real-time data retrieval, the AI sales assistant empowers the Sales team with instant, tailored insights while significantly enhancing workflow efficiency and user engagement. This project serves as a blueprint for developers tackling complex decision-support systems in fast-paced domains.
Future improvements will focus on refining strategies for real-time data updates, expanding integrations with new systems and formats, bolstering data security, and enhancing the handling of multimedia content. We are also exploring advanced personalization features to tailor solutions even more closely to individual user needs.
Inspired by our journey? NVIDIA provides a robust suite of generative AI tools and resources to help you design and implement your own AI solutions. Join our developer community to connect, share, and learn from like-minded innovators.