How to prepare materials for an LLM-based chatbot?
The following outline was prepared in June 2024 and applies to the currently used language models. Due to their rapid development, this list may be subject to updates.
Define your goals
Start by clearly defining the problem you want to solve and the outcomes you expect the AI chatbot to deliver, both for you and for the users of the solution.
Gather key data
Collect the necessary terms, rules, and parameters required for the effective operation of your AI chatbot, such as product feeds, survey data, instructions, manuals, or specific industry terminology.
Shape your materials
The best formats are .docx and .doc. PDFs can be problematic because their text may be embedded as images or vector graphics, and complex layouts (layers, multiple columns, irregular formatting, or image-rich content) make it difficult for an LLM to understand the context of the information. The simpler the format is to read, the better the results you will achieve.
Direct integrations with BigQuery (coming soon), CRM systems (coming soon), and knowledge bases allow for automatic updates; make sure the data there is well organized. Remember that the LLM does not incorporate content from additional attachments into its knowledge, only the content that is directly in the database. Attachments can be passed on in another form, for example as links.
Product databases, available in formats such as XML or CSV, allow for automatic updates of product lists in the database.
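As an illustration of preparing a product feed, the sketch below reads a small CSV feed, sets aside rows with missing required fields, and renders the complete rows as short plain-text entries. The feed contents, the column names (id, name, price, description), and the helper names are assumptions made for this example, not part of any specific platform.

```python
import csv
import io

# Hypothetical feed; the column names are assumptions for this sketch.
SAMPLE_FEED = """id,name,price,description
101,Trail Backpack 30L,89.00,Lightweight backpack with rain cover
102,Camp Stove,,Compact single-burner gas stove
"""

REQUIRED_FIELDS = ("id", "name", "price")


def load_products(feed_text):
    """Parse a CSV feed and split rows into complete and incomplete records."""
    complete, incomplete = [], []
    for row in csv.DictReader(io.StringIO(feed_text)):
        # Trim whitespace and treat absent values as empty strings.
        row = {key: (value or "").strip() for key, value in row.items()}
        missing = [field for field in REQUIRED_FIELDS if not row.get(field)]
        if missing:
            incomplete.append((row, missing))
        else:
            complete.append(row)
    return complete, incomplete


def to_entry(product):
    """Render one product as a short plain-text entry."""
    description = product.get("description", "")
    return f"Product {product['id']}: {product['name']}. Price: {product['price']}. {description}"


complete, incomplete = load_products(SAMPLE_FEED)
for product in complete:
    print(to_entry(product))
for row, missing in incomplete:
    print(f"Needs fixing: product {row['id']} is missing {', '.join(missing)}")
```

Rows reported as incomplete are best corrected at the source, so that the next automatic update does not bring the gaps back.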
Organize your content
Eliminate duplicates and content that may confuse the bot's responses. Correct errors, fill in missing data, and ensure that the materials do not contradict one another and that the content is concise and to the point.
If the right answer depends on context (for example, the method for resetting a password differs depending on whether the user remembers their email address or not), add that context to the response.
If your materials include tables, make sure a human can understand their headers and descriptions. Otherwise it will be hard to implement the tables properly and to check that the bot interprets them correctly.
The simpler and more specific the language (leaving no room for interpretation by the bot), the better. The higher the quality of the input you provide to the chatbot, the better the results it will deliver.
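A minimal sketch of duplicate removal, assuming the materials have already been split into short passages. It only catches passages that are identical after lowercasing and stripping punctuation and extra whitespace; paraphrased duplicates and contradictions still need a human review. The function names are hypothetical.

```python
import re


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace so trivial variants compare equal."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(passages):
    """Keep the first occurrence of each passage, comparing normalized text."""
    seen = set()
    unique = []
    for passage in passages:
        key = normalize(passage)
        if key and key not in seen:
            seen.add(key)
            unique.append(passage)
    return unique


passages = [
    "To reset your password, click 'Forgot password'.",
    "to reset your password,  click 'forgot password'",
    "Contact support by email.",
]
for passage in deduplicate(passages):
    print(passage)
```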
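One way to make a table easier to verify is to spell out each row as a sentence that names every column explicitly. The sketch below does this for a made-up pricing table; the table contents and the describe_rows helper are assumptions for illustration only.

```python
def describe_rows(headers, rows, subject_column):
    """Turn each table row into one sentence that names every column explicitly."""
    sentences = []
    for row in rows:
        record = dict(zip(headers, row))
        subject = record[subject_column]
        details = "; ".join(
            f"{name} = {value}"
            for name, value in record.items()
            if name != subject_column
        )
        sentences.append(f"{subject_column} {subject}: {details}.")
    return sentences


# Hypothetical table used only to demonstrate the output format.
headers = ["Plan", "Monthly fee", "Free transfers per month"]
rows = [
    ["Basic", "0 EUR", "10"],
    ["Premium", "5 EUR", "unlimited"],
]
for sentence in describe_rows(headers, rows, "Plan"):
    print(sentence)
```

Reading the generated sentences aloud is a quick check: if a header is too cryptic for a person to follow, it will likely be unclear to the bot as well.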
Organize for efficiency
Group your materials using one or more of the following schemes:
- Thematic categorization (e.g., account management, credit applications)
- Categorization based on intent (e.g., problem-solving)
- Chronological order (organized according to time sensitivity)
- Contextual order (organized by the flow of the conversation)
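The categorization schemes above can be sketched as metadata attached to each piece of content. In this example every entry carries a theme, an intent, and a last-updated date; the field names, the sample entries, and the choice to list the newest entries first are all assumptions for illustration.

```python
from collections import defaultdict


def group_entries(entries, key="theme"):
    """Group entries by a metadata field and list the most recently updated first."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry[key]].append(entry)
    for items in grouped.values():
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        items.sort(key=lambda entry: entry["updated"], reverse=True)
    return dict(grouped)


# Hypothetical entries; the themes follow the examples given in the list above.
entries = [
    {"text": "Steps to reset a password", "theme": "account management",
     "intent": "problem-solving", "updated": "2024-01-10"},
    {"text": "New login flow overview", "theme": "account management",
     "intent": "information", "updated": "2024-05-02"},
    {"text": "Documents required for a loan", "theme": "credit applications",
     "intent": "information", "updated": "2024-03-15"},
]

for theme, items in group_entries(entries).items():
    print(theme)
    for item in items:
        print(f"  {item['updated']}  {item['text']}")
```

Passing key="intent" groups the same entries by intent instead, so one set of metadata can serve several of the schemes.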