Links in LLM responses
Prompt engineering your LLM to return clickable links to the sources of the answer
Motivation
Your organization’s chatbots should add links in their answers. There are three main reasons for that:
Reliability - As we know, LLMs (Large Language Models) don’t always tell the truth. The best way to validate an answer is to verify that the sources used to generate it actually support it. The sources are the evidence for the LLM’s claim.
Further research - If the user wants more background on the LLM’s answer, she might want to read the sources behind it.
Encourage skeptical reading - As people use chatbots more and more, we should train ourselves to cross-validate the knowledge that we consume. Providing references in the answers should encourage this as a standard practice. It will push both chatbot builders and users to improve the trustworthiness of the chatbot experience.
Such links will increase the trust and the usability of your chatbot.
Background
For the LLM to add links to its answers, you must provide them in your prompt, along with the content they point to. This requires some prompt engineering, because during inference LLMs have no native way of accessing external resources such as the internet.
Today, the most common way of providing your chatbot with context is Retrieval-Augmented Generation, a.k.a. RAG. In this method, the retrieved source documents are concatenated with the original user question before the prompt is sent to the LLM.
When using RAG, the prompt basically looks like this:
Use the following pieces of context to answer the question at the end. Each context has its source.
Context:
content: Sunlight reaches ...
source: https://spaceplace.nasa.gov/blue-sky/en/
Context:
content: Altitudes, landscapes and climates ...
source: https://spaceplace.nasa.gov/blue-sky/en/
...
Question: Why is the sky blue?
Helpful Answer:
The answer would be:
The sky appears blue because of a phenomenon called Rayleigh scattering. When sunlight passes through Earth's atmosphere, the shorter wavelengths of blue and violet light are scattered in all directions by air molecules, particularly nitrogen and oxygen. This scattering is what gives the sky its blue color.
This answer comes without any evidence, so you just have to trust the chatbot. That’s not good enough. Let’s see what we can do about that.
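As an illustration, the RAG prompt shown above could be assembled with a small helper like the one below. This is a minimal sketch, not the code behind this article: the `Context` class and the `build_prompt` function are hypothetical names, and the retrieval step that produces the contexts is assumed to exist elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical container for one retrieved passage."""
    content: str  # text of the retrieved passage
    source: str   # URL the passage came from


def build_prompt(contexts: list[Context], question: str) -> str:
    """Assemble a RAG prompt in which every context carries its source URL."""
    lines = [
        "Use the following pieces of context to answer the question at the end.",
        "Each context has its source.",
        "",
    ]
    for ctx in contexts:
        lines += ["Context:", f"content: {ctx.content}", f"source: {ctx.source}", ""]
    lines += [f"Question: {question}", "Helpful Answer:"]
    return "\n".join(lines)


prompt = build_prompt(
    [Context("Sunlight reaches ...", "https://spaceplace.nasa.gov/blue-sky/en/")],
    "Why is the sky blue?",
)
print(prompt)
```

Keeping each source next to its content is what later lets the LLM attach the right link to the right statement.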
Markdown
Before we proceed, we should mention the markup language Markdown. There are two things that we need to know about Markdown:
Markdown can generate links.
Markdown is so common online that LLMs natively use it in their responses.
Given those two facts, all we need to do is to instruct the LLM to state the sources of the answer, and it should be done.
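For readers who have not written Markdown links before, the syntax is the link text in square brackets followed by the URL in parentheses. The tiny helper below only illustrates that syntax; `markdown_link` is a hypothetical name, not part of any library.

```python
def markdown_link(text: str, url: str) -> str:
    """Return a Markdown inline link of the form [text](url)."""
    return f"[{text}]({url})"


print(markdown_link("NASA Space Place", "https://spaceplace.nasa.gov/blue-sky/en/"))
# prints: [NASA Space Place](https://spaceplace.nasa.gov/blue-sky/en/)
```

Any Markdown renderer in your chat UI will turn that string into a clickable link.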
Instructions
First attempt
So, given that the sources and links are added to the prompt, how do we add links to the response? Well, we ask the LLM to do that for us, which is just more prompt-engineering:
Use the following ... State the source for your answer. Context: ...
This will yield something like:
The sky appears blue because of a phenomenon called Rayleigh scattering, where the shorter wavelengths of light, such as blue and violet, are scattered more by the air and gas molecules in Earth's atmosphere compared to longer wavelengths like red light. This scattering of blue and violet light in all directions is what makes the sky blue.
Sources:
That’s nice. However:
It does not tell us which part of the answer is derived from which source. For any follow-up question about the answer, the user will have to check both sources. We would rather point the user to the relevant one.
We could have just added those links at the bottom of the text as a post-processing step.
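To make that last point concrete, here is a rough sketch of such a post-processing step, which needs no help from the LLM at all. The function name `append_sources` is an assumption for illustration, and the list of URLs is assumed to come from the same contexts that were placed in the prompt.

```python
def append_sources(answer: str, sources: list[str]) -> str:
    """Append a de-duplicated 'Sources:' list of Markdown links to an answer."""
    unique = list(dict.fromkeys(sources))  # drop duplicates, keep first-seen order
    links = "\n".join(f"- [{url}]({url})" for url in unique)
    return f"{answer}\n\nSources:\n{links}"


print(
    append_sources(
        "The sky appears blue because of Rayleigh scattering.",
        [
            "https://spaceplace.nasa.gov/blue-sky/en/",
            "https://www.rmg.co.uk/stories/topics/why-sky-blue",
        ],
    )
)
```

Since this gives the same result without any prompt engineering, the instructions are only worth the effort if they tell us something the post-processing cannot, such as which statement came from which source.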
Answer segments
Let’s try the following instructions:
Use the following ... For each statement in the answer, state its source. Context: ...
This will generate something like:
The sky appears blue because of Rayleigh scattering, which is when the shorter wavelengths of blue and violet light are more likely to be scattered by the air and gas molecules in Earth's atmosphere. (source: https://spaceplace.nasa.gov/blue-sky/en/)
Blue light is scattered more than the other colors because it travels as shorter, smaller waves, as a result of Rayleigh scattering. This phenomenon is what makes the sky blue. (source: https://www.rmg.co.uk/stories/topics/why-sky-blue)
Now we know which part of the answer is based on which source. However, the links are a bit chunky.
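One option, which this article does not pursue, would be to shrink those raw URLs in a post-processing step. The sketch below assumes the LLM keeps the exact "(source: URL)" shape shown above and that the URLs contain no closing parenthesis; `compact_sources` is a hypothetical name.

```python
import re

# Matches "(source: <url>)" as it appears in the answer above.
SOURCE_PATTERN = re.compile(r"\(source:\s*(\S+?)\)")


def compact_sources(answer: str) -> str:
    """Replace each '(source: URL)' with a short numbered Markdown link."""
    numbers: dict[str, int] = {}

    def replace(match: re.Match) -> str:
        url = match.group(1)
        # Reuse the same number when a URL is cited more than once.
        number = numbers.setdefault(url, len(numbers) + 1)
        return f"[[{number}]]({url})"

    return SOURCE_PATTERN.sub(replace, answer)


print(compact_sources("Blue light scatters more. (source: https://spaceplace.nasa.gov/blue-sky/en/)"))
# prints: Blue light scatters more. [[1]](https://spaceplace.nasa.gov/blue-sky/en/)
```

This depends on the LLM producing a stable format, which is not guaranteed, so asking the LLM for better links directly is the approach taken next.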
Inline links
Use the following ... The entire body of the answer itself must be links to the sources. Context: ... Helpful Answer in markdown format:
This will produce:
The sky is blue because of the way sunlight interacts with Earth's atmosphere through a phenomenon called Rayleigh scattering. Shorter wavelengths of light, like blue and violet, are scattered more by air molecules in the atmosphere, causing the sky to appear blue during the day. Blue light is scattered away while red light isn't scattered as much, which is why the sky appears red during sunrise and sunset.
It is starting to look good.
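Since the whole point is reliability, it may be worth checking that every link in the answer is one that was actually supplied in the prompt. The following is a minimal sketch under that assumption: `unknown_links` is a hypothetical helper, and the simple regular expression only covers plain inline links whose URLs contain no closing parenthesis.

```python
import re

# Matches Markdown inline links: [text](url)
LINK_PATTERN = re.compile(r"\[([^\]]+)\]\((\S+?)\)")


def unknown_links(answer: str, allowed_sources: set[str]) -> list[str]:
    """Return URLs linked in the answer that were not supplied in the prompt."""
    urls = [url for _text, url in LINK_PATTERN.findall(answer)]
    return [url for url in urls if url not in allowed_sources]


answer = (
    "The sky is blue because of "
    "[Rayleigh scattering](https://spaceplace.nasa.gov/blue-sky/en/) "
    "and [other effects](https://example.com/made-up)."
)
allowed = {"https://spaceplace.nasa.gov/blue-sky/en/"}
print(unknown_links(answer, allowed))
# prints: ['https://example.com/made-up']
```

An empty list means every link points back to a document from the prompt. It does not prove that the linked text supports the statement, which still requires a skeptical reader.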
More Markdown
Finally, let’s instruct the LLM to use more Markdown:
Use the following ... The entire body of the answer itself must be links to the sources. Context: ... Helpful Answer formatted as markdown with titles, lists and links where appropriate:
to get this nicely formatted answer that links to its sources:
Why is the sky blue?
NASA Space Place: According to NASA Space Place, sunlight reaches Earth's atmosphere and is scattered in all directions by all the gases and particles in the air. Blue light is scattered more than other colors because it travels as shorter, smaller waves, which is why we see a blue sky most of the time.
Live Science via NASA Space Place: Live Science, as reported in collaboration with NASA Space Place, explains that the interaction of sunlight with Earth's atmosphere causes blue light to scatter in every direction due to the shorter wavelengths of light being more likely to be scattered by air and gas molecules, particularly nitrogen and oxygen, through a phenomenon called Rayleigh scattering.
Royal Observatory Greenwich: The Royal Observatory Greenwich adds that as white light passes through Earth's atmosphere, tiny air molecules cause it to scatter. This scattering increases as the wavelength of light decreases, with blue light having shorter wavelengths than red light. Therefore, blue light is scattered more than red light, making the sky appear blue during the day.
Summary
We can use prompt engineering to encourage an LLM to provide evidence for its answers by following these steps:
Try any reasonable instructions with the LLM.
Evaluate the response - what is missing?
Rephrase your instructions to yield the missing requirements or remove the undesired results.
Repeat until you are satisfied.
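The steps above are a manual process, but part of the evaluation can be automated once you know what you are looking for. The sketch below is only an assumption about how that might look: `refine`, `has_markdown_link` and `fake_llm` are hypothetical names, and `fake_llm` is a stand-in that you would replace with a call to your real LLM.

```python
import re
from typing import Callable, Optional


def has_markdown_link(response: str) -> bool:
    """Example check: does the response contain at least one Markdown link?"""
    return re.search(r"\[[^\]]+\]\(\S+?\)", response) is not None


def refine(
    candidates: list[str],
    ask_llm: Callable[[str], str],
    is_acceptable: Callable[[str], bool],
) -> Optional[tuple[str, str]]:
    """Try each candidate instruction in order; return the first that passes."""
    for instruction in candidates:
        response = ask_llm(instruction)
        if is_acceptable(response):
            return instruction, response
    return None  # nothing passed: rephrase the instructions and try again


def fake_llm(instruction: str) -> str:
    """Stand-in for a real LLM call, for demonstration only."""
    if "links" in instruction:
        return "The sky is blue because of [Rayleigh scattering](https://spaceplace.nasa.gov/blue-sky/en/)."
    return "The sky is blue because of Rayleigh scattering."


result = refine(
    [
        "State the source for your answer.",
        "The entire body of the answer itself must be links to the sources.",
    ],
    fake_llm,
    has_markdown_link,
)
print(result)
```

An automated check like this only covers what you already know to look for. Reading the responses yourself remains the way to discover what is still missing.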
Responsibility
In the age of AI, we must be skeptical consumers and providers of knowledge. Otherwise, we will believe, spread, and eventually act on false information. This responsibility falls on all of us, AI developers and users alike.
References
The Jupyter Notebook behind this article for you to try.
Similar blog post about modifying LangChain’s prompt to list results.