Breaking the Language Model Barriers with LangChain 🦜️🔗

Jayita Bhattacharyya
6 min read · May 16, 2023

LangChain is a cross-platform framework for building applications around language models. With the explosive growth of large language models, LangChain provides access to the most common language models through their APIs. As of now, it is implemented in both Python and JavaScript/TypeScript.

Getting Started with Python

Here’s a walkthrough of LangChain’s core modules using Python.

Installation

pip install langchain
# or
conda install langchain -c conda-forge

Modules in LangChain

Models

LangChain provides support for a wide range of models. These mainly consist of:

LLMs — Large Language Models are one of LangChain’s vital model types, providing access to state-of-the-art LLMs from OpenAI, Hugging Face, and Cohere to AI21 Labs and many more. These can be used via API calls, with platform-specific API tokens granting access to the models.

(Image source: https://huggingface.co/blog/large-language-models)

In the example below, let’s try Hugging Face.

# installation
pip install huggingface_hub

# setting the API token
import os
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "YOUR_HF_TOKEN"

from langchain import HuggingFaceHub

llm = HuggingFaceHub(repo_id="google/flan-t5-xl",
                     model_kwargs={"temperature": 0,
                                   "max_length": 64})

llm("translate English to German: How old are you?")

Chat Models — These are more structured and take a series of chat messages as input. Chat models use language models underneath them. There are mainly four message types supported: AIMessage, HumanMessage, SystemMessage, and ChatMessage. These interfaces can take chat input as a single message or as a list of messages.

from langchain.chat_models import ChatOpenAI
from langchain import PromptTemplate, LLMChain
from langchain.prompts.chat import (
ChatPromptTemplate,
SystemMessagePromptTemplate,
AIMessagePromptTemplate,
HumanMessagePromptTemplate,
)
from langchain.schema import (
AIMessage,
HumanMessage,
SystemMessage
)
# assumes OPENAI_API_KEY is set in the environment
chat = ChatOpenAI(temperature=0)

chat([HumanMessage(content="Translate this sentence from English to French. I love programming.")])
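
Chat prompts can also be built from message templates. Here is a minimal sketch, reusing the chat model and the prompt template classes imported above, that combines a system message with a human message:

template = "You are a helpful assistant that translates {input_language} to {output_language}."
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_message_prompt = HumanMessagePromptTemplate.from_template("{text}")

chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

# format_prompt fills in the variables; to_messages converts the result to a list of chat messages
chat(chat_prompt.format_prompt(input_language="English",
                               output_language="French",
                               text="I love programming.").to_messages())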

For more information, check the official LangChain documentation on chat models.

Text Embedding Models — As the name suggests, text embedding models provide an interface for turning text into vector representations. These are helpful for semantic search via similarity comparison.

(Image source: https://www.ibm.com/blogs/research/2018/11/word-movers-embedding/)

# HuggingFaceEmbeddings uses sentence-transformers under the hood (pip install sentence-transformers)
from langchain.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings()

text = "This is a test document."
query_result = embeddings.embed_query(text)

doc_result = embeddings.embed_documents([text])

The base Embeddings class in LangChain exposes two methods: embed_documents and embed_query. embed_documents works over a list of documents, while embed_query works over a single query string.
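
To see how embeddings enable similarity comparison, here is a minimal sketch (assuming numpy is installed) that ranks two example documents against a query by cosine similarity of their vectors:

import numpy as np

def cosine_similarity(a, b):
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

docs = ["The cat sits on the mat.", "Stock markets fell sharply today."]
doc_vectors = embeddings.embed_documents(docs)
query_vector = embeddings.embed_query("Where is the cat?")

# the document closest in meaning to the query gets the highest score
for doc, vec in zip(docs, doc_vectors):
    print(doc, cosine_similarity(query_vector, vec))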

LangChain provides integrations with many embedding providers; see the official documentation for the full list.

Prompts

Prompts are the primary way of interacting with language models. A prompt is an input that specifies instructions for the model. This interface offers prompt optimization and prompt management, and LangChain makes prompt engineering easy to use. A PromptTemplate is responsible for constructing this input.

LLM Prompt Templates — for prompting language models.

Chat Prompt Templates — for prompting chat models.

from langchain.prompts import PromptTemplate, ChatPromptTemplate
string_prompt = PromptTemplate.from_template("tell me a joke about {subject}")
string_prompt_value = string_prompt.format_prompt(subject="soccer")
string_prompt_value.to_string()

'tell me a joke about soccer'

string_prompt_value.to_messages()

[HumanMessage(content='tell me a joke about soccer', additional_kwargs={})]
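
A PromptTemplate can also declare its input variables explicitly and accept more than one of them. A minimal sketch:

multi_prompt = PromptTemplate(
    input_variables=["adjective", "subject"],
    template="Tell me a {adjective} joke about {subject}.",
)
print(multi_prompt.format(adjective="funny", subject="soccer"))
# Tell me a funny joke about soccer.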

Indexes

Indexes help in structuring documents for LLMs to interact with them. These consist of utility functions for working with documents. Let's look into the four main components.

Document Loaders — Loading documents from various sources.

# Document Loader
from langchain.document_loaders import TextLoader
loader = TextLoader('./state_of_the_union.txt')
documents = loader.load()

Text Splitters — Utilities for splitting text. When working with long pieces of text, splitting it into chunks/tokens helps the model analyze its meaning in context.

# Text Splitter
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

VectorStores — This module covers vector stores and the integrations and implementations LangChain provides. Vector databases index the embeddings produced by NLP models so that phrases, sentences, and whole documents can be compared by meaning and context, yielding more accurate and relevant search results.

# Embeddings
from langchain.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings()
# FAISS requires the faiss-cpu package (pip install faiss-cpu)
from langchain.vectorstores import FAISS

db = FAISS.from_documents(docs, embeddings)

query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)

print(docs[0].page_content)

Retrievers — LangChain’s retriever interface and its implementations. The example below shows how to fetch documents from arxiv.org through an API call made by LangChain’s ArxivRetriever.

from langchain.retrievers import ArxivRetriever

retriever = ArxivRetriever(load_max_docs=2)
docs = retriever.get_relevant_documents(query='1605.08386')
docs[0].metadata # meta-information of the Document
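
Any vector store can also be wrapped as a retriever. A minimal sketch reusing the FAISS index built in the previous section:

retriever = db.as_retriever(search_kwargs={"k": 2})
relevant_docs = retriever.get_relevant_documents(
    "What did the president say about Ketanji Brown Jackson")
print(relevant_docs[0].page_content)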

Memory

The memory module is a great asset when implementing chatbots, where a series of conversational turns takes place and it is important to stay on the same subject. Other modules such as Chains and Agents are stateless by default, which means they treat each new query to the model as independent. LangChain provides an interface for memory, a collection of memory implementations, and examples of chains/agents that use memory.

(Image source: https://www.hopkinsmedicine.org/health/wellness-and-prevention/inside-the-science-of-memory)

from langchain import OpenAI, ConversationChain

llm = OpenAI(temperature=0)
conversation = ConversationChain(llm=llm, verbose=True)

conversation.predict(input="Hi there!")
conversation.predict(input="Let us talk about AI?")
conversation.predict(input="I'm interested in Foundational Models.")

LangChain provides several memory implementations for this purpose; check them out in the official docs.

Chains

Chains allow the combination of multiple components into a single, coherent application: for example, a sequence that accepts user input, applies a PromptTemplate for formatting, and then passes the formatted output to an LLM. We can build more complex sequences by merging multiple chains or by integrating them with other components. The most widely used chain is LLMChain.

from langchain import OpenAI, PromptTemplate, LLMChain

# the chain needs a prompt and an LLM; example definitions shown here
template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])
llm = OpenAI(temperature=0)

llm_chain = LLMChain(prompt=prompt, llm=llm)

question = "Can Barack Obama have a conversation with George Washington?"

print(llm_chain.run(question))
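
To merge multiple chains into a sequence, as mentioned above, here is a minimal sketch using SimpleSequentialChain, reusing the llm, PromptTemplate, and LLMChain from the example above; the prompt texts are made up for illustration. The output of the first chain becomes the input of the second.

from langchain.chains import SimpleSequentialChain

title_prompt = PromptTemplate(
    input_variables=["topic"],
    template="Write a catchy blog post title about {topic}.",
)
outline_prompt = PromptTemplate(
    input_variables=["title"],
    template="Write a three-point outline for a blog post titled: {title}",
)

title_chain = LLMChain(llm=llm, prompt=title_prompt)
outline_chain = LLMChain(llm=llm, prompt=outline_prompt)

# the output of the first chain is passed as the input of the second
overall_chain = SimpleSequentialChain(chains=[title_chain, outline_chain], verbose=True)
print(overall_chain.run("large language models"))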

Agents

Agents consist of an LLM that makes decisions regarding actions to be taken, performs those actions, observes the outcomes, and repeats the process until the task is completed. When employed appropriately, agents can exhibit immense capabilities. To utilize agents effectively, it is crucial to grasp the following notions:

  • Tool: A specialized function responsible for carrying out specific tasks, such as Google Search, Database lookup, Python REPL, or similar functions. Refer to the list of available Tools.
  • LLM: The language model that empowers the agent.
  • Agent: The agent type to use. For further information, refer to Agent Types.

from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
# the wikipedia tool requires the wikipedia package (pip install wikipedia)
tools = load_tools(["wikipedia", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)
agent.run("In what year was the film The Departed with Leonardo DiCaprio released? What is this year raised to the 0.43 power?")
