Building a Quiz generator using Langchain Js, Qdrant DB, Open AI


Abir Dutta


Published on Monday, Jan 1, 2024

Introduction

I'm currently working as a full-stack intern at kira.ai, a startup. My latest task involves creating a quiz generator that extracts content from a PDF uploaded by the user. Despite most online resources focusing on AI/ML in Python, our project is entirely in JavaScript. This blog post shares my journey using Langchain JS to implement this feature. If you're new to AI and want to implement features in JavaScript rather than Python, you're in the right place.

Our code flow will follow these steps:

  1. Load the user-entered PDF.

  2. Split the PDF into chunks, convert the chunks into vector embeddings, and store them in a vector database such as Qdrant.

  3. Retrieve the relevant document parts and generate multiple-choice questions (MCQs).

No prior AI knowledge is needed to follow this blog; I'm confident you'll be able to understand each line.

  1. Usage

To create our Quiz Generator web app, we'll use:

  • Node.js for the backend

  • Langchain for creating language model-driven quizzes

  • OpenAI API key for using GPT-3.5 Turbo

  • Qdrant DB for storing PDF content

  1. Qdrant DB

    Qdrant is a vector database and similarity search engine that manages high-dimensional vectors representing complex objects like words, images, videos, and audio. It is well suited to machine learning, natural language processing, and other AI tasks. Unlike traditional relational databases, vector databases are built to store and search high-dimensional data efficiently, which makes them a natural complement to generative AI models.

    Please log in to Qdrant Cloud, where you can create a free-tier database and make a note of the database URL.

    For detailed instructions, refer to the Qdrant Docs.
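To make "similarity search" a bit more concrete, here is a minimal sketch of the idea in plain JavaScript. It is not how Qdrant is implemented internally; it only shows the principle of ranking stored vectors by how close they are to a query vector. Cosine similarity is one common measure that Qdrant supports. The function names and the tiny 3-dimensional vectors are made up for illustration (real embeddings have hundreds of dimensions or more).

```javascript
// Cosine similarity between two equal-length numeric vectors.
// Returns a value in [-1, 1]; higher means the vectors point in a more similar direction.
function cosineSimilarity(a, b) {
    if (a.length !== b.length) {
        throw new Error("Vectors must have the same length");
    }
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    // A zero vector has no direction, so treat it as "not similar"
    if (normA === 0 || normB === 0) return 0;
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored items by similarity to the query vector and keep the best k.
function topK(queryVector, items, k) {
    return items
        .map((item) => ({ text: item.text, score: cosineSimilarity(queryVector, item.vector) }))
        .sort((x, y) => y.score - x.score)
        .slice(0, k);
}

// Toy data: hand-written vectors standing in for real embeddings
const stored = [
    { text: "Plants make food through photosynthesis.", vector: [0.9, 0.1, 0.0] },
    { text: "Trees are tall woody plants.", vector: [0.8, 0.2, 0.1] },
    { text: "Cars run on petrol or electricity.", vector: [0.0, 0.1, 0.9] },
];
const queryVector = [1.0, 0.0, 0.0];

// Logs the two plant-related sentences, most similar first
console.log(topK(queryVector, stored, 2).map((hit) => hit.text));
```

A vector database does the same kind of ranking, but over millions of vectors and with indexes that avoid comparing the query against every stored item.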

  2. Implementation

We'll follow these steps:

Imports

npm install langchain

npm install pdf-parse

npm install -S @qdrant/js-client-rest

Import the following libraries at the top of your file.

import { PDFLoader } from "langchain/document_loaders/fs/pdf";
import { QdrantClient } from "@qdrant/js-client-rest";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { OpenAI } from "langchain/llms/openai";
import { loadQAStuffChain } from "langchain/chains";
import { QdrantVectorStore } from "langchain/vectorstores/qdrant";
import { ChatOpenAI } from "langchain/chat_models/openai";

Loading PDF

PDFLoader takes the path to a PDF file and, by default, returns one document per page.


async function loadDocs(filePath) {
    // PDFLoader reads the file at filePath (it relies on the pdf-parse package installed above)
    const loader = new PDFLoader(filePath);
    const docs = await loader.load();
    return docs;
}

Transforming Documents

Now split the document into chunks using RecursiveCharacterTextSplitter from LangChain, which by default tries to split on the separators ["\n\n", "\n", " ", ""], in that order.

async function splitDocs(docs, chunkSize, chunkOverlap) {
    const textSplitter = new RecursiveCharacterTextSplitter({

        chunkSize,
        chunkOverlap
    });

    const chunks = await textSplitter.splitDocuments(docs);

    return chunks;
}
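To get a feel for what chunkSize and chunkOverlap mean, here is a deliberately simplified sketch. It is not the algorithm RecursiveCharacterTextSplitter uses (that one prefers to break on the separators listed above); it just cuts fixed-size character windows so that each chunk repeats the last few characters of the previous one. The helper name chunkWithOverlap is hypothetical.

```javascript
// Simplified illustration only: fixed-size character windows with overlap.
// Each new chunk starts (chunkSize - chunkOverlap) characters after the previous one.
function chunkWithOverlap(text, chunkSize, chunkOverlap) {
    if (chunkOverlap >= chunkSize) {
        throw new Error("chunkOverlap must be smaller than chunkSize");
    }
    const chunks = [];
    const step = chunkSize - chunkOverlap;
    for (let start = 0; start < text.length; start += step) {
        chunks.push(text.slice(start, start + chunkSize));
        // Stop once a window has reached the end of the text
        if (start + chunkSize >= text.length) break;
    }
    return chunks;
}

console.log(chunkWithOverlap("abcdefghij", 4, 1));
// [ 'abcd', 'defg', 'ghij' ]
```

Notice how "d" and "g" appear in two chunks each. That overlap helps a sentence that straddles a chunk boundary stay understandable in at least one chunk.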

Generate Text Embedding

Text Embedding
Text embeddings convert words, sentences, or entire documents into numerical vectors, enabling computers to process and analyze textual information more effectively. These embeddings are often used in natural language processing (NLP) tasks, such as text classification, sentiment analysis, and information retrieval, as they help machines comprehend the underlying meaning and context of the text.

Use OpenAIEmbeddings to create embeddings for the chunks and store them in Qdrant.

async function uploadDocs(chunks, name) {
    const embeddings = new OpenAIEmbeddings({
        openAIApiKey: process.env.OPENAI_API_KEY

    });

    // Store the chunks in Qdrant
    const index = await QdrantVectorStore.fromDocuments(
        chunks,
        embeddings,
        {
            url: process.env.QDRANT_URL,
            collectionName: name,
        }
    );
    return index;
}
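uploadDocs reads OPENAI_API_KEY and QDRANT_URL from the environment, and a missing value tends to surface later as a confusing network or authentication error. A small check at startup can make that easier to spot. This is only a sketch; missingEnvVars is a hypothetical helper name, not part of LangChain or Qdrant.

```javascript
// Returns the names of required environment variables that are missing or blank.
// The env parameter defaults to process.env so the function is easy to test.
function missingEnvVars(names, env = process.env) {
    return names.filter((name) => !env[name] || env[name].trim() === "");
}

const missing = missingEnvVars(["OPENAI_API_KEY", "QDRANT_URL"]);
if (missing.length > 0) {
    // Warn instead of throwing so you can decide how strict to be
    console.warn(`Missing environment variables: ${missing.join(", ")}`);
}
```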

Retrieving the Quiz from the document

Next, we need a function that retrieves the most relevant chunks of text from the vector store and passes them to an LLM (OpenAI, in our case) to generate the quiz.

async function getSimilarDocs(query, k, level, number, name) {

    const llmA = new ChatOpenAI({ modelName: "gpt-3.5-turbo", openAIApiKey: process.env.OPENAI_API_KEY });
    const chain = loadQAStuffChain(llmA);
    const embeddings = new OpenAIEmbeddings({
        openAIApiKey: process.env.OPENAI_API_KEY // In Node.js defaults to process.env.OPENAI_API_KEY

    });
    const index = await QdrantVectorStore.fromExistingCollection(
        embeddings,
        {
            url: process.env.QDRANT_URL,
            collectionName: name,
        }
    );
    // Fetch the k chunks most similar to the topic
    const similarDocs = await index.similaritySearch(query, k);

    // Any level other than "Hard" or "Medium" falls back to easy
    const difficulty = level === "Hard" ? "hard" : level === "Medium" ? "medium" : "easy";
    const template = `Given the topic: ${query}, create ${number} multiple-choice questions (MCQs) with four options each and provide the correct option for each question. The level of questions should be ${difficulty}. Format the output as a JSON array of objects with the following keys:
    question
    option A
    option B
    option C
    option D
    answer (the correct option)`;
    console.log(template);
    const resA = await chain.call({
        input_documents: similarDocs,
        question: template,
    });
    return resA;
}

The function getSimilarDocs takes five parameters: query (the quiz topic), k (the number of chunks to retrieve), level (the difficulty), number (how many questions to generate), and name (the Qdrant collection to search).

Here's a breakdown of what it does:

  1. It creates an instance of the ChatOpenAI class, which is responsible for talking to the OpenAI chat model. The model used is "gpt-3.5-turbo", and it requires an API key to access the OpenAI service.

  2. It passes that model (llmA) to the loadQAStuffChain function from the langchain/chains module. This builds a question-answering chain that "stuffs" the supplied documents into a single prompt as context and asks the model to answer based on them.

  3. It creates an instance of the OpenAIEmbeddings class, which is responsible for generating vector embeddings from text. It uses the same API key as the ChatOpenAI instance and is needed here to embed the query.

  4. It uses QdrantVectorStore.fromExistingCollection from the langchain/vectorstores/qdrant module to connect to the collection that uploadDocs created earlier. Nothing new is uploaded at this point. The URL for the Qdrant vector store comes from the process.env.QDRANT_URL environment variable.

  5. It runs a similarity search against that collection to fetch the chunks closest to the query.

  6. It builds a prompt for the requested difficulty level and calls the chain, passing the retrieved chunks as input_documents and the prompt as the question.

Overall, this function retrieves the parts of the PDF that are relevant to the topic from Qdrant and asks the OpenAI model to turn them into multiple-choice questions.

Summing Up

Now, let's put everything together:

// Top-level await needs an ES module (a .mjs file or "type": "module" in package.json).
// The PDF path is hardcoded here. Adapt the upload logic to your needs; see the LangChain PDF loader docs for more.
const loadPdf = await loadDocs("./chat.pdf");

const chunks = await splitDocs(loadPdf, 1000, 20);

// A sample name; a new collection with this name will be created in Qdrant Cloud.
const upload = await uploadDocs(chunks, "chat_PDF");

const topicName = "Plant";
const k = 5; // number of similar chunks to retrieve
const level = "Easy";
const number = 10;
const name = "chat_PDF";

/**
 * Generates a quiz from the uploaded PDF.
 *
 * @param {string} topicName - The topic the quiz should cover.
 * @param {number} k - The number of similar chunks to retrieve from Qdrant.
 * @param {string} level - The difficulty level: "Easy", "Medium", or "Hard".
 * @param {number} number - The number of questions to generate.
 * @param {string} name - The name of the Qdrant collection to search.
 * @returns {Promise} - A promise that resolves to the generated quiz.
 */
const getQuiz = await getSimilarDocs(topicName, k, level, number, name);
console.log(getQuiz);

To handle custom file upload logic, see the LangChain PDF loader docs.

You can collect the topic name, the level, and the number of questions from the user on the front end and pass them to the backend via body parameters or URL parameters.
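Since these values come from the user, it is worth validating them before they reach the prompt. Here is a rough sketch of what that could look like. The field names (topic, level, number), the helper name parseQuizRequest, and the upper limit of 20 questions are my own assumptions for illustration, so adjust them to whatever your front end actually sends.

```javascript
const LEVELS = ["Easy", "Medium", "Hard"];
const MAX_QUESTIONS = 20; // arbitrary cap chosen for this sketch

// Hypothetical helper: validates the values received from the front end
// (for example a parsed request body) and returns clean quiz parameters.
function parseQuizRequest(body = {}) {
    const topic = String(body.topic ?? "").trim();
    if (!topic) {
        throw new Error("topic is required");
    }

    // Unknown levels fall back to "Easy"
    const level = LEVELS.includes(body.level) ? body.level : "Easy";

    // Accept both numbers and numeric strings such as "10"
    const number = Number.parseInt(body.number, 10);
    if (!Number.isInteger(number) || number < 1 || number > MAX_QUESTIONS) {
        throw new Error(`number must be an integer between 1 and ${MAX_QUESTIONS}`);
    }

    return { topic, level, number };
}

console.log(parseQuizRequest({ topic: " Plant ", level: "Hard", number: "10" }));
// { topic: 'Plant', level: 'Hard', number: 10 }
```

The returned values can then be passed on to getSimilarDocs along with the collection name.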

The output should come back as structured JSON, and with that your AI quiz generator in JS is ready to use.
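In practice the model returns its answer as a string, and it sometimes wraps the JSON in a Markdown code fence, so it is safer to parse and check the reply before sending it to the front end. Below is a minimal sketch of that. The helper name parseQuizOutput is hypothetical, and I am assuming the reply is available as getQuiz.text; log the result once to confirm where the text lives in your setup.

```javascript
// Hypothetical helper: turns the model's text reply into an array of question objects.
function parseQuizOutput(text) {
    const fence = "`".repeat(3); // a Markdown code fence marker
    let cleaned = text.trim();

    // If the reply is wrapped in a code fence, drop the opening fence line and the closing fence
    if (cleaned.startsWith(fence)) {
        cleaned = cleaned.slice(cleaned.indexOf("\n") + 1);
        if (cleaned.endsWith(fence)) {
            cleaned = cleaned.slice(0, -fence.length);
        }
    }

    let parsed;
    try {
        parsed = JSON.parse(cleaned);
    } catch (err) {
        throw new Error(`Model did not return valid JSON: ${err.message}`);
    }

    if (!Array.isArray(parsed)) {
        throw new Error("Expected a JSON array of questions");
    }
    // Every entry should at least carry a question string
    for (const item of parsed) {
        if (typeof item.question !== "string") {
            throw new Error("Each entry needs a 'question' string");
        }
    }
    return parsed;
}

// Usage (assumption: the chain result exposes the reply under `text`):
// const questions = parseQuizOutput(getQuiz.text);
```

If parsing fails, a simple option is to retry the request, since the model's formatting can vary from call to call.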

To follow or collaborate on more amazing content, follow me on social media.

LinkedIn

Twitter

GitHub