How I Built a RAG Chatbot for My Portfolio
Jun 9, 2026
Most portfolio sites have a contact form. I wanted something more interesting — a chatbot that can actually answer questions about me. Not a fake one that returns canned responses, but a real Retrieval-Augmented Generation (RAG) pipeline backed by my CV, streaming answers token-by-token, living right inside the portfolio UI.
This post walks through exactly how it works, from the database to the floating chat button.
The Architecture at a Glance

Step 1 — The Vector Database
I used PostgreSQL with the pgvector extension to store document embeddings. A single migration sets everything up:
-- Enable pgvector
create extension if not exists vector;
-- Store chunked CV text alongside its embedding
create table if not exists public.documents (
  id bigserial primary key,
  content text,
  metadata jsonb,
  embedding vector(768)
);
-- IVFFlat index for fast cosine similarity search
create index if not exists documents_embedding_idx
  on public.documents
  using ivfflat (embedding vector_cosine_ops)
  with (lists = 100);
The match_documents function does the actual semantic search. It takes the question embedding and returns the top-N most similar chunks ranked by cosine similarity:
create function public.match_documents(
  query_embedding vector,
  match_count integer default 10,
  filter jsonb default '{}'::jsonb
)
returns table (id bigint, content text, metadata jsonb, similarity double precision)
language plpgsql
as $$
begin
  return query
  select
    d.id,
    d.content,
    d.metadata,
    1 - (d.embedding <=> query_embedding) as similarity
  from public.documents as d
  where d.embedding is not null
    and d.metadata @> filter
  order by d.embedding <=> query_embedding
  limit match_count;
end;
$$;
The <=> operator is pgvector's cosine distance operator. Similarity is 1 - distance, and ordering by ascending distance puts the closest chunks first.
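To make the distance-versus-similarity relationship concrete, here is a small sketch that computes both in plain TypeScript and ranks a few toy vectors. The two-dimensional vectors and labels are made up for illustration; real embeddings in this setup have 768 dimensions.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Cosine distance, the quantity the <=> operator returns.
function cosineDistance(a: number[], b: number[]): number {
  return 1 - cosineSimilarity(a, b);
}

// Toy data (hypothetical): one query and three "chunks"
const query = [1, 0];
const chunks = [
  { label: 'opposite', embedding: [-1, 0] },
  { label: 'same direction', embedding: [2, 0] },
  { label: 'orthogonal', embedding: [0, 3] },
];

// Ascending distance order, like "order by embedding <=> query_embedding"
const ranked = chunks
  .map((c) => ({ label: c.label, distance: cosineDistance(query, c.embedding) }))
  .sort((x, y) => x.distance - y.distance);

console.log(ranked.map((r) => `${r.label}: ${r.distance}`));
// → [ 'same direction: 0', 'orthogonal: 1', 'opposite: 2' ]
```

Cosine distance ignores vector length, so [2, 0] is a perfect match for [1, 0]. The distance ranges from 0 (same direction) to 2 (opposite direction).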
Step 2 — Prisma with the PrismaPg Adapter
The Prisma schema maps the documents table and uses vector(768) as an unsupported (raw) type — pgvector isn't a native Prisma type yet, so raw SQL queries handle the vector operations:
model Document {
  id BigInt @id @default(autoincrement())
  content String?
  metadata Json?
  embedding Unsupported("vector(768)")?
  @@map("documents")
}
Because this is a Next.js app with edge-ish server functions, I use the PrismaPg driver adapter over a pg connection pool to keep connections efficient and avoid the default binary protocol issues with pgvector:
import { PrismaClient } from '@/generated/prisma/client';
import { PrismaPg } from '@prisma/adapter-pg';
import { Pool } from 'pg';
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
const adapter = new PrismaPg(pool);
const globalForPrisma = globalThis as unknown as { prisma?: PrismaClient };
export const prisma =
  globalForPrisma.prisma ?? new PrismaClient({ adapter });
if (process.env.NODE_ENV !== 'production') {
  globalForPrisma.prisma = prisma;
}
The globalThis singleton pattern prevents creating a new Prisma client on every hot-reload in development.
Step 3 — Parsing and Chunking the CV
Rather than maintaining a separate data ingestion pipeline, I parse my CV PDF directly at runtime. The CV lives at public/info.pdf.
I wrote a lightweight PDF text extractor that reads the raw PDF byte stream and pulls out string objects using the Tj text-show operator — no external PDF library needed:
function extractTextFromInfoPdf(buffer: Buffer) {
  const pdf = buffer.toString('latin1');
  const textParts: string[] = [];
  let index = 0;
  while (index < pdf.length) {
    const stringStart = pdf.indexOf('(', index);
    if (stringStart === -1) break;
    let cursor = stringStart + 1;
    let escaped = false;
    let depth = 1;
    let value = '';
    // Walk to the matching ')' while tracking nesting and backslash escapes
    while (cursor < pdf.length && depth > 0) {
      const char = pdf[cursor];
      if (escaped) {
        // Keep the escape sequence intact so unescapePdfString can decode it
        value += `\\${char}`;
        escaped = false;
      } else if (char === '\\') {
        escaped = true;
      } else if (char === '(') {
        depth += 1;
        value += char;
      } else if (char === ')') {
        depth -= 1;
        if (depth > 0) value += char;
      } else {
        value += char;
      }
      cursor += 1;
    }
    // Keep the string only if a Tj operator follows shortly after it
    const nextOperator = pdf.slice(cursor, cursor + 20);
    if (/\sTj\b/.test(nextOperator)) {
      textParts.push(unescapePdfString(value));
    }
    index = cursor;
  }
  return textParts.join('\n').replace(/\s+\n/g, '\n').trim();
}
The extracted text is then split into overlapping chunks using LangChain's RecursiveCharacterTextSplitter:
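import { readFile } from 'node:fs/promises';
import { join } from 'node:path';
// On older LangChain versions this is exported from 'langchain/text_splitter'
import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';
async function loadCvChunks() {
  const filePath = join(process.cwd(), 'public', 'info.pdf');
  const text = extractTextFromInfoPdf(await readFile(filePath));
  const textSplitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 100,
  });
  return textSplitter.createDocuments([text], [{ source: 'public/info.pdf' }]);
}
Step 4 — Embedding and Seeding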
On the first request to the chat API, the system checks whether the documents table is empty. If it is, it runs the seed:
async function ensureDocumentsLoaded() {
  const count = await prisma.document.count();
  if (!count) {
    await seedDocuments();
  }
}
Seeding embeds each chunk with Google Gemini (gemini-embedding-001) at 768 dimensions, then inserts everything in a single Prisma transaction:
import { GoogleGenAI } from '@google/genai';
async function seedDocuments() {
  const docs = await loadCvChunks();
  const ai = new GoogleGenAI({ apiKey: process.env.GOOGLE_API_KEY });
  // Embed all chunks concurrently
  const embeddings = await Promise.all(
    docs.map(async (doc) => {
      const response = await ai.models.embedContent({
        model: 'gemini-embedding-001',
        contents: doc.pageContent,
        config: {
          taskType: 'RETRIEVAL_DOCUMENT',
          outputDimensionality: 768,
        },
      });
      return {
        content: doc.pageContent,
        metadata: doc.metadata ?? {},
        embedding: response.embeddings?.[0]?.values ?? [],
      };
    }),
  );
  // Insert every row in one transaction so a failed seed leaves the table empty
  await prisma.$transaction(
    embeddings.map((entry) =>
      prisma.$executeRaw`
        insert into documents (content, metadata, embedding)
        values (
          ${entry.content},
          ${JSON.stringify(entry.metadata)}::jsonb,
          ${toVectorLiteral(entry.embedding)}::vector
        )
      `,
    ),
  );
}
toVectorLiteral converts a number[] into Postgres vector literal syntax, e.g. [0.12,0.84,...].
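The helper itself is not shown above, so the following is only a minimal sketch of what it could look like. The validation (rejecting empty arrays and non-finite numbers) is an assumption added for illustration, not something the original code is known to do.

```typescript
// Hypothetical sketch: serialize a number[] into pgvector's text format, e.g. "[0.12,0.84]".
function toVectorLiteral(values: number[]): string {
  // An empty array would produce "[]", which pgvector rejects, so fail early (assumed behavior)
  if (values.length === 0) {
    throw new Error('Cannot build a vector literal from an empty embedding');
  }
  // NaN and Infinity cannot be stored in a vector column
  for (const value of values) {
    if (!Number.isFinite(value)) {
      throw new Error(`Invalid vector component: ${value}`);
    }
  }
  return `[${values.join(',')}]`;
}

console.log(toVectorLiteral([0.12, 0.84])); // → [0.12,0.84]
```

Failing early on an empty array is worth considering here because the seed code falls back to [] when the embedding response has no values, and that fallback would otherwise only surface as a database error.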
Step 5 — The RAG Query Pipeline
When a user sends a question, the API:
- Embeds the question using the same Gemini model (with the RETRIEVAL_QUERY task type)
- Calls match_documents via a Prisma raw query
- Assembles the top chunks into a numbered context block
async function buildContext(question: string) {
  await ensureDocumentsLoaded();
  const ai = createAiClient();
  const response = await ai.models.embedContent({
    model: 'gemini-embedding-001',
    contents: question,
    config: {
      taskType: 'RETRIEVAL_QUERY',
      outputDimensionality: 768,
    },
  });
  const questionVector = response.embeddings?.[0]?.values ?? [];
  const chunks = await prisma.$queryRaw<MatchDocumentRow[]>`
    select * from match_documents(
      ${toVectorLiteral(questionVector)}::vector,
      ${10},
      ${JSON.stringify({})}::jsonb
    )
  `;
  return chunks
    .map((chunk, i) => (chunk.content?.trim() ? `[${i + 1}] ${chunk.content.trim()}` : ''))
    .filter(Boolean)
    .join('\n\n');
}
Step 6 — Streaming the LLM Response
The context is injected into a carefully scoped system prompt and fed to LLaMA 3.1 70B Instruct through the NVIDIA API using LangChain's ChatOpenAI (it's OpenAI-compatible):
import { ChatOpenAI } from '@langchain/openai';
function createLlmClient() {
  return new ChatOpenAI({
    apiKey: process.env.NVIDIA_API_KEY,
    model: 'meta/llama-3.1-70b-instruct',
    temperature: 0.7,
    maxTokens: 1024,
    streaming: true,
    configuration: {
      baseURL: 'https://integrate.api.nvidia.com/v1',
    },
  });
}
The response is streamed back as a ReadableStream<Uint8Array> using the Web Streams API, with each text chunk encoded to UTF-8 bytes as it arrives:
import { NextResponse } from 'next/server';
async function createAnswerStream(question: string) {
  const context = await buildContext(question);
  const llm = createLlmClient();
  const stream = await llm.stream([
    ['system', `You are "Ask Prajwol"...\n\n<context>\n${context}\n</context>`],
    ['human', question],
  ]);
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          const text = extractChunkText(chunk.content);
          if (text) controller.enqueue(encoder.encode(text));
        }
        controller.close();
      } catch (error) {
        // Propagate upstream LLM failures so the client's read rejects instead of hanging
        controller.error(error);
      }
    },
  });
}
export async function POST(request: Request) {
  const { question } = await request.json();
  const answerStream = await createAnswerStream(question);
  return new NextResponse(answerStream, {
    headers: {
      'content-type': 'text/plain; charset=utf-8',
      'cache-control': 'no-cache, no-transform',
    },
  });
}
Step 7 — The Chat UI
The floating chat button is a self-contained React client component. It opens a panel, manages message state, and reads the streamed response incrementally using the Fetch ReadableStream reader:
// response.body is null when the response carries no body
if (!response.body) throw new Error('Response has no body');
const reader = response.body.getReader();
const decoder = new TextDecoder();
let answer = '';
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  answer += decoder.decode(value, { stream: true });
  // update the assistant message bubble in real time
  setMessages((current) =>
    current.map((msg) =>
      msg.id === assistantMessageId ? { ...msg, content: answer } : msg,
    ),
  );
}
While waiting for the first tokens to arrive, rotating loading messages give the user feedback:
useEffect(() => {
  if (!isOpen || !isSending) return;
  const interval = window.setInterval(() => {
    setLoadingMessageIndex((i) => (i + 1) % loadingMessages.length);
  }, 3000);
  return () => window.clearInterval(interval);
}, [isOpen, isSending]);
The button itself has a BorderBeam animation to draw attention without being intrusive.
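The { stream: true } option in the read loop earlier in this step matters because a network chunk can end in the middle of a multi-byte UTF-8 character. The sketch below shows the difference; the sample string and the split point are made up for illustration.

```typescript
const bytes = new TextEncoder().encode('Price: €5');

// Artificial split point: right after the first of the three bytes that encode "€"
const splitAt = bytes.indexOf(0xe2) + 1;
const first = bytes.slice(0, splitAt);
const second = bytes.slice(splitAt);

// Streaming decode: the decoder buffers the incomplete sequence between calls
const streamingDecoder = new TextDecoder();
let streamed = streamingDecoder.decode(first, { stream: true });
streamed += streamingDecoder.decode(second, { stream: true });
streamed += streamingDecoder.decode(); // flush anything still buffered

// Naive decode: each chunk is decoded on its own, so the split character is lost
const naive = new TextDecoder().decode(first) + new TextDecoder().decode(second);

console.log(streamed === 'Price: €5'); // true
console.log(naive.includes('\uFFFD')); // true: replacement characters appear
```

A final decode() call with no arguments after the loop flushes any bytes the decoder is still holding. For a well-formed UTF-8 response it returns an empty string.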
Environment Variables
DATABASE_URL=postgresql://...
GOOGLE_API_KEY=... # Gemini embeddings
NVIDIA_API_KEY=... # LLaMA 3.1 70B via NVIDIA
What I'd Do Differently
- Hybrid search — combine keyword (BM25) with vector search for better recall on exact name/technology queries.
- Incremental seeding — detect CV changes by hash and re-embed only modified chunks, instead of a one-shot seed on empty table.
- Rate limiting — add per-IP rate limiting on the API route to prevent abuse.
- Conversation history — pass prior turns to the LLM for multi-turn context; currently each question is stateless.
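The incremental seeding idea from the list above could be sketched roughly as follows. It assumes a SHA-256 hash of each chunk's text is stored with its row (for example in the metadata column). The StoredChunk shape and the planReseed helper are hypothetical names used only for this illustration.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical shape: what would be read back from the documents table
type StoredChunk = { id: number; contentHash: string };

// Stable fingerprint of a chunk's text
function hashChunk(content: string): string {
  return createHash('sha256').update(content, 'utf8').digest('hex');
}

// Compare freshly split chunks with stored hashes and decide what to embed or delete
function planReseed(currentChunks: string[], stored: StoredChunk[]) {
  const storedHashes = new Set(stored.map((s) => s.contentHash));
  const currentHashes = new Set(currentChunks.map(hashChunk));
  return {
    // New or modified chunks: no stored row has this hash
    toEmbed: currentChunks.filter((c) => !storedHashes.has(hashChunk(c))),
    // Stale rows: their text no longer appears in the CV
    toDelete: stored.filter((s) => !currentHashes.has(s.contentHash)).map((s) => s.id),
  };
}

// Example with made-up chunk text
const stored: StoredChunk[] = [
  { id: 1, contentHash: hashChunk('Worked at Acme') },
  { id: 2, contentHash: hashChunk('Knows TypeScript') },
];
const plan = planReseed(['Worked at Acme', 'Knows TypeScript and Rust'], stored);
console.log(plan); // → { toEmbed: [ 'Knows TypeScript and Rust' ], toDelete: [ 2 ] }
```

One caveat: with a character-based splitter, an edit near the top of the CV can shift the boundaries of every later chunk, so more chunks may be re-embedded than the size of the edit suggests.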