Overview & Why Groq
Most AI integrations either cost money per request, require a backend server to hide the API key, or are so slow they're annoying in a chat interface. Groq solves all three problems for UBG sites: it has a free tier, its API accepts browser requests directly (CORS is enabled), and it returns tokens faster than a human can read them.
Groq is not an AI company — it's an inference company. They run open-source models (LLaMA, Qwen, etc.) on custom hardware called LPUs (Language Processing Units), achieving speeds 10–100× faster than GPU-based providers. The API is OpenAI-compatible, meaning any code written for OpenAI's chat completions endpoint works on Groq with just a URL and key swap.
Blazing fast inference
Groq's LPU hardware delivers hundreds of tokens per second — fast enough that streaming responses appear to "typewrite" in real time rather than arriving in chunks.
~500 tok/s

Direct browser calls
Unlike many AI APIs, Groq's endpoint has CORS headers set correctly. You can call it from fetch() in a static HTML file — no backend, no proxy, no server.
Generous free tier
The free tier includes rate limits high enough for a personal or small-scale UBG chatbot. Users bring their own key for high-traffic sites — no cost to you.
Free Tier

OpenAI-compatible API
The endpoint, request shape, and response format mirror OpenAI's Chat Completions API exactly. Switch from OpenAI by changing one URL and one API key.
Drop-in Replace

Getting a Groq API Key
You need a Groq account and an API key before writing any code. The whole process takes about two minutes.
Create a free Groq account
Go to console.groq.com and sign up with Google or email. No credit card required for the free tier.
Navigate to API Keys
In the left sidebar of the Groq console, click API Keys. Click Create API Key, give it a name (e.g. "my-ubg-site"), and copy the key immediately — Groq only shows it once.
Store the key in your config file
Create a config.js in your project root and define the key and base URL as constants. Import this file before any script that calls the API.
Test the key with a curl request
Paste the test command below into your terminal (replace YOUR_KEY). You should get a JSON response with an AI-generated message within a second or two.
// config.js — import this before any AI scripts
const GROQ_API_KEY = 'gsk_YOUR_KEY_HERE';
const GROQ_BASE_URL = 'https://api.groq.com/openai/v1';
const GROQ_MODEL = 'llama-3.3-70b-versatile'; // default model
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [{ "role": "user", "content": "Say hello in one sentence." }]
  }'
Available Models
Groq hosts a rotating selection of open-source models. The table below covers the ones most relevant to a UBG chatbot — all available on the free tier. Always check console.groq.com/docs/models for the current full list, as Groq frequently adds and retires models.
| Model ID | Parameters | Context | Best for | Tag |
|---|---|---|---|---|
| llama-3.3-70b-versatile | 70B | 128K | General chat, coding, long context tasks | ⭐ Recommended |
| llama-3.1-8b-instant | 8B | 128K | Low-latency responses, high volume usage | ⚡ Fastest |
| meta-llama/llama-4-maverick-17b-128e-instruct | 17Bx128E MoE | 128K | Vision + text, image understanding | 🖼 Vision |
| groq/compound-beta | — | — | Web search + code execution built-in | 🔍 Agentic |
| groq/compound-beta-mini | — | — | Faster agentic model for simpler tasks | ⚡ Fast Agentic |
Recommended default: llama-3.3-70b-versatile. It has the best overall quality on the free tier, a 128K context window (enough for very long conversations), and supports tool use and JSON mode if you need them later. Switch to llama-3.1-8b-instant if you need lower latency or expect very high request volume.
Your First API Call
The Groq API endpoint is POST https://api.groq.com/openai/v1/chat/completions. You pass a JSON body with a model and a messages array, and receive a JSON response with the generated text under choices[0].message.content.
Here's the minimal fetch call — works in any browser with no dependencies:
// groq.js — import config.js first
async function askGroq(userMessage) {
  const response = await fetch(`${GROQ_BASE_URL}/chat/completions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${GROQ_API_KEY}`,
    },
    body: JSON.stringify({
      model: GROQ_MODEL,
      messages: [
        { role: 'user', content: userMessage }
      ],
    }),
  });

  if (!response.ok) {
    const err = await response.json();
    throw new Error(err.error?.message || 'Groq request failed');
  }

  const data = await response.json();
  return data.choices[0].message.content;
}

// Usage:
const reply = await askGroq('What is 2 + 2?');
console.log(reply); // "2 + 2 equals 4."
The response JSON looks like this:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1708045122,
"model": "llama-3.3-70b-versatile",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "2 + 2 equals 4." // ← your text is here
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 14,
"completion_tokens": 8,
"total_tokens": 22
}
}
Calling .json() on an error response without checking response.ok first will give you the error body, not the completion — and accessing choices[0] on it will throw. Always check if (!response.ok) and handle or surface the error message to your users.
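One way to make that check reusable is a small helper that handles both cases in one place. This is a sketch, not part of any Groq SDK — it assumes the `{ error: { message } }` error shape shown in Groq's error responses:

```javascript
// Hypothetical helper: resolve a Groq fetch Response to the completion text,
// or throw with the API's own error message.
async function readGroqCompletion(response) {
  const data = await response.json();
  if (!response.ok) {
    // Error bodies have no `choices`; surface the API's message instead
    throw new Error(data.error?.message || `Groq request failed (${response.status})`);
  }
  return data.choices[0].message.content;
}
```

With this in place, every call site becomes `return readGroqCompletion(await fetch(...))` and the error path can never silently reach `choices[0]`.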
Streaming Responses
Without streaming, your UI waits for the entire response before displaying anything — this can feel slow even on fast models. With streaming enabled, tokens arrive as Server-Sent Events (SSE) and you can render them to the DOM as they come in, giving users that satisfying typewriter effect.
Enable streaming by adding "stream": true to the request body, then read the response body as a stream and parse the SSE chunks:
/**
 * Stream a Groq response token-by-token.
 * @param {string} userMessage — the user's text
 * @param {Function} onToken — called with each new text chunk
 * @param {Function} onDone — called when the stream completes
 * @param {Array} history — optional conversation history
 */
async function streamGroq(userMessage, onToken, onDone, history = []) {
  const messages = [
    ...history,
    { role: 'user', content: userMessage },
  ];

  const response = await fetch(`${GROQ_BASE_URL}/chat/completions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${GROQ_API_KEY}`,
    },
    body: JSON.stringify({
      model: GROQ_MODEL,
      messages,
      stream: true, // ← enables streaming
      max_tokens: 1024,
    }),
  });

  if (!response.ok) {
    const err = await response.json();
    throw new Error(err.error?.message || 'Groq stream failed');
  }

  // Read the streaming body line-by-line
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep incomplete line in buffer

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const payload = line.slice(6).trim();
      if (payload === '[DONE]') { onDone(); return; }
      try {
        const chunk = JSON.parse(payload);
        const token = chunk.choices[0]?.delta?.content;
        if (token) onToken(token);
      } catch { /* skip malformed chunks */ }
    }
  }
  onDone();
}

// Usage:
let fullText = '';
await streamGroq(
  'Explain black holes in two sentences.',
  (token) => {
    fullText += token;
    document.getElementById('output').textContent = fullText;
  },
  () => console.log('Stream complete')
);
The stream arrives as data: {...} lines, each containing a JSON chunk with a partial token in choices[0].delta.content. The stream ends with data: [DONE]. The code above reads the raw response body with a ReadableStreamDefaultReader, decodes each chunk, splits on newlines, and parses the JSON payloads — all standard browser Web APIs, no libraries needed.
System Prompts & Personas
A system message is a special role you add at the start of the messages array to give the model persistent instructions — its personality, constraints, and context. The model treats it as background instructions that always apply, no matter what the user says.
For a UBG site, the system prompt is where you define the chatbot's persona and focus it on game-related topics:
// Friendly game assistant persona
const SYSTEM_PROMPT = `You are Axel, a friendly game guide for this unblocked games site.
You help users find games, share tips and walkthroughs, and answer questions about the site.
Keep responses concise and upbeat. If asked about something unrelated to games,
politely redirect the conversation back to gaming topics.`;

// Alternative: a minimal assistant — just be helpful
// const SYSTEM_PROMPT = 'You are a helpful assistant. Keep answers short and clear.';

// Using the system prompt in your messages array:
const messages = [
  { role: 'system', content: SYSTEM_PROMPT }, // ← always first
  { role: 'user', content: 'How do I get past level 5 in Slope?' },
];
Multi-Turn Conversations
The Groq API is stateless — it has no memory between requests. To make a chatbot that remembers earlier messages, you pass the entire conversation history in the messages array with every request. Each message has a role (system, user, or assistant) and content.
// Conversation state — keeps the full history in memory
const chat = {
  history: [],

  // Reset to just the system message
  init(systemPrompt) {
    this.history = [{ role: 'system', content: systemPrompt }];
  },

  // Add a user message and get a streaming AI reply
  async send(userText, onToken, onDone) {
    this.history.push({ role: 'user', content: userText });

    let assistantText = '';
    await streamGroq(
      userText,
      (token) => { assistantText += token; onToken(token); },
      () => {
        // Save the full assistant reply to history
        this.history.push({ role: 'assistant', content: assistantText });
        onDone(assistantText);
      },
      this.history.slice(0, -1) // pass history except the just-pushed user msg
    );
  },

  // Trim history to avoid hitting the context limit
  // Keeps system message + last N exchanges
  trim(maxExchanges = 20) {
    const [system, ...rest] = this.history;
    if (rest.length > maxExchanges * 2) {
      this.history = [system, ...rest.slice(-maxExchanges * 2)];
    }
  },
};

// Initialize with a persona
chat.init('You are Axel, a friendly game guide.');

// First message
await chat.send('What games do you recommend?', onToken, onDone);

// Second message — model remembers the first exchange
await chat.send('Tell me more about the first one.', onToken, onDone);
The trim() method above keeps the last 20 exchanges while preserving the system message. Alternatively, summarize old messages into a single compressed entry when the history gets long.
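The summarization alternative can be sketched like this. It reuses the askGroq() helper from earlier; the keepRecent threshold and the prompt wording are illustrative choices, not Groq requirements:

```javascript
// Split history into the system message, old messages to compress,
// and recent messages to keep verbatim.
function splitForSummary(history, keepRecent = 6) {
  const [system, ...rest] = history;
  return {
    system,
    old: rest.slice(0, Math.max(0, rest.length - keepRecent)),
    recent: rest.slice(-keepRecent),
  };
}

// Replace the old middle of the conversation with one summary entry.
// Assumes askGroq() (defined earlier in this guide) is in scope.
async function compressHistory(history, keepRecent = 6) {
  const { system, old, recent } = splitForSummary(history, keepRecent);
  if (old.length === 0) return history; // nothing to compress yet
  const transcript = old.map((m) => `${m.role}: ${m.content}`).join('\n');
  const summary = await askGroq(
    `Summarize this conversation in three sentences, keeping any facts the assistant may need later:\n${transcript}`
  );
  return [
    system,
    { role: 'system', content: `Earlier conversation summary: ${summary}` },
    ...recent,
  ];
}
```

Compression costs one extra API call per trim, so trigger it only when the history crosses a size threshold rather than on every message.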
Request Parameters
Beyond model, messages, and stream, the Groq chat completions endpoint accepts these parameters to fine-tune response behavior. All are optional.
temperature — Controls randomness. Range 0–2. Lower values (0.2) give focused, deterministic answers — good for factual Q&A. Higher values (0.9) increase creativity. Default: 1.

max_tokens — Maximum tokens in the response. Use this to keep replies concise and stay within rate limits. For a chat UI, 512–1024 is a good range. Default: model maximum.

top_p — Nucleus sampling. Only the top p fraction of probability mass is sampled. Setting 0.9 means only tokens that collectively make up 90% of the probability are considered. Adjust this or temperature, not both.

stop — One or more stop sequences — strings that end generation when encountered. Useful for structured output, e.g. stop: ["\n\n", "END"].

stream — Set true to receive a streaming SSE response. Set false (default) for a single JSON response after full generation completes.

seed — Integer seed for deterministic outputs. Same seed + same prompt = same response (within the same backend version). Useful for testing.

presence_penalty — Range -2 to 2. Positive values penalize tokens that have already appeared, reducing repetition. Useful for long-form generation.

frequency_penalty — Range -2 to 2. Positive values penalize tokens proportionally to how often they've appeared — stronger anti-repetition than presence_penalty.
const response = await fetch(`${GROQ_BASE_URL}/chat/completions`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${GROQ_API_KEY}`,
  },
  body: JSON.stringify({
    model: 'llama-3.3-70b-versatile',
    messages: [{ role: 'user', content: 'Tell me a game tip.' }],
    stream: true,
    max_tokens: 512,
    temperature: 0.7,       // slightly creative but focused
    top_p: 1,
    presence_penalty: 0.1,  // light anti-repetition
    frequency_penalty: 0,
  }),
});
Live Demo Widget
Try the Groq API right here. Enter your Groq API key and send a message — responses stream token-by-token from Groq's servers directly to this page with no backend involved.
Your key stays in this page's memory, is sent only to api.groq.com, and is cleared when you close or refresh the page.
Full Chatbot Component
Here's a complete, drop-in chatbot widget for your UBG site. It combines everything covered in this guide: streaming, multi-turn history, a system prompt persona, and a polished UI. Copy both files into your project and add the script tags to your page.
<!-- 1. Add these script tags before </body> -->
<script src="config.js"></script>
<script src="chatbot.js"></script>

<!-- 2. Floating toggle button -->
<button id="chat-toggle" onclick="toggleChat()"
  style="position:fixed;bottom:24px;right:24px;z-index:999;
         width:52px;height:52px;border-radius:50%;background:#5ea67a;
         color:#fff;border:none;font-size:1.3rem;cursor:pointer;
         box-shadow:0 4px 20px rgba(94,166,122,0.45)">
  🤖
</button>

<!-- 3. Chat window (starts hidden; toggleChat() switches display to flex) -->
<div id="chat-window" style="display:none;position:fixed;
  bottom:88px;right:24px;width:360px;height:500px;z-index:998;
  background:var(--cream);border:1.5px solid var(--glass-border);
  border-radius:20px;overflow:hidden;box-shadow:0 20px 60px rgba(0,0,0,0.15);
  flex-direction:column;">

  <!-- Header -->
  <div style="padding:14px 18px;background:#5ea67a;color:#fff;
    display:flex;align-items:center;justify-content:space-between">
    <span style="font-weight:600;font-size:0.9rem">🤖 Axel — Game Guide</span>
    <button onclick="toggleChat()"
      style="background:none;border:none;color:#fff;font-size:1.1rem;cursor:pointer">✕</button>
  </div>

  <!-- Message area -->
  <div id="chat-msgs" style="flex:1;overflow-y:auto;padding:14px 16px;
    display:flex;flex-direction:column;gap:10px"></div>

  <!-- Input row -->
  <div style="padding:10px 12px;border-top:1px solid rgba(0,0,0,0.07);
    display:flex;gap:8px">
    <input id="chat-input" placeholder="Ask me anything..."
      onkeydown="if(event.key==='Enter')chatSend()"
      style="flex:1;padding:9px 14px;border-radius:50px;border:1.5px solid #ddd;
      font-size:0.85rem;outline:none">
    <button onclick="chatSend()"
      style="width:36px;height:36px;border-radius:50%;background:#5ea67a;
      color:#fff;border:none;cursor:pointer;font-size:0.85rem">➤</button>
  </div>
</div>
// chatbot.js — requires config.js (GROQ_API_KEY, GROQ_BASE_URL, GROQ_MODEL)
const SYSTEM_PROMPT = `You are Axel, a friendly AI game guide for this unblocked games site.
Help users find games, share tips, and answer questions.
Be concise, upbeat, and avoid markdown formatting.`;

let chatHistory = [{ role: 'system', content: SYSTEM_PROMPT }];
let chatOpen = false;
let chatBusy = false;

/* Toggle the chat window open/closed */
function toggleChat() {
  chatOpen = !chatOpen;
  const win = document.getElementById('chat-window');
  win.style.display = chatOpen ? 'flex' : 'none';
  if (chatOpen && chatHistory.length === 1) {
    appendMsg('ai', 'Hey! What game are you playing today?');
  }
}

/* Add a message bubble to the chat window; returns the bubble element */
function appendMsg(role, text) {
  const msgs = document.getElementById('chat-msgs');
  const bubble = document.createElement('div');
  bubble.style.cssText = role === 'user'
    ? 'align-self:flex-end;background:#e8f4f0;padding:9px 13px;border-radius:14px 14px 4px 14px;font-size:0.84rem;max-width:80%;line-height:1.5'
    : 'align-self:flex-start;background:#f2ede8;padding:9px 13px;border-radius:14px 14px 14px 4px;font-size:0.84rem;max-width:80%;line-height:1.5';
  bubble.textContent = text;
  msgs.appendChild(bubble);
  msgs.scrollTop = msgs.scrollHeight;
  return bubble;
}

/* Send the user's message and stream the reply */
async function chatSend() {
  if (chatBusy) return;
  const input = document.getElementById('chat-input');
  const text = input.value.trim();
  if (!text) return;
  input.value = '';
  chatBusy = true;

  appendMsg('user', text);
  chatHistory.push({ role: 'user', content: text });
  const aiBubble = appendMsg('ai', '▋'); // typing cursor
  let reply = '';

  try {
    const response = await fetch(`${GROQ_BASE_URL}/chat/completions`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${GROQ_API_KEY}`,
      },
      body: JSON.stringify({
        model: GROQ_MODEL,
        messages: chatHistory,
        stream: true,
        max_tokens: 512,
        temperature: 0.7,
      }),
    });
    if (!response.ok) throw new Error('API error: ' + response.status);

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop();
      for (const line of lines) {
        if (!line.startsWith('data: ')) continue;
        const raw = line.slice(6).trim();
        if (raw === '[DONE]') break;
        try {
          const chunk = JSON.parse(raw);
          const token = chunk.choices[0]?.delta?.content;
          if (token) {
            reply += token;
            aiBubble.textContent = reply + '▋';
            const msgs = document.getElementById('chat-msgs');
            msgs.scrollTop = msgs.scrollHeight;
          }
        } catch { /* skip malformed chunks */ }
      }
    }

    aiBubble.textContent = reply;
    chatHistory.push({ role: 'assistant', content: reply });

    // Trim to system message + last 20 exchanges to avoid context overflow
    if (chatHistory.length > 41) {
      chatHistory = [chatHistory[0], ...chatHistory.slice(-40)];
    }
  } catch (err) {
    aiBubble.textContent = '⚠ Error: ' + err.message;
  }
  chatBusy = false;
}
Instead of hardcoding your key in config.js, show a settings modal where users enter their own Groq key (stored in localStorage). This distributes the cost across users and keeps your key out of your source code entirely. LLaMA 3.3 70B on the free tier allows 30 free requests per minute per key — plenty for a single user.
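One possible shape for that bring-your-own-key flow (the modal itself is omitted; the function and storage-key names here are illustrative, not part of any API):

```javascript
const KEY_STORAGE = 'groq_user_key'; // localStorage slot for the user's own key

// Loose sanity check — Groq keys start with "gsk_"
function looksLikeGroqKey(key) {
  return typeof key === 'string' && key.startsWith('gsk_') && key.length > 20;
}

// Return the stored key, or ask the user for one and save it
function getUserKey() {
  let key = localStorage.getItem(KEY_STORAGE) || '';
  if (!looksLikeGroqKey(key)) {
    key = (prompt('Paste your Groq API key (gsk_...):') || '').trim();
    if (looksLikeGroqKey(key)) localStorage.setItem(KEY_STORAGE, key);
  }
  return key;
}

// Let users clear their key from the settings modal
function forgetUserKey() {
  localStorage.removeItem(KEY_STORAGE);
}
```

In chatbot.js you would then build the Authorization header from getUserKey() instead of the GROQ_API_KEY constant.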
Next Steps
You now have a fully working AI chatbot powered by Groq. Here's where to take it from here:
Give the AI game context
Pass your games.json content into the system prompt so the AI knows exactly which games your site has and can make specific recommendations.
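A sketch of that idea — the games.json shape assumed here ({ name, category } entries) is an illustration; match it to your own file:

```javascript
// Build a system prompt that pins the AI to your actual game catalog.
function buildGamePrompt(games) {
  const catalog = games.map((g) => `- ${g.name} (${g.category})`).join('\n');
  return `You are Axel, a friendly game guide for this unblocked games site.
Only recommend games from this catalog:
${catalog}
If asked about a game that is not listed, say the site does not have it yet.`;
}

// Usage (in the browser):
// const games = await fetch('games.json').then((r) => r.json());
// chatHistory = [{ role: 'system', content: buildGamePrompt(games) }];
```

Keep the catalog short (name and category, not full descriptions) — every prompt token counts against the context window on each request.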
Tie AI to user accounts
Save per-user API keys and chat history in your account system so the AI remembers context across sessions and visits.
Accounts Guide

Use Compound for web search
Switch to groq/compound-beta to give the AI real-time web search — it can look up game walkthroughs and news on the fly, with automatic citations.
Deploy and protect your site
Push to Cloudflare Pages and optionally use a Cloudflare Worker as a key proxy so your API key never appears in client-side source.
Deploy Guide
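A minimal sketch of that key-proxy Worker, assuming the key is stored as a Worker secret (e.g. via wrangler secret put GROQ_API_KEY); your page then calls the Worker's URL in place of api.groq.com:

```javascript
// cloudflare-worker.js — forwards chat requests, attaching the secret key
// server-side so it never appears in client code.
const worker = {
  async fetch(request, env) {
    if (request.method !== 'POST') {
      return new Response('POST only', { status: 405 });
    }
    // Pass the client's JSON body straight through to Groq
    const body = await request.text();
    return fetch('https://api.groq.com/openai/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${env.GROQ_API_KEY}`,
      },
      body,
    });
  },
};

// In your Worker entry module, expose it as: export default worker;
```

Because the Worker returns Groq's response untouched, streaming still works — the SSE body passes through to the browser unchanged. Consider adding rate limiting or an origin check so strangers can't use your proxy as a free key.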