AI Guide

Add free AI to
any site.

Groq gives you blazing-fast inference on top open-source models — LLaMA, Qwen, and more — with a free tier generous enough to power a real UBG chatbot. No proxy needed, no backend required.

  Low Difficulty   ~35 min read   Updated 2026
Intelligence at the speed of your users.
On this page
01

Overview & Why Groq

Most AI integrations either cost money per request, require a backend server to hide the API key, or are so slow they're annoying in a chat interface. Groq solves all three problems for UBG sites: it has a free tier, its API accepts browser requests directly (CORS is enabled), and it returns tokens faster than a human can read them.

Groq is not an AI company — it's an inference company. They run open-source models (LLaMA, Qwen, etc.) on custom hardware called LPUs (Language Processing Units), achieving speeds 10–100× faster than GPU-based providers. The API is OpenAI-compatible, meaning any code written for OpenAI's chat completions endpoint works on Groq with just a URL and key swap.
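To make the "URL and key swap" concrete, here's a small sketch (the key values are placeholders, and the `PROVIDERS` / `completionsUrl` names are just illustrative):

```javascript
// provider-swap.js — sketch: the same Chat Completions request shape works
// against either provider; only the base URL and API key differ.
const PROVIDERS = {
  openai: { baseUrl: 'https://api.openai.com/v1',      key: 'sk-...your-openai-key' },
  groq:   { baseUrl: 'https://api.groq.com/openai/v1', key: 'gsk_...your-groq-key'  },
};

function completionsUrl(name) {
  return `${PROVIDERS[name].baseUrl}/chat/completions`;
}
```

Everything else (headers, request body, response shape) stays the same between the two.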

Blazing fast inference

Groq's LPU hardware delivers hundreds of tokens per second — fast enough that streaming responses appear to "typewrite" in real time rather than arriving in chunks.

~500 tok/s

Direct browser calls

Unlike many AI APIs, Groq's endpoint has CORS headers set correctly. You can call it from fetch() in a static HTML file — no backend, no proxy, no server.

CORS Enabled

Generous free tier

The free tier includes rate limits high enough for a personal or small-scale UBG chatbot. Users bring their own key for high-traffic sites — no cost to you.

Free Tier

OpenAI-compatible API

The endpoint, request shape, and response format mirror OpenAI's Chat Completions API exactly. Switch from OpenAI by changing one URL and one API key.

Drop-in Replace
API key visibility Since you're calling Groq from the browser, your API key will be visible in the JavaScript source. For a personal or low-traffic site this is often an acceptable trade-off: Groq keys only grant inference access, not billing or account data, though anyone who copies the key can burn through your rate limit. For high-traffic production sites, use the "user brings their own key" pattern shown in section 10, or proxy the request through a serverless function (e.g. a Cloudflare Worker) that keeps the key server-side.
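A Worker proxy for that last option might look like this sketch (the `/chat` route name is an assumption; the key is stored as a Worker secret via `wrangler secret put GROQ_API_KEY`):

```javascript
// worker.js — hypothetical Cloudflare Worker that keeps the Groq key
// server-side. The browser calls your Worker URL with no Authorization
// header; the Worker adds the secret key and relays the response.

async function handleProxy(request, env) {
  const url = new URL(request.url);
  if (request.method !== 'POST' || url.pathname !== '/chat') {
    return new Response('Not found', { status: 404 });
  }

  // Forward the browser's JSON body verbatim, adding the secret key
  const upstream = await fetch('https://api.groq.com/openai/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${env.GROQ_API_KEY}`,  // never reaches the browser
    },
    body: await request.text(),
  });

  // Relay the (possibly streaming) body with CORS headers for your site
  return new Response(upstream.body, {
    status: upstream.status,
    headers: {
      'Content-Type': upstream.headers.get('Content-Type') || 'application/json',
      'Access-Control-Allow-Origin': '*',  // tighten to your domain in production
    },
  });
}

// In the Worker entry point: export default { fetch: handleProxy };
```

Your frontend then swaps `${GROQ_BASE_URL}/chat/completions` for the Worker URL and drops the Authorization header entirely.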

02

Getting a Groq API Key

You need a Groq account and an API key before writing any code. The whole process takes about two minutes.

1

Create a free Groq account

Go to console.groq.com and sign up with Google or email. No credit card required for the free tier.

2

Navigate to API Keys

In the left sidebar of the Groq console, click API Keys. Click Create API Key, give it a name (e.g. "my-ubg-site"), and copy the key immediately — Groq only shows it once.

3

Store the key in your config file

Create a config.js in your project root and define the key and base URL as constants. Import this file before any script that calls the API.

4

Test the key with a curl request

Paste the test command below into your terminal (replace YOUR_KEY). You should get a JSON response with an AI-generated message within a second or two.

config.js
// config.js — import this before any AI scripts
const GROQ_API_KEY  = 'gsk_YOUR_KEY_HERE';
const GROQ_BASE_URL = 'https://api.groq.com/openai/v1';
const GROQ_MODEL    = 'llama-3.3-70b-versatile';  // default model
curl — test your API key
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [{ "role": "user", "content": "Say hello in one sentence." }]
  }'

03

Available Models

Groq hosts a rotating selection of open-source models. The table below covers the ones most relevant to a UBG chatbot — all available on the free tier. Always check console.groq.com/docs/models for the current full list, as Groq frequently adds and retires models.

| Model ID                                      | Parameters   | Context | Best for                                                 |
|-----------------------------------------------|--------------|---------|----------------------------------------------------------|
| llama-3.3-70b-versatile                       | 70B          | 128K    | General chat, coding, long-context tasks ⭐ Recommended   |
| llama-3.1-8b-instant                          | 8B           | 128K    | Low-latency responses, high-volume usage ⚡ Fastest       |
| meta-llama/llama-4-maverick-17b-128e-instruct | 17B×128E MoE | 128K    | Vision + text, image understanding 🖼 Vision              |
| groq/compound-beta                            | n/a          | n/a     | Web search + code execution built in 🔍 Agentic           |
| groq/compound-beta-mini                       | n/a          | n/a     | Faster agentic model for simpler tasks ⚡ Fast Agentic    |
Which model should I use for a chatbot? Start with llama-3.3-70b-versatile. It has the best overall quality on the free tier, a 128K context window (enough for very long conversations), and supports tool use and JSON mode if you need them later. Switch to llama-3.1-8b-instant if you need lower latency or expect very high request volume.
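Because the API is OpenAI-compatible, you can also query the catalog at runtime instead of hardcoding model IDs. This sketch uses the OpenAI-style GET /models endpoint (the response shape is assumed from the OpenAI list format, and `extractModelIds` is just a helper name; it requires the config.js constants from section 02):

```javascript
// list-models.js — sketch: fetch the live model catalog from Groq's
// OpenAI-compatible GET /models endpoint.

function extractModelIds(payload) {
  // OpenAI-style list responses wrap entries in a `data` array
  return payload.data.map((m) => m.id).sort();
}

async function listGroqModels() {
  const response = await fetch(`${GROQ_BASE_URL}/models`, {
    headers: { 'Authorization': `Bearer ${GROQ_API_KEY}` },
  });
  if (!response.ok) throw new Error(`Could not list models (${response.status})`);
  return extractModelIds(await response.json());
}
```

Handy for a settings dropdown that never goes stale when Groq rotates models.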

04

Your First API Call

The Groq API endpoint is POST https://api.groq.com/openai/v1/chat/completions. You pass a JSON body with a model and a messages array, and receive a JSON response with the generated text under choices[0].message.content.

Here's the minimal fetch call — works in any browser with no dependencies:

groq.js — minimal fetch call
// groq.js — import config.js first

async function askGroq(userMessage) {
  const response = await fetch(`${GROQ_BASE_URL}/chat/completions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${GROQ_API_KEY}`,
    },
    body: JSON.stringify({
      model: GROQ_MODEL,
      messages: [
        { role: 'user', content: userMessage }
      ],
    }),
  });

  if (!response.ok) {
    const err = await response.json();
    throw new Error(err.error?.message || 'Groq request failed');
  }

  const data = await response.json();
  return data.choices[0].message.content;
}

// Usage:
const reply = await askGroq('What is 2 + 2?');
console.log(reply);  // e.g. "2 + 2 equals 4."

The response JSON looks like this:

response shape
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1708045122,
  "model": "llama-3.3-70b-versatile",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "2 + 2 equals 4."  // ← your text is here
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 8,
    "total_tokens": 22
  }
}
Always check response.ok before parsing If you hit a rate limit or send a bad request, Groq returns a non-200 status with an error object. Calling .json() on an error response without checking response.ok first will give you the error body, not the completion — and accessing choices[0] on it will throw. Always check if (!response.ok) and handle or surface the error message to your users.

05

Streaming Responses

Without streaming, your UI waits for the entire response before displaying anything — this can feel slow even on fast models. With streaming enabled, tokens arrive as Server-Sent Events (SSE) and you can render them to the DOM as they come in, giving users that satisfying typewriter effect.

Enable streaming by adding "stream": true to the request body, then read the response body as a stream and parse the SSE chunks:

groq.js — streaming fetch
/**
 * Stream a Groq response token-by-token.
 * @param {string} userMessage  — the user's text
 * @param {Function} onToken    — called with each new text chunk
 * @param {Function} onDone     — called when the stream completes
 * @param {Array} history       — optional conversation history
 */
async function streamGroq(userMessage, onToken, onDone, history = []) {
  const messages = [
    ...history,
    { role: 'user', content: userMessage },
  ];

  const response = await fetch(`${GROQ_BASE_URL}/chat/completions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${GROQ_API_KEY}`,
    },
    body: JSON.stringify({
      model: GROQ_MODEL,
      messages,
      stream: true,       // ← enables streaming
      max_tokens: 1024,
    }),
  });

  if (!response.ok) {
    const err = await response.json();
    throw new Error(err.error?.message || 'Groq stream failed');
  }

  // Read the streaming body line-by-line
  const reader  = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop();  // keep incomplete line in buffer

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const payload = line.slice(6).trim();
      if (payload === '[DONE]') { onDone(); return; }
      try {
        const chunk = JSON.parse(payload);
        const token = chunk.choices[0]?.delta?.content;
        if (token) onToken(token);
      } catch { /* skip malformed chunks */ }
    }
  }

  onDone();
}

// Usage:
let fullText = '';
await streamGroq(
  'Explain black holes in two sentences.',
  (token) => {
    fullText += token;
    document.getElementById('output').textContent = fullText;
  },
  () => console.log('Stream complete')
);
How SSE streaming works Groq sends the response as a series of data: {...} lines, each containing a JSON chunk with a partial token in choices[0].delta.content. The stream ends with data: [DONE]. The code above reads the raw response body with a ReadableStreamDefaultReader, decodes each chunk, splits on newlines, and parses the JSON payloads — all standard browser Web APIs, no libraries needed.

06

System Prompts & Personas

A system message is a special role you add at the start of the messages array to give the model persistent instructions — its personality, constraints, and context. The model treats it as background instructions that always apply, no matter what the user says.

For a UBG site, the system prompt is where you define the chatbot's persona and focus it on game-related topics:

system prompt examples
// Friendly game assistant persona
const SYSTEM_PROMPT = `You are Axel, a friendly game guide for this unblocked games site.
You help users find games, share tips and walkthroughs, and answer
questions about the site. Keep responses concise and upbeat.
If asked about something unrelated to games, politely redirect
the conversation back to gaming topics.`;

// Minimal assistant — just be helpful
const SYSTEM_PROMPT = 'You are a helpful assistant. Keep answers short and clear.';

// Using the system prompt in your messages array:
const messages = [
  { role: 'system', content: SYSTEM_PROMPT },  // ← always first
  { role: 'user', content: 'How do I get past level 5 in Slope?' },
];
System prompt tips Keep system prompts under 500 tokens for best performance. Be specific about tone ("concise", "friendly", "no markdown") — the model will follow formatting instructions reliably. Include context about your site name and purpose so the AI can answer "what is this site?" questions correctly.

07

Multi-Turn Conversations

The Groq API is stateless — it has no memory between requests. To make a chatbot that remembers earlier messages, you pass the entire conversation history in the messages array with every request. Each message has a role (system, user, or assistant) and content.

chat-history.js — stateful conversation manager
// Conversation state — keeps the full history in memory
const chat = {
  history: [],

  // Reset to just the system message
  init(systemPrompt) {
    this.history = [{ role: 'system', content: systemPrompt }];
  },

  // Add a user message and get a streaming AI reply
  async send(userText, onToken, onDone) {
    this.history.push({ role: 'user', content: userText });

    let assistantText = '';

    await streamGroq(
      userText,
      (token) => {
        assistantText += token;
        onToken(token);
      },
      () => {
        // Save the full assistant reply to history
        this.history.push({ role: 'assistant', content: assistantText });
        onDone(assistantText);
      },
      this.history.slice(0, -1)  // pass history except the just-pushed user msg
    );
  },

  // Trim history to avoid hitting the context limit
  // Keeps system message + last N exchanges
  trim(maxExchanges = 20) {
    const [system, ...rest] = this.history;
    if (rest.length > maxExchanges * 2) {
      this.history = [system, ...rest.slice(-maxExchanges * 2)];
    }
  },
};

// Initialize with a persona
chat.init('You are Axel, a friendly game guide.');

// First message
await chat.send('What games do you recommend?', onToken, onDone);

// Second message — model remembers the first exchange
await chat.send('Tell me more about the first one.', onToken, onDone);
Context window limits LLaMA 3.3 70B has a 128K token context window — enormous, but not infinite. A very long conversation will eventually hit the limit. The trim() method above keeps the last 20 exchanges while preserving the system message. Alternatively, summarize old messages into a single compressed entry when the history gets long.
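The summarization alternative might look like this sketch. It reuses the askGroq helper from section 04; the prompt wording and the keepRecent default are assumptions:

```javascript
// compact-history.js — sketch: replace old turns with a one-paragraph
// summary instead of dropping them outright.

function splitHistory(history, keepRecent = 10) {
  // history[0] is the system message; keep it plus the newest turns
  const [system, ...rest] = history;
  const old    = rest.slice(0, Math.max(0, rest.length - keepRecent));
  const recent = rest.slice(-keepRecent);
  return { system, old, recent };
}

async function compactHistory(history, keepRecent = 10) {
  const { system, old, recent } = splitHistory(history, keepRecent);
  if (old.length === 0) return history;  // nothing old enough to compact

  const transcript = old.map((m) => `${m.role}: ${m.content}`).join('\n');
  const summary = await askGroq(
    `Summarize this conversation in one short paragraph:\n\n${transcript}`
  );

  return [
    system,
    { role: 'system', content: `Summary of earlier conversation: ${summary}` },
    ...recent,
  ];
}
```

Call it whenever the history grows past a threshold; the model keeps the gist of old exchanges at a fraction of the token cost.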

08

Request Parameters

Beyond model, messages, and stream, the Groq chat completions endpoint accepts these parameters to fine-tune response behavior. All are optional.

temperature

Controls randomness. Range 0–2. Lower values (0.2) give focused, deterministic answers — good for factual Q&A. Higher values (0.9) increase creativity. Default: 1.

max_tokens

Maximum tokens in the response. Use this to keep replies concise and stay within rate limits. For a chat UI, 512–1024 is a good range. Default: model maximum.

top_p

Nucleus sampling. The model samples from the smallest set of tokens whose cumulative probability reaches p. Setting 0.9 means only tokens that collectively make up the top 90% of probability mass are considered. Adjust this or temperature, not both.

stop

One or more stop sequences — strings that end generation when encountered. Useful for structured output, e.g. stop: ["\n\n", "END"].

stream

Set true to receive a streaming SSE response. Set false (default) for a single JSON response after full generation completes.

seed

Integer seed for deterministic outputs. Same seed + same prompt = same response (within the same backend version). Useful for testing.

presence_penalty

Range -2 to 2. Positive values penalize tokens that have already appeared, reducing repetition. Useful for long-form generation.

frequency_penalty

Range -2 to 2. Positive values penalize tokens proportionally to how often they've appeared — stronger anti-repetition than presence_penalty.

example with all common parameters
const response = await fetch(`${GROQ_BASE_URL}/chat/completions`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${GROQ_API_KEY}`,
  },
  body: JSON.stringify({
    model:             'llama-3.3-70b-versatile',
    messages:          [{ role: 'user', content: 'Tell me a game tip.' }],
    stream:            true,
    max_tokens:        512,
    temperature:       0.7,   // slightly creative but focused
    top_p:             1,
    presence_penalty:  0.1,   // light anti-repetition
    frequency_penalty: 0,
  }),
});

09

Live Demo Widget

Try the Groq API right here. Enter your Groq API key and send a message — responses stream token-by-token from Groq's servers directly to this page with no backend involved.

Axel — AI Game Guide
AI
Hey! I'm Axel, your game guide. Enter your Groq API key below and ask me anything — game tips, walkthroughs, or just say hi!
API Key
Your key is never stored The key you enter above lives only in JavaScript memory in this browser tab. It is never sent anywhere except directly to api.groq.com, and is cleared when you close or refresh the page.

10

Full Chatbot Component

Here's a complete, drop-in chatbot widget for your UBG site. It combines everything covered in this guide: streaming, multi-turn history, a system prompt persona, and a polished UI. Copy both files into your project and add the script tags to your page.

chatbot.html — minimal embed (add to any page)
<!-- 1. Add these script tags before </body> -->
<script src="config.js"></script>
<script src="chatbot.js"></script>

<!-- 2. Floating toggle button -->
<button id="chat-toggle" onclick="toggleChat()"
  style="position:fixed;bottom:24px;right:24px;z-index:999;
         width:52px;height:52px;border-radius:50%;background:#5ea67a;
         color:#fff;border:none;font-size:1.3rem;cursor:pointer;
         box-shadow:0 4px 20px rgba(94,166,122,0.45)">
  🤖
</button>

<!-- 3. Chat window -->
<div id="chat-window" style="display:none;position:fixed;
  bottom:88px;right:24px;width:360px;height:500px;z-index:998;
  background:var(--cream);border:1.5px solid var(--glass-border);
  border-radius:20px;overflow:hidden;box-shadow:0 20px 60px rgba(0,0,0,0.15);
  flex-direction:column;">

  <!-- Header -->
  <div style="padding:14px 18px;background:#5ea67a;color:#fff;
              display:flex;align-items:center;justify-content:space-between">
    <span style="font-weight:600;font-size:0.9rem">🤖 Axel — Game Guide</span>
    <button onclick="toggleChat()"
      style="background:none;border:none;color:#fff;font-size:1.1rem;cursor:pointer">✕</button>
  </div>

  <!-- Message area -->
  <div id="chat-msgs" style="flex:1;overflow-y:auto;padding:14px 16px;
       display:flex;flex-direction:column;gap:10px"></div>

  <!-- Input row -->
  <div style="padding:10px 12px;border-top:1px solid rgba(0,0,0,0.07);
              display:flex;gap:8px">
    <input id="chat-input" placeholder="Ask me anything..."
      onkeydown="if(event.key==='Enter')chatSend()"
      style="flex:1;padding:9px 14px;border-radius:50px;border:1.5px solid #ddd;
             font-size:0.85rem;outline:none">
    <button onclick="chatSend()"
      style="width:36px;height:36px;border-radius:50%;background:#5ea67a;
             color:#fff;border:none;cursor:pointer;font-size:0.85rem">➤</button>
  </div>
</div>
chatbot.js — complete implementation
// chatbot.js — requires config.js (GROQ_API_KEY, GROQ_BASE_URL, GROQ_MODEL)

const SYSTEM_PROMPT = `You are Axel, a friendly AI game guide for this
unblocked games site. Help users find games, share tips, and answer
questions. Be concise, upbeat, and avoid markdown formatting.`;

let chatHistory = [{ role: 'system', content: SYSTEM_PROMPT }];
let chatOpen    = false;
let chatBusy    = false;

/* Toggle the chat window open/closed */
function toggleChat() {
  chatOpen = !chatOpen;
  const win = document.getElementById('chat-window');
  win.style.display = chatOpen ? 'flex' : 'none';
  if (chatOpen && chatHistory.length === 1) {
    appendMsg('ai', 'Hey! What game are you playing today?');
  }
}

/* Add a message bubble to the chat window */
function appendMsg(role, text) {
  const msgs = document.getElementById('chat-msgs');
  const bubble = document.createElement('div');
  bubble.style.cssText = role === 'user'
    ? 'align-self:flex-end;background:#e8f4f0;padding:9px 13px;border-radius:14px 14px 4px 14px;font-size:0.84rem;max-width:80%;line-height:1.5'
    : 'align-self:flex-start;background:#f2ede8;padding:9px 13px;border-radius:14px 14px 14px 4px;font-size:0.84rem;max-width:80%;line-height:1.5';
  bubble.textContent = text;
  msgs.appendChild(bubble);
  msgs.scrollTop = msgs.scrollHeight;
  return bubble;
}

/* Send the user's message and stream the reply */
async function chatSend() {
  if (chatBusy) return;
  const input = document.getElementById('chat-input');
  const text  = input.value.trim();
  if (!text) return;

  input.value = '';
  chatBusy    = true;
  appendMsg('user', text);
  chatHistory.push({ role: 'user', content: text });

  const aiBubble = appendMsg('ai', '▋');  // typing cursor
  let reply = '';

  try {
    const response = await fetch(`${GROQ_BASE_URL}/chat/completions`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${GROQ_API_KEY}`,
      },
      body: JSON.stringify({
        model:      GROQ_MODEL,
        messages:   chatHistory,
        stream:     true,
        max_tokens: 512,
        temperature:0.7,
      }),
    });

    if (!response.ok) throw new Error('API error: ' + response.status);

    const reader  = response.body.getReader();
    const decoder = new TextDecoder();
    let   buffer  = '';

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop();
      for (const line of lines) {
        if (!line.startsWith('data: ')) continue;
        const raw = line.slice(6).trim();
        if (raw === '[DONE]') break;
        try {
          const chunk = JSON.parse(raw);
          const token = chunk.choices[0]?.delta?.content;
          if (token) {
            reply += token;
            aiBubble.textContent = reply + '▋';
            const box = document.getElementById('chat-msgs');
            box.scrollTop = box.scrollHeight;  // keep newest text in view
          }
        } catch { }
      }
    }

    aiBubble.textContent = reply;
    chatHistory.push({ role: 'assistant', content: reply });

    // Trim to last 20 exchanges to avoid context overflow
    if (chatHistory.length > 42) {
      chatHistory = [chatHistory[0], ...chatHistory.slice(-40)];
    }

  } catch (err) {
    aiBubble.textContent = '⚠ Error: ' + err.message;
  }

  chatBusy = false;
}
"User brings their own key" pattern For a public production site, instead of hardcoding your key in config.js, show a settings modal where users enter their own Groq key (stored in localStorage). This spreads usage across each user's own free-tier quota and keeps your key out of your source code entirely. On the free tier, llama-3.3-70b-versatile allows around 30 requests per minute per key, plenty for a single user.
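A minimal sketch of that pattern (the storage key name and the gsk_ prefix check are assumptions, not a validity guarantee):

```javascript
// byok.js — sketch: let each user store their own Groq key in localStorage.

const KEY_STORAGE = 'groq_user_key';

function looksLikeGroqKey(key) {
  // Soft sanity check only; keys from the Groq console start with "gsk_"
  return typeof key === 'string' && key.trim().startsWith('gsk_') && key.trim().length > 20;
}

function saveUserKey(raw) {
  const key = raw.trim();
  if (!looksLikeGroqKey(key)) throw new Error('That does not look like a Groq API key.');
  localStorage.setItem(KEY_STORAGE, key);
}

function getUserKey() {
  // Falls back to empty string so callers can prompt for a key
  return localStorage.getItem(KEY_STORAGE) || '';
}
```

Your fetch calls then use getUserKey() in place of GROQ_API_KEY, and you show the settings modal whenever the key is empty or a request comes back 401.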

11

Next Steps

You now have a fully working AI chatbot powered by Groq. Here's where to take it from here:

Give the AI game context

Pass your games.json content into the system prompt so the AI knows exactly which games your site has and can make specific recommendations.

Games Guide

Tie AI to user accounts

Save per-user API keys and chat history in your account system so the AI remembers context across sessions and visits.

Accounts Guide

Use Compound for web search

Switch to groq/compound-beta to give the AI real-time web search — it can look up game walkthroughs and news on the fly, with automatic citations.

Groq Agentic Docs

Deploy and protect your site

Push to Cloudflare Pages and optionally use a Cloudflare Worker as a key proxy so your API key never appears in client-side source.

Deploy Guide
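The first next step above, feeding games.json into the system prompt, might look like this sketch (the title/category/description field names are assumptions about your games.json shape):

```javascript
// games-context.js — hypothetical sketch: build the system prompt from
// your games catalog so the model only recommends games the site has.

function buildSystemPrompt(games, siteName = 'this site') {
  const catalog = games
    .map((g) => `- ${g.title} (${g.category}): ${g.description}`)
    .join('\n');
  return `You are Axel, a friendly game guide for ${siteName}.
Only recommend games from this catalog:
${catalog}
If asked about a game that is not listed, say it is not on the site.`;
}

// Usage: const games = await (await fetch('games.json')).json();
//        chat.init(buildSystemPrompt(games, 'My UBG Site'));
```

Keep the catalog well under the context limit; for hundreds of games, send only titles and categories rather than full descriptions.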

Ready to ship your site?

The Deploy guide walks you through Cloudflare Pages, WAF, and analytics — get your AI-powered site live in under 15 minutes.