AI/LLMs 26 Feb 2026 · 13 min read · 2.7K views

GPT-4o Deep Dive: Advanced Prompting, Function Calling & Structured Outputs

Master GPT-4o with chain-of-thought prompting, JSON mode, function calling schemas, and cost-optimisation patterns for production.


Suboor Khan

Full-Stack Developer & Technical Writer


GPT-4o is OpenAI's flagship model — faster, cheaper, and multimodal. But calling chat.completions.create with a plain user message is leaving a huge amount of capability on the table. Once you understand function calling, structured outputs, and advanced prompting techniques, GPT-4o becomes a programmable reasoning engine rather than a chatbot.

This guide covers advanced prompting (zero-shot vs. few-shot, chain-of-thought), the modern function calling API, JSON mode and Structured Outputs, and hard-won cost-optimisation tricks for production.

Advanced Prompting Patterns

Chain-of-Thought (CoT)

Simply appending "Think step by step." to your prompt dramatically improves accuracy on reasoning tasks. For even better results, use few-shot CoT: provide 2-3 examples of the reasoning chain before the actual question.

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    {
      role: 'system',
      content: `You are a precise analyst. When asked a question, always:
1. Break the problem into steps
2. Show your reasoning for each step
3. State your final answer clearly`
    },
    {
      role: 'user',
      content: 'Should we migrate our monolith to microservices? We have 8 engineers and 50K MAU.'
    }
  ],
  temperature: 0.3,  // lower = more consistent reasoning
});
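Since the system prompt asks for a clearly stated final answer, you can pull it out of the completion deterministically. A minimal sketch, assuming you instruct the model to prefix it with "Final answer:" (that prefix is an assumption you would enforce in the system prompt):

```javascript
// Pull the final answer out of a chain-of-thought completion. Assumes
// the prompt instructs the model to prefix it with "Final answer:";
// falls back to the full text if the marker is missing.
function extractFinalAnswer(completionText) {
  const match = completionText.match(/final answer:\s*(.+)/is);
  return match ? match[1].trim() : completionText.trim();
}
```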

Role Prompting

// System prompt persona engineering
const systemPrompt = `You are a senior software architect with 15 years of experience
at Google and Netflix. You give direct, opinionated advice based on real production
experience. You cite specific technologies and trade-offs. You never say "it depends"
without explaining what it depends *on* and why.`;

Output Format Control (without JSON mode)

// Constrain output format via prompt
const prompt = `Analyse this code for security vulnerabilities.
Return your response in EXACTLY this format:
SEVERITY: [Critical|High|Medium|Low]
VULNERABILITY: [name]
LOCATION: [file:line]
DESCRIPTION: [2 sentences max]
FIX: [specific code fix]

Code to analyse:
${userCode}`;
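The flip side of a rigid format is that parsing the response back into an object is trivial. A minimal sketch (split each line on its first colon, so values like `file:line` survive intact):

```javascript
// Parse the SEVERITY/VULNERABILITY/LOCATION/... report format back into
// an object. Only the first colon per line delimits key from value, so
// colons inside values (e.g. "db.js:42") are preserved.
function parseReport(text) {
  const report = {};
  for (const line of text.split('\n')) {
    const idx = line.indexOf(':');
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim();
    const value = line.slice(idx + 1).trim();
    if (key) report[key] = value;
  }
  return report;
}
```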

Function Calling (Tools API)

Function calling lets GPT-4o invoke your code. You define tool schemas; the model decides when and how to call them based on the conversation. The response contains structured arguments you validate and execute.

import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const tools = [
  {
    type: 'function',
    function: {
      name: 'get_stock_price',
      description: 'Get the current stock price for a ticker symbol.',
      parameters: {
        type: 'object',
        properties: {
          ticker: {
            type: 'string',
            description: 'Stock ticker, e.g. AAPL, MSFT',
          },
          currency: {
            type: 'string',
            enum: ['USD', 'EUR', 'GBP'],
            description: 'Return price in this currency',
          },
        },
        required: ['ticker'],
        additionalProperties: false,
      },
    },
  },
];

const messages = [{ role: 'user', content: 'What is Apple stock trading at right now?' }];

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages,
  tools,
  tool_choice: 'auto',  // 'none' | 'auto' | {type:'function',function:{name:...}}
});

const msg = response.choices[0].message;

if (msg.tool_calls) {
  messages.push(msg);  // assistant message with the tool call(s) — push once, before any tool results

  for (const call of msg.tool_calls) {
    const args   = JSON.parse(call.function.arguments);
    const result = await fetchStockPrice(args.ticker, args.currency ?? 'USD');

    messages.push({
      role:         'tool',
      tool_call_id: call.id,
      content:      JSON.stringify(result),
    });
  }

  // Final response after tool execution
  const final = await openai.chat.completions.create({ model: 'gpt-4o', messages, tools });
  console.log(final.choices[0].message.content);
}
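The arguments string is model-generated JSON, so validate it before executing anything. A minimal hand-rolled guard for the `get_stock_price` schema above (in production you'd reach for a real JSON Schema or Zod validator; this helper is illustrative only):

```javascript
// Validate parsed tool-call arguments against the get_stock_price
// schema before execution. Returns normalised args or throws.
function validateStockArgs(args) {
  if (typeof args.ticker !== 'string' || args.ticker.length === 0) {
    throw new Error('ticker is required and must be a non-empty string');
  }
  const currencies = ['USD', 'EUR', 'GBP'];
  const currency = args.currency ?? 'USD';
  if (!currencies.includes(currency)) {
    throw new Error(`currency must be one of ${currencies.join(', ')}`);
  }
  return { ticker: args.ticker, currency };
}
```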

Structured Outputs (2024+)

Structured Outputs guarantee the model's response matches your JSON schema exactly — eliminating parse errors. Pass response_format with type: 'json_schema'.

import { zodResponseFormat } from 'openai/helpers/zod';
import { z } from 'zod';

const BugReport = z.object({
  severity:    z.enum(['critical', 'high', 'medium', 'low']),
  title:       z.string(),
  description: z.string(),
  steps:       z.array(z.string()),
  fix:         z.string().optional(),
});

const response = await openai.beta.chat.completions.parse({
  model: 'gpt-4o-2024-08-06',  // structured outputs requires this version or later
  messages: [
    { role: 'system', content: 'You are a QA engineer. Analyse bug reports.' },
    { role: 'user',   content: 'The login button disappears on mobile after form validation fails.' },
  ],
  response_format: zodResponseFormat(BugReport, 'bug_report'),
});

const bug = response.choices[0].message.parsed;
// TypeScript knows the exact shape — no casting needed
console.log(bug.severity, bug.title, bug.steps);

Structured Outputs vs. JSON mode: JSON mode just guarantees valid JSON. Structured Outputs guarantee your exact schema. Always prefer Structured Outputs for production.

In practice that means 100% schema compliance and zero parse errors in production, at the cost of roughly 15% latency overhead.

Cost Optimisation Patterns

  • Use GPT-4o-mini for simple tasks. It costs roughly 15× less than GPT-4o and handles classification, extraction, summarisation, and simple Q&A nearly as well.
  • Cache deterministic prompts. OpenAI's automatic prompt caching halves the input-token cost of repeated prefixes longer than 1,024 tokens. Keep your system prompt byte-identical across requests and put variable content last.
  • Limit max_tokens. You pay for every output token. If answers should be short, set max_tokens to 200 rather than leaving it unbounded.
  • Batch API for offline workloads. The Batch API is 50% cheaper for non-real-time tasks (document processing, bulk extraction) in exchange for a 24-hour completion window. Use it aggressively.
  • Use logprobs for classification. For binary or small-set classification, request a single output token with logprobs: true, top_logprobs: 5 and classify by the highest-probability token — far cheaper than generating a full completion.
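The Batch API consumes a JSONL file where each line is one request; building that file is plain string work. A sketch (file upload and batch creation omitted; the `custom_id` format and "Summarise:" prompt are illustrative choices):

```javascript
// Build a Batch API input file: one JSON object per line, each with a
// custom_id so results can be matched back to the source document.
function buildBatchJsonl(documents) {
  return documents
    .map((doc, i) =>
      JSON.stringify({
        custom_id: `doc-${i}`,
        method: 'POST',
        url: '/v1/chat/completions',
        body: {
          model: 'gpt-4o-mini',
          messages: [{ role: 'user', content: `Summarise: ${doc}` }],
          max_tokens: 200,
        },
      })
    )
    .join('\n');
}
```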
// Model routing by task complexity
function selectModel(taskComplexity: 'simple' | 'moderate' | 'complex') {
  const models = {
    simple:   'gpt-4o-mini',          // ~$0.15 / 1M input tokens
    moderate: 'gpt-4o',               // ~$2.50 / 1M input tokens
    complex:  'o3-mini',              // reasoning model for hard problems
  };
  return models[taskComplexity];
}
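The logprobs trick from the list above: once the API returns the top_logprobs entries for the single output token, label selection is a pure function. A sketch assuming entries shaped like { token, logprob }, as the Chat Completions response provides:

```javascript
// Given the top_logprobs array for the first output token, pick the
// candidate label with the highest log probability. Tokens that aren't
// valid labels (whitespace variants, partial words) are skipped.
function classifyFromLogprobs(topLogprobs, labels) {
  let best = null;
  for (const { token, logprob } of topLogprobs) {
    const cleaned = token.trim().toLowerCase();
    if (labels.includes(cleaned) && (best === null || logprob > best.logprob)) {
      best = { label: cleaned, logprob };
    }
  }
  return best ? best.label : null;
}
```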

Streaming in Next.js App Router

Streaming turns a multi-second wait into immediate feedback. The route below uses the OpenAIStream and StreamingTextResponse helpers from Vercel's ai package; note that newer AI SDK releases supersede these with streamText, so check which version you're on.

// app/api/chat/route.ts
import OpenAI from 'openai';
import { OpenAIStream, StreamingTextResponse } from 'ai';

const openai = new OpenAI();

export async function POST(req: Request) {
  const { messages } = await req.json();

  const response = await openai.chat.completions.create({
    model:    'gpt-4o',
    messages,
    stream:   true,
    max_tokens: 1024,
  });

  const stream = OpenAIStream(response);
  return new StreamingTextResponse(stream);
}
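On the client, the streamed body can be consumed without any framework helper. A minimal sketch using the standard ReadableStream reader on a fetch response:

```javascript
// Read a streamed text response chunk by chunk, invoking onChunk as
// tokens arrive and resolving with the full accumulated text.
async function readTextStream(response, onChunk = () => {}) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let full = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const chunk = decoder.decode(value, { stream: true });
    full += chunk;
    onChunk(chunk);
  }
  return full;
}
```

Call it with `await readTextStream(await fetch('/api/chat', { method: 'POST', body }))`, appending each chunk to the UI as it arrives.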

Summary

  • CoT prompting + low temperature = dramatically better reasoning
  • Function calling: define schemas, run the tool loop, return results
  • Structured Outputs with Zod: type-safe, schema-guaranteed responses — no parse errors
  • Route to GPT-4o-mini for simple tasks (15× cheaper, same quality on easy problems)
  • Use Batch API for offline workloads — 50% cost reduction
  • Stream all user-facing completions for perceived-latency improvements
