GPT-4o is OpenAI's flagship model — faster, cheaper, and multimodal. But calling `chat.completions.create` with a plain user message leaves a huge amount of capability on the table. Once you understand function calling, structured outputs, and advanced prompting techniques, GPT-4o becomes a programmable reasoning engine rather than a chatbot.
This guide covers zero-shot vs. few-shot prompting, chain-of-thought techniques, the modern function calling (tools) API, Structured Outputs and JSON mode, and hard-won cost-optimisation tricks.
Advanced Prompting Patterns
Chain-of-Thought (CoT)
Simply appending "Think step by step." to your prompt dramatically improves accuracy on reasoning tasks. For even better results, use few-shot CoT: provide 2-3 examples of the reasoning chain before the actual question.
```typescript
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    {
      role: 'system',
      content: `You are a precise analyst. When asked a question, always:
1. Break the problem into steps
2. Show your reasoning for each step
3. State your final answer clearly`,
    },
    {
      role: 'user',
      content: 'Should we migrate our monolith to microservices? We have 8 engineers and 50K MAU.',
    },
  ],
  temperature: 0.3, // lower = more consistent reasoning
});
```
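For the few-shot variant, prepend two or three worked examples that demonstrate the exact reasoning format you want before the real question. A minimal sketch — the Q&A pairs below are illustrative, not from a real transcript:

```typescript
// Few-shot CoT: worked examples teach the model the reasoning format
const fewShotMessages = [
  {
    role: 'system',
    content: 'Reason step by step, then end with "Answer: <result>".',
  },
  // Worked example 1
  { role: 'user', content: 'A train travels 120 km in 2 hours. What is its speed?' },
  {
    role: 'assistant',
    content: 'Step 1: speed = distance / time. Step 2: 120 / 2 = 60. Answer: 60 km/h',
  },
  // Worked example 2
  { role: 'user', content: 'If 3 widgets cost $12, what do 7 widgets cost?' },
  {
    role: 'assistant',
    content: 'Step 1: unit price = 12 / 3 = 4. Step 2: 7 * 4 = 28. Answer: $28',
  },
  // The actual question goes last
  { role: 'user', content: 'A 15% discount brings a price down to $170. What was the original price?' },
];
```

Pass `fewShotMessages` as the `messages` array in the same `chat.completions.create` call shown above.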
Role Prompting
```typescript
// System prompt persona engineering
const systemPrompt = `You are a senior software architect with 15 years of experience
at Google and Netflix. You give direct, opinionated advice based on real production
experience. You cite specific technologies and trade-offs. You never say "it depends"
without explaining what it depends *on* and why.`;
```
Output Format Control (without JSON mode)
```typescript
// Constrain output format via prompt
const prompt = `Analyse this code for security vulnerabilities.
Return your response in EXACTLY this format:

SEVERITY: [Critical|High|Medium|Low]
VULNERABILITY: [name]
LOCATION: [file:line]
DESCRIPTION: [2 sentences max]
FIX: [specific code fix]

Code to analyse:
${userCode}`;
```
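Because the format is fixed, the reply is trivial to parse back into a typed object. A minimal sketch — `parseFinding` is a hypothetical helper, and its keys mirror the prompt above:

```typescript
interface Finding {
  severity: string;
  vulnerability: string;
  location: string;
  description: string;
  fix: string;
}

// Parse the model's "KEY: value" lines back into a typed object.
function parseFinding(text: string): Finding {
  const get = (key: string) =>
    text.match(new RegExp(`^${key}:\\s*(.+)$`, 'm'))?.[1]?.trim() ?? '';
  return {
    severity: get('SEVERITY'),
    vulnerability: get('VULNERABILITY'),
    location: get('LOCATION'),
    description: get('DESCRIPTION'),
    fix: get('FIX'),
  };
}
```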
Function Calling (Tools API)
Function calling lets GPT-4o invoke your code. You define tool schemas; the model decides when and how to call them based on the conversation. The response contains structured arguments you validate and execute.
```typescript
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const tools = [
  {
    type: 'function',
    function: {
      name: 'get_stock_price',
      description: 'Get the current stock price for a ticker symbol.',
      parameters: {
        type: 'object',
        properties: {
          ticker: {
            type: 'string',
            description: 'Stock ticker, e.g. AAPL, MSFT',
          },
          currency: {
            type: 'string',
            enum: ['USD', 'EUR', 'GBP'],
            description: 'Return price in this currency',
          },
        },
        required: ['ticker'],
        additionalProperties: false,
      },
    },
  },
];

const messages = [{ role: 'user', content: 'What is Apple stock trading at right now?' }];

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages,
  tools,
  tool_choice: 'auto', // 'none' | 'auto' | {type:'function',function:{name:...}}
});

const msg = response.choices[0].message;
if (msg.tool_calls) {
  messages.push(msg); // push the assistant message (with its tool calls) exactly once
  for (const call of msg.tool_calls) {
    const args = JSON.parse(call.function.arguments);
    const result = await fetchStockPrice(args.ticker, args.currency ?? 'USD');
    messages.push({
      role: 'tool',
      tool_call_id: call.id,
      content: JSON.stringify(result),
    });
  }
  // Final response after tool execution
  const final = await openai.chat.completions.create({ model: 'gpt-4o', messages, tools });
  console.log(final.choices[0].message.content);
}
```
Structured Outputs (2024+)
Structured Outputs guarantee the model's response matches your JSON schema exactly — eliminating parse errors. Pass response_format with type: 'json_schema'.
```typescript
import { zodResponseFormat } from 'openai/helpers/zod';
import { z } from 'zod';

const BugReport = z.object({
  severity: z.enum(['critical', 'high', 'medium', 'low']),
  title: z.string(),
  description: z.string(),
  steps: z.array(z.string()),
  fix: z.string().optional(),
});

const response = await openai.beta.chat.completions.parse({
  model: 'gpt-4o-2024-08-06', // structured outputs requires this version or later
  messages: [
    { role: 'system', content: 'You are a QA engineer. Analyse bug reports.' },
    { role: 'user', content: 'The login button disappears on mobile after form validation fails.' },
  ],
  response_format: zodResponseFormat(BugReport, 'bug_report'),
});

const bug = response.choices[0].message.parsed;
// TypeScript knows the exact shape — no casting needed
console.log(bug.severity, bug.title, bug.steps);
```
Structured Outputs vs. JSON mode: JSON mode just guarantees valid JSON. Structured Outputs guarantee your exact schema. Always prefer Structured Outputs for production.
- 100% schema compliance
- 0 parse errors in prod
- ~15% latency overhead
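To see what Structured Outputs saves you: with plain JSON mode (`response_format: { type: 'json_object' }`) the model returns valid JSON of *some* shape, so you must validate it yourself. A minimal sketch of that manual step — `parseBug` and its fields are illustrative:

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low';
interface Bug { severity: Severity; title: string }

const SEVERITIES = ['critical', 'high', 'medium', 'low'];

// `raw` would be message.content from a json_object-mode completion.
// JSON mode guarantees JSON.parse succeeds on it; it does NOT guarantee the shape.
function parseBug(raw: string): Bug {
  const data = JSON.parse(raw);
  if (!SEVERITIES.includes(data.severity) || typeof data.title !== 'string') {
    throw new Error('Valid JSON, but not the expected schema');
  }
  return data as Bug;
}
```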
Cost Optimisation Patterns
- Use GPT-4o-mini for simple tasks. It costs 15× less than GPT-4o and handles classification, extraction, summarisation, and simple Q&A just as well.
- Cache deterministic prompts. The Prompt Cache API reduces cost by up to 50% for repeated system prompts + prefixes. Keep your system prompt consistent across requests.
- Limit `max_tokens`. You pay for output tokens. If answers should be short, set `max_tokens` to 200; don't leave it unbounded.
- Batch API for offline workloads. The Batch API is 50% cheaper for non-real-time tasks (document processing, bulk extraction). Use it aggressively.
- Use logprobs for classification. For binary or small-set classification, request `logprobs: true, top_logprobs: 5` and classify by the highest-probability token, which is much cheaper than a full completion.
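The logprobs trick in the last bullet reduces to picking the best label from the first generated token's `top_logprobs`. A sketch under assumed shapes — `pickLabel` is a hypothetical helper; the fields match the Chat Completions logprobs payload:

```typescript
interface TopLogprob { token: string; logprob: number }

// Given the top_logprobs entries for the first generated token, pick the
// best-matching label. Request with: logprobs: true, top_logprobs: 5,
// max_tokens: 1, and a prompt like 'Reply with exactly "spam" or "ham".'
function pickLabel(top: TopLogprob[], labels: string[]): string {
  let best = labels[0];
  let bestLp = -Infinity;
  for (const { token, logprob } of top) {
    const label = labels.find((l) => l.startsWith(token.trim().toLowerCase()));
    if (label && logprob > bestLp) {
      bestLp = logprob;
      best = label;
    }
  }
  return best;
}
```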
```typescript
// Model routing by task complexity
function selectModel(taskComplexity: 'simple' | 'moderate' | 'complex') {
  const models = {
    simple: 'gpt-4o-mini', // $0.15/1M tokens
    moderate: 'gpt-4o', // $2.50/1M tokens
    complex: 'o3-mini', // deep reasoning
  };
  return models[taskComplexity];
}
```
Streaming in Next.js App Router
```typescript
// app/api/chat/route.ts
import OpenAI from 'openai';
// OpenAIStream/StreamingTextResponse come from the Vercel AI SDK ('ai' v3;
// removed in v4, which uses streamText instead)
import { OpenAIStream, StreamingTextResponse } from 'ai';

const openai = new OpenAI();

export async function POST(req: Request) {
  const { messages } = await req.json();
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages,
    stream: true,
    max_tokens: 1024,
  });
  const stream = OpenAIStream(response);
  return new StreamingTextResponse(stream);
}
```
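On the client, the streamed response body can be consumed with the standard Streams API. A minimal sketch — `readTextStream` is a hypothetical helper, not part of the `ai` package:

```typescript
// Read a streamed text response chunk by chunk, invoking onChunk as
// tokens arrive and returning the full text once the stream closes.
async function readTextStream(
  stream: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void,
): Promise<string> {
  const decoder = new TextDecoder();
  const reader = stream.getReader();
  let full = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = decoder.decode(value, { stream: true });
    full += text;
    onChunk(text); // e.g. append to the UI as tokens arrive
  }
  return full;
}
```

Usage: after `const res = await fetch('/api/chat', ...)`, call `await readTextStream(res.body!, appendToUI)`.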
Summary
- CoT prompting + low temperature = dramatically better reasoning
- Function calling: define schemas, run the tool loop, return results
- Structured Outputs with Zod: type-safe, schema-guaranteed responses — no parse errors
- Route to GPT-4o-mini for simple tasks (15× cheaper, same quality on easy problems)
- Use Batch API for offline workloads — 50% cost reduction
- Stream all user-facing completions for perceived-latency improvements