Responses API

POST /v1/responses

Create a model response. Compatible with the OpenAI Responses API format. Supports streaming via Server-Sent Events (SSE).

Codex Compatible — This endpoint implements the subset of the OpenAI Responses API required by Codex CLI and the OpenAI SDK's client.responses.create() method.

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Required | Model ID (e.g., openai/gpt-5.3-codex, openai/gpt-4.1) |
| input | string \| array | Required | A text string or an array of input items. See Input Items below. |
| instructions | string | Optional | System instructions for the model (equivalent to a system message). |
| stream | boolean | Optional | If true, returns a stream of Server-Sent Events. |
| max_output_tokens | integer | Optional | Maximum number of output tokens to generate. |
| temperature | number | Optional | Sampling temperature (0–2). |
| top_p | number | Optional | Nucleus sampling parameter. |
| tools | array | Optional | Array of tool definitions. Supports the function type. |
| tool_choice | string \| object | Optional | Controls tool use: "auto", "none", "required", or a specific function object. |
| parallel_tool_calls | boolean | Optional | Allow multiple tool calls in a single response turn. |
| text | object | Optional | Output text format configuration. Supports json_schema for structured output. |
| reasoning | object | Optional | Reasoning configuration: {"effort": "low" \| "medium" \| "high", "summary": "auto" \| "concise" \| "detailed"} |
| include | array | Optional | Additional data to include, e.g. ["reasoning.encrypted_content"]. |
| metadata | object | Optional | Key-value metadata to attach to the response. |

Input Items

The input field accepts either a plain string (shorthand for a single user message) or an array of typed items:

| Type | Description |
|---|---|
| message | A conversation message with role (user, assistant, system, developer) and content (string or array of content parts). |
| function_call | A previous function call made by the model. Fields: call_id, name, arguments. |
| function_call_output | The result of a function call. Fields: call_id, output. |
| reasoning | A reasoning trace from a previous turn (for multi-turn reasoning continuity). |
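
As a sketch, a multi-turn conversation expressed as an input array of message items (the message contents here are illustrative):

```json
{
  "model": "openai/gpt-5.3-codex",
  "input": [
    {"type": "message", "role": "system", "content": "You are a terse assistant."},
    {"type": "message", "role": "user", "content": "What is SSE?"},
    {"type": "message", "role": "assistant", "content": "Server-Sent Events: a one-way HTTP streaming format."},
    {"type": "message", "role": "user", "content": "How does it differ from WebSockets?"}
  ]
}
```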

Examples

Non-Streaming Request

cURL:

```bash
curl https://api.pomex.ai/v1/responses \
  -H "Authorization: Bearer $POMEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.3-codex",
    "input": "Explain the concept of prompt caching in 2 sentences."
  }'
```

Python (requests):

```python
import requests

response = requests.post(
    "https://api.pomex.ai/v1/responses",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "openai/gpt-5.3-codex",
        "input": "Explain the concept of prompt caching in 2 sentences.",
    },
)
print(response.json())
```

Python (OpenAI SDK):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.pomex.ai/v1",
    api_key="YOUR_API_KEY",
)

response = client.responses.create(
    model="openai/gpt-5.3-codex",
    input="Explain the concept of prompt caching in 2 sentences.",
)
print(response.output_text)
```

Non-Streaming Response

```json
{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1714900000,
  "model": "openai/gpt-5.3-codex",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "msg_001",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Prompt caching stores previously computed key-value attention states so that repeated prefixes don't need to be reprocessed on every request. This dramatically reduces latency and cost for applications that send the same system prompt or context across multiple calls."
        }
      ],
      "status": "completed"
    }
  ],
  "output_text": "Prompt caching stores previously computed key-value attention states so that repeated prefixes don't need to be reprocessed on every request. This dramatically reduces latency and cost for applications that send the same system prompt or context across multiple calls.",
  "usage": {
    "input_tokens": 18,
    "output_tokens": 52,
    "total_tokens": 70,
    "input_tokens_details": { "cached_tokens": 0 },
    "output_tokens_details": { "reasoning_tokens": 0 }
  }
}
```

Streaming

Set "stream": true to receive the response as Server-Sent Events. Each event consists of a typed `event:` line and a `data:` line whose JSON payload contains a "type" field matching the event name.

Streaming Request

cURL:

```bash
curl https://api.pomex.ai/v1/responses \
  -H "Authorization: Bearer $POMEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.3-codex",
    "stream": true,
    "input": "Write a haiku about APIs."
  }'
```

Python (OpenAI SDK):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.pomex.ai/v1",
    api_key="YOUR_API_KEY",
)

stream = client.responses.create(
    model="openai/gpt-5.3-codex",
    input="Write a haiku about APIs.",
    stream=True,
)
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```

SSE Event Types

| Event | Description |
|---|---|
| response.created | Stream started. Contains the initial response object with status: "in_progress". |
| response.output_item.added | A new output item (message, function_call, reasoning) was added. |
| response.content_part.added | A new content part was added within a message output item. |
| response.output_text.delta | Incremental text chunk. Contains a "delta" field with the text fragment. |
| response.output_text.done | Text generation for this content part is complete. |
| response.content_part.done | Content part is finalized. |
| response.output_item.done | Output item is finalized. |
| response.completed | Stream finished successfully. Contains the final response object. |
| response.incomplete | Stream ended due to max tokens or content filter. |
| response.failed | Stream ended due to an error. |
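
As a sketch of client-side handling, the snippet below folds these event payloads into final text, assuming each payload has already been parsed into a dict. The accumulate_text helper and sample events are illustrative, not part of any SDK:

```python
def accumulate_text(events):
    """Fold a sequence of parsed SSE payloads (dicts with a "type" key)
    into final output text, keyed by (output_index, content_index)."""
    parts = {}
    for event in events:
        if event["type"] == "response.output_text.delta":
            key = (event["output_index"], event["content_index"])
            parts[key] = parts.get(key, "") + event["delta"]
        elif event["type"] == "response.output_text.done":
            # The terminal event carries the full text; treat it as authoritative.
            key = (event["output_index"], event["content_index"])
            parts[key] = event["text"]
    return "".join(parts[k] for k in sorted(parts))

# Replaying the two delta events from the example stream below:
events = [
    {"type": "response.output_text.delta", "output_index": 0, "content_index": 0, "delta": "Data"},
    {"type": "response.output_text.delta", "output_index": 0, "content_index": 0, "delta": " flows"},
]
print(accumulate_text(events))  # Data flows
```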

Example SSE Stream

```text
event: response.created
data: {"type":"response.created","response":{"id":"resp_abc123","object":"response","status":"in_progress","output":[],"model":"openai/gpt-5.3-codex"}}

event: response.output_item.added
data: {"type":"response.output_item.added","output_index":0,"item":{"type":"message","id":"msg_001","role":"assistant","content":[],"status":"in_progress"}}

event: response.content_part.added
data: {"type":"response.content_part.added","output_index":0,"content_index":0,"part":{"type":"output_text","text":""}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":"Data"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":" flows"}

event: response.output_text.done
data: {"type":"response.output_text.done","output_index":0,"content_index":0,"text":"Data flows through the wire\nEndpoints hum in harmony\nJSON speaks the truth"}

event: response.content_part.done
data: {"type":"response.content_part.done","output_index":0,"content_index":0,"part":{"type":"output_text","text":"Data flows through the wire\nEndpoints hum in harmony\nJSON speaks the truth"}}

event: response.output_item.done
data: {"type":"response.output_item.done","output_index":0,"item":{"type":"message","id":"msg_001","role":"assistant","content":[{"type":"output_text","text":"Data flows through the wire\nEndpoints hum in harmony\nJSON speaks the truth"}],"status":"completed"}}

event: response.completed
data: {"type":"response.completed","response":{"id":"resp_abc123","object":"response","status":"completed","output":[...],"usage":{"input_tokens":12,"output_tokens":20,"total_tokens":32}}}
```
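
If you are not using an SDK, a stream like the one above can be parsed with a few lines of standard-library Python. This is a simplified sketch that assumes one `event:` line and one `data:` line per event, separated by blank lines; the full SSE format also allows comments, ids, retry fields, and multi-line data:

```python
import json

def parse_sse(raw: str):
    """Parse a simplified SSE stream into (event_name, payload) pairs."""
    events = []
    name, data_lines = None, []
    for line in raw.splitlines() + [""]:  # trailing "" flushes the last event
        if line.startswith("event:"):
            name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and name is not None:
            events.append((name, json.loads("\n".join(data_lines))))
            name, data_lines = None, []
    return events

raw = (
    "event: response.output_text.delta\n"
    'data: {"type":"response.output_text.delta","delta":"Data"}\n'
    "\n"
    "event: response.completed\n"
    'data: {"type":"response.completed"}\n'
)
for name, payload in parse_sse(raw):
    print(name, payload["type"])
```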

Tool Calling

Define functions in the tools array. The model may respond with function_call output items. Provide results via function_call_output input items in the next request.

```json
{
  "model": "openai/gpt-5.3-codex",
  "input": [
    {"type": "message", "role": "user", "content": "What's the weather in Tokyo?"}
  ],
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get the current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {"type": "string", "description": "City name"}
        },
        "required": ["location"]
      }
    }
  ]
}
```
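
If the model elects to call the tool, the response's output array carries a function_call item rather than (or alongside) a message. A sketch of such a response, with illustrative id values:

```json
{
  "id": "resp_def456",
  "object": "response",
  "status": "completed",
  "output": [
    {
      "type": "function_call",
      "id": "fc_001",
      "call_id": "call_xyz",
      "name": "get_weather",
      "arguments": "{\"location\":\"Tokyo\"}",
      "status": "completed"
    }
  ]
}
```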

Multi-Turn with Tool Results

```json
{
  "model": "openai/gpt-5.3-codex",
  "input": [
    {"type": "message", "role": "user", "content": "What's the weather in Tokyo?"},
    {"type": "function_call", "call_id": "call_xyz", "name": "get_weather", "arguments": "{\"location\":\"Tokyo\"}"},
    {"type": "function_call_output", "call_id": "call_xyz", "output": "{\"temp\":22,\"condition\":\"sunny\"}"}
  ],
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get the current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {"type": "string"}
        },
        "required": ["location"]
      }
    }
  ]
}
```
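
The glue between the two requests can be sketched as follows, assuming output items are plain dicts and tools are implemented locally; run_tool_calls and get_weather are illustrative helpers, not SDK functions:

```python
import json

def run_tool_calls(output_items, registry):
    """For each function_call item in a response's output, invoke the matching
    local implementation and build the function_call_output items to append
    to the next request's input. `registry` maps tool name -> callable."""
    results = []
    for item in output_items:
        if item.get("type") != "function_call":
            continue
        args = json.loads(item["arguments"])
        result = registry[item["name"]](**args)
        results.append({
            "type": "function_call_output",
            "call_id": item["call_id"],
            "output": json.dumps(result),
        })
    return results

# Hypothetical local implementation of the get_weather tool above:
def get_weather(location):
    return {"temp": 22, "condition": "sunny"}

call = {"type": "function_call", "call_id": "call_xyz",
        "name": "get_weather", "arguments": "{\"location\":\"Tokyo\"}"}
print(run_tool_calls([call], {"get_weather": get_weather}))
```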

Structured Output

Use the text.format field to request JSON output conforming to a schema:

```json
{
  "model": "openai/gpt-5.3-codex",
  "input": "List 3 programming languages with their year of creation.",
  "text": {
    "format": {
      "type": "json_schema",
      "name": "languages",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "languages": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "name": {"type": "string"},
                "year": {"type": "integer"}
              },
              "required": ["name", "year"],
              "additionalProperties": false
            }
          }
        },
        "required": ["languages"],
        "additionalProperties": false
      }
    }
  }
}
```
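
With "strict": true, the response's output_text is constrained to the schema and can be loaded directly. A sketch of the client side, with a sample string standing in for a real model reply (the network call is omitted):

```python
import json

# Sample reply standing in for response.output_text from a real request:
output_text = (
    '{"languages": [{"name": "Fortran", "year": 1957}, '
    '{"name": "Lisp", "year": 1958}, {"name": "C", "year": 1972}]}'
)

# Strict schema output parses without a fallback path.
data = json.loads(output_text)
for lang in data["languages"]:
    print(f'{lang["name"]}: {lang["year"]}')
```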

Reasoning

Control the model's reasoning behavior with the reasoning parameter:

```json
{
  "model": "openai/gpt-5.3-codex",
  "input": "Prove that the square root of 2 is irrational.",
  "reasoning": {
    "effort": "high",
    "summary": "auto"
  },
  "include": ["reasoning.encrypted_content"]
}
```

When include contains "reasoning.encrypted_content", reasoning output items will include an encrypted_content field that can be passed back in subsequent requests for continuity.
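
A follow-up request can then pass the reasoning item back as an input item. A sketch with placeholder values (the id and encrypted_content values are illustrative, and the exact item shape may vary):

```json
{
  "model": "openai/gpt-5.3-codex",
  "input": [
    {"type": "message", "role": "user", "content": "Prove that the square root of 2 is irrational."},
    {"type": "reasoning", "id": "rs_001", "encrypted_content": "gAAAA...", "summary": []},
    {"type": "message", "role": "user", "content": "Now do the same for the square root of 3."}
  ],
  "include": ["reasoning.encrypted_content"]
}
```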


Supported Models

Currently, only OpenAI GPT models are supported on the Responses API:

| Provider | Models |
|---|---|
| OpenAI | openai/gpt-5.3-codex, openai/gpt-4.1, openai/gpt-4.1-mini, openai/o3, openai/o4-mini |

Note: Claude and Gemini models are not supported on /v1/responses. Use the appropriate native API endpoints instead. Image-only models are also not supported on this endpoint.

Limitations