Responses API

POST /v1/responses

Create a model response. Compatible with the OpenAI Responses API format. Supports streaming via Server-Sent Events (SSE).

Codex Compatible — This endpoint implements the subset of the OpenAI Responses API required by Codex CLI and the OpenAI SDK's client.responses.create() method.

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Required | Model ID (e.g., openai/gpt-5.3-codex, openai/gpt-4.1) |
| input | string \| array | Required | A text string or an array of input items. See Input Items below. |
| instructions | string | Optional | System instructions for the model (equivalent to a system message). |
| stream | boolean | Optional | If true, returns a stream of Server-Sent Events. |
| max_output_tokens | integer | Optional | Maximum number of output tokens to generate. |
| temperature | number | Optional | Sampling temperature (0–2). |
| top_p | number | Optional | Nucleus sampling parameter. |
| tools | array | Optional | Array of tool definitions. Supports the function type. |
| tool_choice | string \| object | Optional | Controls tool use: "auto", "none", "required", or a specific function object. |
| parallel_tool_calls | boolean | Optional | Allow multiple tool calls in a single response turn. |
| text | object | Optional | Output text format configuration. Supports json_schema for structured output. |
| reasoning | object | Optional | Reasoning configuration: {"effort": "low" \| "medium" \| "high", "summary": "auto" \| "concise" \| "detailed"} |
| include | array | Optional | Additional data to include, e.g. ["reasoning.encrypted_content"]. |
| metadata | object | Optional | Key-value metadata to attach to the response. |

Input Items

The input field accepts either a plain string (shorthand for a single user message) or an array of typed items:

| Type | Description |
|---|---|
| message | A conversation message with role (user, assistant, system, developer) and content (string or array of content parts). |
| function_call | A previous function call made by the model. Fields: call_id, name, arguments. |
| function_call_output | The result of a function call. Fields: call_id, output. |
| reasoning | A reasoning trace from a previous turn (for multi-turn reasoning continuity). |
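
As a sketch, a multi-turn conversation expressed as an input array of message items (the message contents here are illustrative):

```json
{
  "model": "openai/gpt-5.3-codex",
  "input": [
    {"type": "message", "role": "system", "content": "You are a terse assistant."},
    {"type": "message", "role": "user", "content": "What is SSE?"},
    {"type": "message", "role": "assistant", "content": "Server-Sent Events: a one-way HTTP streaming format."},
    {"type": "message", "role": "user", "content": "How does it differ from WebSockets?"}
  ]
}
```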

Examples

Non-Streaming Request

cURL:

```bash
curl https://api.pomex.ai/v1/responses \
  -H "Authorization: Bearer $POMEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.3-codex",
    "input": "Explain the concept of prompt caching in 2 sentences."
  }'
```

Python (requests):

```python
import requests

response = requests.post(
    "https://api.pomex.ai/v1/responses",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "openai/gpt-5.3-codex",
        "input": "Explain the concept of prompt caching in 2 sentences.",
    },
)
print(response.json())
```

Python (OpenAI SDK):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.pomex.ai/v1",
    api_key="YOUR_API_KEY",
)

response = client.responses.create(
    model="openai/gpt-5.3-codex",
    input="Explain the concept of prompt caching in 2 sentences.",
)
print(response.output_text)
```

Non-Streaming Response

```json
{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1714900000,
  "model": "openai/gpt-5.3-codex",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "msg_001",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Prompt caching stores previously computed key-value attention states so that repeated prefixes don't need to be reprocessed on every request. This dramatically reduces latency and cost for applications that send the same system prompt or context across multiple calls."
        }
      ],
      "status": "completed"
    }
  ],
  "output_text": "Prompt caching stores previously computed key-value attention states so that repeated prefixes don't need to be reprocessed on every request. This dramatically reduces latency and cost for applications that send the same system prompt or context across multiple calls.",
  "usage": {
    "input_tokens": 18,
    "output_tokens": 52,
    "total_tokens": 70,
    "input_tokens_details": { "cached_tokens": 0 },
    "output_tokens_details": { "reasoning_tokens": 0 }
  }
}
```

Streaming

Set "stream": true to receive the response as Server-Sent Events. Each event consists of a typed `event:` line and a `data:` line whose JSON payload contains a "type" field matching the event name.

Streaming Request

cURL:

```bash
curl https://api.pomex.ai/v1/responses \
  -H "Authorization: Bearer $POMEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.3-codex",
    "stream": true,
    "input": "Write a haiku about APIs."
  }'
```

Python (OpenAI SDK):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.pomex.ai/v1",
    api_key="YOUR_API_KEY",
)

stream = client.responses.create(
    model="openai/gpt-5.3-codex",
    input="Write a haiku about APIs.",
    stream=True,
)
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```

SSE Event Types

| Event | Description |
|---|---|
| response.created | Stream started. Contains the initial response object with status: "in_progress". |
| response.output_item.added | A new output item (message, function_call, reasoning) was added. |
| response.content_part.added | A new content part was added within a message output item. |
| response.output_text.delta | Incremental text chunk. Contains a "delta" field with the text fragment. |
| response.output_text.done | Text generation for this content part is complete. |
| response.content_part.done | Content part is finalized. |
| response.output_item.done | Output item is finalized. |
| response.completed | Stream finished successfully. Contains the final response object. |
| response.incomplete | Stream ended due to max tokens or content filter. |
| response.failed | Stream ended due to an error. |
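
As a sketch of client-side handling, the snippet below folds these event payloads into final text, assuming each payload has already been parsed into a dict. The accumulate_text helper and sample events are illustrative, not part of any SDK:

```python
def accumulate_text(events):
    """Fold a sequence of parsed SSE payloads (dicts with a "type" key)
    into final output text, keyed by (output_index, content_index)."""
    parts = {}
    for event in events:
        if event["type"] == "response.output_text.delta":
            key = (event["output_index"], event["content_index"])
            parts[key] = parts.get(key, "") + event["delta"]
        elif event["type"] == "response.output_text.done":
            # The terminal event carries the full text; treat it as authoritative.
            key = (event["output_index"], event["content_index"])
            parts[key] = event["text"]
    return "".join(parts[k] for k in sorted(parts))

# Replaying the two delta events from the example stream below:
events = [
    {"type": "response.output_text.delta", "output_index": 0, "content_index": 0, "delta": "Data"},
    {"type": "response.output_text.delta", "output_index": 0, "content_index": 0, "delta": " flows"},
]
print(accumulate_text(events))  # Data flows
```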

Example SSE Stream

```text
event: response.created
data: {"type":"response.created","response":{"id":"resp_abc123","object":"response","status":"in_progress","output":[],"model":"openai/gpt-5.3-codex"}}

event: response.output_item.added
data: {"type":"response.output_item.added","output_index":0,"item":{"type":"message","id":"msg_001","role":"assistant","content":[],"status":"in_progress"}}

event: response.content_part.added
data: {"type":"response.content_part.added","output_index":0,"content_index":0,"part":{"type":"output_text","text":""}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":"Data"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":" flows"}

event: response.output_text.done
data: {"type":"response.output_text.done","output_index":0,"content_index":0,"text":"Data flows through the wire\nEndpoints hum in harmony\nJSON speaks the truth"}

event: response.content_part.done
data: {"type":"response.content_part.done","output_index":0,"content_index":0,"part":{"type":"output_text","text":"Data flows through the wire\nEndpoints hum in harmony\nJSON speaks the truth"}}

event: response.output_item.done
data: {"type":"response.output_item.done","output_index":0,"item":{"type":"message","id":"msg_001","role":"assistant","content":[{"type":"output_text","text":"Data flows through the wire\nEndpoints hum in harmony\nJSON speaks the truth"}],"status":"completed"}}

event: response.completed
data: {"type":"response.completed","response":{"id":"resp_abc123","object":"response","status":"completed","output":[...],"usage":{"input_tokens":12,"output_tokens":20,"total_tokens":32}}}
```
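
If you are not using an SDK, a stream like the one above can be parsed with a few lines of standard-library Python. This is a simplified sketch that assumes one `event:` line and one `data:` line per event, separated by blank lines; the full SSE format also allows comments, ids, retry fields, and multi-line data:

```python
import json

def parse_sse(raw: str):
    """Parse a simplified SSE stream into (event_name, payload) pairs."""
    events = []
    name, data_lines = None, []
    for line in raw.splitlines() + [""]:  # trailing "" flushes the last event
        if line.startswith("event:"):
            name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and name is not None:
            events.append((name, json.loads("\n".join(data_lines))))
            name, data_lines = None, []
    return events

raw = (
    "event: response.output_text.delta\n"
    'data: {"type":"response.output_text.delta","delta":"Data"}\n'
    "\n"
    "event: response.completed\n"
    'data: {"type":"response.completed"}\n'
)
for name, payload in parse_sse(raw):
    print(name, payload["type"])
```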

Tool Calling

Define functions in the tools array. The model may respond with function_call output items. Provide results via function_call_output input items in the next request.

```json
{
  "model": "openai/gpt-5.3-codex",
  "input": [
    {"type": "message", "role": "user", "content": "What's the weather in Tokyo?"}
  ],
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get the current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {"type": "string", "description": "City name"}
        },
        "required": ["location"]
      }
    }
  ]
}
```
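
If the model elects to call the tool, the response's output array carries a function_call item rather than (or alongside) a message. A sketch of such a response, with illustrative id values:

```json
{
  "id": "resp_def456",
  "object": "response",
  "status": "completed",
  "output": [
    {
      "type": "function_call",
      "id": "fc_001",
      "call_id": "call_xyz",
      "name": "get_weather",
      "arguments": "{\"location\":\"Tokyo\"}",
      "status": "completed"
    }
  ]
}
```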

Multi-Turn with Tool Results

```json
{
  "model": "openai/gpt-5.3-codex",
  "input": [
    {"type": "message", "role": "user", "content": "What's the weather in Tokyo?"},
    {"type": "function_call", "call_id": "call_xyz", "name": "get_weather", "arguments": "{\"location\":\"Tokyo\"}"},
    {"type": "function_call_output", "call_id": "call_xyz", "output": "{\"temp\":22,\"condition\":\"sunny\"}"}
  ],
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get the current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {"type": "string"}
        },
        "required": ["location"]
      }
    }
  ]
}
```
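
The glue between the two requests can be sketched as follows, assuming output items are plain dicts and tools are implemented locally; run_tool_calls and get_weather are illustrative helpers, not SDK functions:

```python
import json

def run_tool_calls(output_items, registry):
    """For each function_call item in a response's output, invoke the matching
    local implementation and build the function_call_output items to append
    to the next request's input. `registry` maps tool name -> callable."""
    results = []
    for item in output_items:
        if item.get("type") != "function_call":
            continue
        args = json.loads(item["arguments"])
        result = registry[item["name"]](**args)
        results.append({
            "type": "function_call_output",
            "call_id": item["call_id"],
            "output": json.dumps(result),
        })
    return results

# Hypothetical local implementation of the get_weather tool above:
def get_weather(location):
    return {"temp": 22, "condition": "sunny"}

call = {"type": "function_call", "call_id": "call_xyz",
        "name": "get_weather", "arguments": "{\"location\":\"Tokyo\"}"}
print(run_tool_calls([call], {"get_weather": get_weather}))
```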

Structured Output

Use the text.format field to request JSON output conforming to a schema:

```json
{
  "model": "openai/gpt-5.3-codex",
  "input": "List 3 programming languages with their year of creation.",
  "text": {
    "format": {
      "type": "json_schema",
      "name": "languages",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "languages": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "name": {"type": "string"},
                "year": {"type": "integer"}
              },
              "required": ["name", "year"],
              "additionalProperties": false
            }
          }
        },
        "required": ["languages"],
        "additionalProperties": false
      }
    }
  }
}
```
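
With "strict": true, the response's output_text is constrained to the schema and can be loaded directly. A sketch of the client side, with a sample string standing in for a real model reply (the network call is omitted):

```python
import json

# Sample reply standing in for response.output_text from a real request:
output_text = (
    '{"languages": [{"name": "Fortran", "year": 1957}, '
    '{"name": "Lisp", "year": 1958}, {"name": "C", "year": 1972}]}'
)

# Strict schema output parses without a fallback path.
data = json.loads(output_text)
for lang in data["languages"]:
    print(f'{lang["name"]}: {lang["year"]}')
```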

Reasoning

Control the model's reasoning behavior with the reasoning parameter:

```json
{
  "model": "openai/gpt-5.3-codex",
  "input": "Prove that the square root of 2 is irrational.",
  "reasoning": {
    "effort": "high",
    "summary": "auto"
  },
  "include": ["reasoning.encrypted_content"]
}
```

When include contains "reasoning.encrypted_content", reasoning output items will include an encrypted_content field that can be passed back in subsequent requests for continuity.
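
A follow-up request can then pass the reasoning item back as an input item. A sketch with placeholder values (the id and encrypted_content values are illustrative, and the exact item shape may vary):

```json
{
  "model": "openai/gpt-5.3-codex",
  "input": [
    {"type": "message", "role": "user", "content": "Prove that the square root of 2 is irrational."},
    {"type": "reasoning", "id": "rs_001", "encrypted_content": "gAAAA...", "summary": []},
    {"type": "message", "role": "user", "content": "Now do the same for the square root of 3."}
  ],
  "include": ["reasoning.encrypted_content"]
}
```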


Supported Models

Currently, only OpenAI GPT models are supported on the Responses API:

| Provider | Models |
|---|---|
| OpenAI | openai/gpt-5.3-codex, openai/gpt-4.1, openai/gpt-4.1-mini, openai/o3, openai/o4-mini |

Note: Claude and Gemini models are not supported on /v1/responses. Use the appropriate native API endpoints instead. Image-only models are also not supported on this endpoint.

Limitations