Responses API
Create a model response. Compatible with the OpenAI Responses API format and callable through the OpenAI SDK's `client.responses.create()` method. Supports streaming via Server-Sent Events (SSE).
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Required | Model ID (e.g., openai/gpt-5.3-codex, openai/gpt-4.1) |
| input | string \| array | Required | A text string or an array of input items. See Input Items below. |
| instructions | string | Optional | System instructions for the model (equivalent to a system message). |
| stream | boolean | Optional | If true, returns a stream of Server-Sent Events. |
| max_output_tokens | integer | Optional | Maximum number of output tokens to generate. |
| temperature | number | Optional | Sampling temperature (0–2). |
| top_p | number | Optional | Nucleus sampling parameter. |
| tools | array | Optional | Array of tool definitions. Supports function type. |
| tool_choice | string \| object | Optional | Controls tool use: "auto", "none", "required", or a specific function object. |
| parallel_tool_calls | boolean | Optional | Allow multiple tool calls in a single response turn. |
| text | object | Optional | Output text format configuration. Supports json_schema for structured output. |
| reasoning | object | Optional | Reasoning configuration: {"effort": "low" \| "medium" \| "high", "summary": "auto" \| "concise" \| "detailed"} |
| include | array | Optional | Additional data to include, e.g. ["reasoning.encrypted_content"]. |
| metadata | object | Optional | Key-value metadata to attach to the response. |
Input Items
The input field accepts either a plain string (shorthand for a single user message) or an array of typed items:
| Type | Description |
|---|---|
| message | A conversation message with role (user, assistant, system, developer) and content (string or array of content parts). |
| function_call | A previous function call made by the model. Fields: call_id, name, arguments. |
| function_call_output | The result of a function call. Fields: call_id, output. |
| reasoning | A reasoning trace from a previous turn (for multi-turn reasoning continuity). |
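The string shorthand and the typed-item form are equivalent; for example, `"input": "Hello"` is shorthand for:

```json
{
  "model": "openai/gpt-5.3-codex",
  "input": [
    {"type": "message", "role": "user", "content": "Hello"}
  ]
}
```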
Examples
Non-Streaming Request
curl https://api.pomex.ai/v1/responses \
-H "Authorization: Bearer $POMEX_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-5.3-codex",
"input": "Explain the concept of prompt caching in 2 sentences."
}'import requests
response = requests.post(
"https://api.pomex.ai/v1/responses",
headers={
"Authorization": "Bearer YOUR_API_KEY",
"Content-Type": "application/json",
},
json={
"model": "openai/gpt-5.3-codex",
"input": "Explain the concept of prompt caching in 2 sentences.",
},
)
print(response.json())from openai import OpenAI
client = OpenAI(
base_url="https://api.pomex.ai/v1",
api_key="YOUR_API_KEY",
)
response = client.responses.create(
model="openai/gpt-5.3-codex",
input="Explain the concept of prompt caching in 2 sentences.",
)
print(response.output_text)Non-Streaming Response
```json
{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1714900000,
  "model": "openai/gpt-5.3-codex",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "msg_001",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Prompt caching stores previously computed key-value attention states so that repeated prefixes don't need to be reprocessed on every request. This dramatically reduces latency and cost for applications that send the same system prompt or context across multiple calls."
        }
      ],
      "status": "completed"
    }
  ],
  "output_text": "Prompt caching stores previously computed key-value attention states so that repeated prefixes don't need to be reprocessed on every request. This dramatically reduces latency and cost for applications that send the same system prompt or context across multiple calls.",
  "usage": {
    "input_tokens": 18,
    "output_tokens": 52,
    "total_tokens": 70,
    "input_tokens_details": { "cached_tokens": 0 },
    "output_tokens_details": { "reasoning_tokens": 0 }
  }
}
```

Streaming
Set "stream": true to receive responses as Server-Sent Events. Each event has a typed event: line and a JSON data: payload containing a "type" field matching the event name.
Streaming Request

```bash
curl https://api.pomex.ai/v1/responses \
  -H "Authorization: Bearer $POMEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.3-codex",
    "stream": true,
    "input": "Write a haiku about APIs."
  }'
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.pomex.ai/v1",
    api_key="YOUR_API_KEY",
)
stream = client.responses.create(
    model="openai/gpt-5.3-codex",
    input="Write a haiku about APIs.",
    stream=True,
)
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```

SSE Event Types
| Event | Description |
|---|---|
| response.created | Stream started. Contains the initial response object with status: "in_progress". |
| response.output_item.added | A new output item (message, function_call, reasoning) was added. |
| response.content_part.added | A new content part was added within a message output item. |
| response.output_text.delta | Incremental text chunk. Contains "delta" field with the text fragment. |
| response.output_text.done | Text generation for this content part is complete. |
| response.content_part.done | Content part is finalized. |
| response.output_item.done | Output item is finalized. |
| response.completed | Stream finished successfully. Contains the final response object. |
| response.incomplete | Stream ended due to max tokens or content filter. |
| response.failed | Stream ended due to an error. |
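When consuming the stream without an SDK, the `event:` and `data:` lines can be parsed by hand. A minimal sketch, assuming an iterable of decoded text lines (for example from `requests`' `Response.iter_lines(decode_unicode=True)`); the `iter_sse` helper name is ours, not part of the API:

```python
import json

def iter_sse(lines):
    """Parse SSE lines into (event_name, data_dict) pairs."""
    event = None
    for line in lines:
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            yield event, json.loads(line[len("data:"):].strip())
        # Blank lines separate events; nothing to do for them.

# Collect text deltas from a captured stream fragment:
raw = """event: response.output_text.delta
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":"Data"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":" flows"}
"""
text = "".join(
    data["delta"]
    for name, data in iter_sse(raw.splitlines())
    if name == "response.output_text.delta"
)
print(text)  # Data flows
```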
Example SSE Stream
event: response.created
data: {"type":"response.created","response":{"id":"resp_abc123","object":"response","status":"in_progress","output":[],"model":"openai/gpt-5.3-codex"}}
event: response.output_item.added
data: {"type":"response.output_item.added","output_index":0,"item":{"type":"message","id":"msg_001","role":"assistant","content":[],"status":"in_progress"}}
event: response.content_part.added
data: {"type":"response.content_part.added","output_index":0,"content_index":0,"part":{"type":"output_text","text":""}}
event: response.output_text.delta
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":"Data"}
event: response.output_text.delta
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":" flows"}
event: response.output_text.done
data: {"type":"response.output_text.done","output_index":0,"content_index":0,"text":"Data flows through the wire\nEndpoints hum in harmony\nJSON speaks the truth"}
event: response.content_part.done
data: {"type":"response.content_part.done","output_index":0,"content_index":0,"part":{"type":"output_text","text":"Data flows through the wire\nEndpoints hum in harmony\nJSON speaks the truth"}}
event: response.output_item.done
data: {"type":"response.output_item.done","output_index":0,"item":{"type":"message","id":"msg_001","role":"assistant","content":[{"type":"output_text","text":"Data flows through the wire\nEndpoints hum in harmony\nJSON speaks the truth"}],"status":"completed"}}
event: response.completed
data: {"type":"response.completed","response":{"id":"resp_abc123","object":"response","status":"completed","output":[...],"usage":{"input_tokens":12,"output_tokens":20,"total_tokens":32}}}Tool Calling
Define functions in the `tools` array. The model may respond with `function_call` output items. Provide results via `function_call_output` input items in the next request.
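In application code, this becomes a loop: send a request, execute any `function_call` items locally, append the matching `function_call_output` items to `input`, and send again. A minimal dispatch sketch; the `run_tool_calls` helper and the local `get_weather` stub are ours, while the item shapes follow the Input Items table above:

```python
import json

# Local implementations of the functions exposed to the model.
def get_weather(location: str) -> dict:
    # Stand-in for a real weather lookup.
    return {"temp": 22, "condition": "sunny"}

LOCAL_FUNCTIONS = {"get_weather": get_weather}

def run_tool_calls(output_items):
    """Execute each function_call item and build the matching
    function_call_output items for the next request's `input`."""
    results = []
    for item in output_items:
        if item.get("type") != "function_call":
            continue
        fn = LOCAL_FUNCTIONS[item["name"]]
        args = json.loads(item["arguments"])
        results.append({
            "type": "function_call_output",
            "call_id": item["call_id"],
            "output": json.dumps(fn(**args)),
        })
    return results

# Example with a model turn that requested get_weather:
calls = [{"type": "function_call", "call_id": "call_xyz",
          "name": "get_weather", "arguments": "{\"location\":\"Tokyo\"}"}]
print(run_tool_calls(calls))
```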
```json
{
  "model": "openai/gpt-5.3-codex",
  "input": [
    {"type": "message", "role": "user", "content": "What's the weather in Tokyo?"}
  ],
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get the current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {"type": "string", "description": "City name"}
        },
        "required": ["location"]
      }
    }
  ]
}
```

Multi-Turn with Tool Results
```json
{
  "model": "openai/gpt-5.3-codex",
  "input": [
    {"type": "message", "role": "user", "content": "What's the weather in Tokyo?"},
    {"type": "function_call", "call_id": "call_xyz", "name": "get_weather", "arguments": "{\"location\":\"Tokyo\"}"},
    {"type": "function_call_output", "call_id": "call_xyz", "output": "{\"temp\":22,\"condition\":\"sunny\"}"}
  ],
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get the current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {"type": "string"}
        },
        "required": ["location"]
      }
    }
  ]
}
```

Structured Output
Use the `text.format` field to request JSON output conforming to a schema:
```json
{
  "model": "openai/gpt-5.3-codex",
  "input": "List 3 programming languages with their year of creation.",
  "text": {
    "format": {
      "type": "json_schema",
      "name": "languages",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "languages": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "name": {"type": "string"},
                "year": {"type": "integer"}
              },
              "required": ["name", "year"],
              "additionalProperties": false
            }
          }
        },
        "required": ["languages"],
        "additionalProperties": false
      }
    }
  }
}
```

Reasoning
Control the model's reasoning behavior with the `reasoning` parameter:
```json
{
  "model": "openai/gpt-5.3-codex",
  "input": "Prove that the square root of 2 is irrational.",
  "reasoning": {
    "effort": "high",
    "summary": "auto"
  },
  "include": ["reasoning.encrypted_content"]
}
```

When `include` contains `"reasoning.encrypted_content"`, reasoning output items will include an `encrypted_content` field that can be passed back in subsequent requests for continuity.
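For example, a follow-up request might replay the reasoning item from the previous turn alongside the conversation. This is a sketch: the `id`, `summary`, and `encrypted_content` values below are placeholders, and the exact reasoning-item shape should be taken from the previous response's output:

```json
{
  "model": "openai/gpt-5.3-codex",
  "input": [
    {"type": "message", "role": "user", "content": "Prove that the square root of 2 is irrational."},
    {"type": "reasoning", "id": "rs_001", "encrypted_content": "<opaque blob from previous response>", "summary": []},
    {"type": "message", "role": "assistant", "content": "Suppose sqrt(2) = p/q in lowest terms..."},
    {"type": "message", "role": "user", "content": "Now generalize the proof to any prime."}
  ],
  "reasoning": {"effort": "high"},
  "include": ["reasoning.encrypted_content"]
}
```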
Supported Models
Currently, only OpenAI GPT models are supported on the Responses API:
| Provider | Models |
|---|---|
| OpenAI | openai/gpt-5.3-codex, openai/gpt-4.1, openai/gpt-4.1-mini, openai/o3, openai/o4-mini |
Models from other providers are not available via `/v1/responses`; use the appropriate native API endpoints instead. Image-only models are also not supported on this endpoint.
Limitations
- `previous_response_id` is not supported; pass full conversation history in `input` instead.
- `background` and `truncation` parameters are not supported.
- The `web_search` tool type is only supported when using native OpenAI models via the GPT Responses passthrough.
- File and image inputs are supported via `input_image` content parts with `image_url`.
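An image input request might look like the following sketch; the `input_text`/`input_image` content-part shapes follow OpenAI Responses conventions, and the URL is a placeholder:

```json
{
  "model": "openai/gpt-5.3-codex",
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": [
        {"type": "input_text", "text": "What is in this image?"},
        {"type": "input_image", "image_url": "https://example.com/photo.jpg"}
      ]
    }
  ]
}
```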