Image Generation

Pomex supports image generation through the three API interfaces described on this page:

| Interface | Endpoint | Request Shape |
| --- | --- | --- |
| Chat Completions | POST /v1/chat/completions | OpenAI Chat format; works with the OpenAI SDK directly |
| Gemini Native | POST /v1beta/models/{model}:generateContent | Google contents / parts format |
| OpenAI Images | POST /v1/images/generations | OpenAI-style model / prompt / size JSON |

Chat Completions Interface

POST /v1/chat/completions

Generate images using the standard OpenAI Chat Completions endpoint. This is the recommended approach if you already use the OpenAI SDK: point the client at the Pomex base URL, supply your Pomex API key, and set the model name to one of the image models below. Both GPT Image and Gemini Image models are supported.

This endpoint is non-streaming only for image models. Setting stream: true will return an error. Features like tools, response_format, reasoning, and logprobs are not supported for image generation.
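Since these restrictions are only reported once the request reaches the server, it can be convenient to catch them on the client first. The following is a minimal sketch, assuming a hypothetical helper named `check_image_request` (it is not part of the OpenAI SDK or of Pomex); the field names are taken from the note above.

```python
# Hypothetical client-side check; not part of any SDK.
# The field names below come from the restrictions listed above.
UNSUPPORTED_FIELDS = ("tools", "response_format", "reasoning", "logprobs")


def check_image_request(body: dict) -> list[str]:
    """Return a list of problems found in a Chat Completions image request."""
    problems = []
    if body.get("stream"):
        problems.append("stream: true is not supported for image models")
    for field in UNSUPPORTED_FIELDS:
        if field in body:
            problems.append(f"{field} is not supported for image generation")
    return problems


request_body = {
    "model": "openai/gpt-image-2",
    "messages": [{"role": "user", "content": "A red bicycle"}],
    "stream": True,
    "tools": [],
}
for problem in check_image_request(request_body):
    print(problem)
```

An empty list means the body contains none of the fields flagged above; it does not guarantee that the server will accept the request.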

Available Models

| Model ID | Description |
| --- | --- |
| openai/gpt-image-1 | OpenAI GPT Image 1 |
| openai/gpt-image-1.5 | OpenAI GPT Image 1.5 |
| openai/gpt-image-2 | OpenAI GPT Image 2 |
| google/gemini-2.5-flash-image | Gemini 2.5 Flash with image generation |
| google/gemini-3-pro-image-preview | Gemini 3 Pro image generation preview |
| google/gemini-3.1-flash-image-preview | Gemini 3.1 Flash image generation preview |

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Required | An image model ID from the table above. |
| messages | array | Required | Array of message objects. The last user message's text content is used as the image prompt. Content can be a plain string or an array of content parts. |
| n | number | Optional | Number of images to generate (default: 1). Only applies to GPT Image models. |
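To make the prompt-selection rule concrete, here is a rough client-side sketch of which text ends up as the image prompt. The helper `last_user_prompt` is hypothetical, and joining multiple text parts with a space is an assumption; the exact server behavior for several text parts is not documented here.

```python
def last_user_prompt(messages: list[dict]) -> str:
    """Return the text of the last user message (plain string or content parts)."""
    for message in reversed(messages):
        if message.get("role") != "user":
            continue
        content = message.get("content")
        if isinstance(content, str):
            return content
        # Content-parts form: keep the text parts and ignore everything else.
        # Joining with a space is an assumption made for this sketch.
        parts = content or []
        return " ".join(p["text"] for p in parts if p.get("type") == "text")
    raise ValueError("no user message found")


messages = [
    {"role": "system", "content": "You are an illustrator."},
    {"role": "user", "content": "Draw a cat"},
    {"role": "assistant", "content": "Here is a cat."},
    {"role": "user", "content": [{"type": "text", "text": "A cute baby otter"}]},
]
print(last_user_prompt(messages))
```

Earlier user turns are skipped, so in a multi-turn conversation only the final user message determines the image.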

Response Format

The response follows the standard Chat Completions shape. Generated images are returned as image_url content parts with base64 data URIs:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1735689600,
  "model": "openai/gpt-image-2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/png;base64,iVBORw0KGgo..."
            }
          }
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 1290,
    "total_tokens": 1302
  }
}
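The data URI in each image_url part can be split into a MIME type and raw bytes without hard-coding the image format. A minimal sketch, assuming the response has already been parsed into a plain dict; `decode_data_uri` and `extract_images` are illustrative names, not SDK functions.

```python
import base64


def decode_data_uri(url: str) -> tuple[str, bytes]:
    """Split a base64 data URI into (mime_type, raw_bytes)."""
    header, _, payload = url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


def extract_images(response: dict) -> list[tuple[str, bytes]]:
    """Collect every image_url part from the first choice of a response dict."""
    content = response["choices"][0]["message"]["content"]
    if isinstance(content, str):  # text-only reply, no image parts
        return []
    return [
        decode_data_uri(part["image_url"]["url"])
        for part in content
        if part.get("type") == "image_url"
    ]


# Tiny stand-in response: the payload is just the 8-byte PNG signature.
sample = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": "data:image/png;base64,iVBORw0KGgo="},
                    }
                ],
            }
        }
    ]
}
for mime_type, raw in extract_images(sample):
    print(mime_type, len(raw))
```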

Examples — GPT Image

# Generate an image and save it to a file
curl https://api.pomex.ai/v1/chat/completions \
  -H "Authorization: Bearer $POMEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-image-2",
    "messages": [
      {"role": "user", "content": "A cute baby otter wearing a tiny top hat, watercolor style"}
    ]
  }' | jq -r 'first(.choices[0].message.content[] | select(.type == "image_url") | .image_url.url)' \
     | sed 's/^data:image\/[a-z]*;base64,//' \
     | base64 -d > otter.png

import base64
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.pomex.ai/v1",
)

response = client.chat.completions.create(
    model="openai/gpt-image-2",
    messages=[
        {"role": "user", "content": "A cute baby otter wearing a tiny top hat, watercolor style"}
    ],
)

# Find the image part in the response (may be preceded by text)
for part in response.choices[0].message.content:
    if part["type"] == "image_url":
        # Drop the "data:<mime>;base64," prefix without assuming the image format
        img_b64 = part["image_url"]["url"].split(",", 1)[1]
        img_bytes = base64.b64decode(img_b64)
        with open("otter.png", "wb") as f:
            f.write(img_bytes)
        print("Saved otter.png")
        break

Examples — Gemini Image

# Generate an image with Gemini and save it
curl https://api.pomex.ai/v1/chat/completions \
  -H "Authorization: Bearer $POMEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-3.1-flash-image-preview",
    "messages": [
      {"role": "user", "content": "A serene mountain landscape at sunset, oil painting style"}
    ]
  }' | jq -r 'first(.choices[0].message.content[] | select(.type == "image_url") | .image_url.url)' \
     | sed 's/^data:image\/[a-z]*;base64,//' \
     | base64 -d > landscape.png

import base64
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.pomex.ai/v1",
)

response = client.chat.completions.create(
    model="google/gemini-3.1-flash-image-preview",
    messages=[
        {"role": "user", "content": "A serene mountain landscape at sunset, oil painting style"}
    ],
)

# Find the image part in the response (may be preceded by text)
for part in response.choices[0].message.content:
    if part["type"] == "image_url":
        # Drop the "data:<mime>;base64," prefix without assuming the image format
        img_b64 = part["image_url"]["url"].split(",", 1)[1]
        img_bytes = base64.b64decode(img_b64)
        with open("landscape.png", "wb") as f:
            f.write(img_bytes)
        print("Saved landscape.png")
        break

Gemini Native Interface

POST /v1beta/models/{model}:generateContent

Generate images using Google's Gemini image generation models. This endpoint uses the native Gemini generateContent API format—not the OpenAI Images format.

This endpoint is non-streaming only. The response is returned as a single JSON object once generation is complete.

For text-only Gemini requests (:generateContent on any text model, plus :streamGenerateContent) see Generate Content. The same URL pattern serves both flows — Pomex selects the handler based on the resolved model class.


Available Gemini Models

| Model ID | Description |
| --- | --- |
| google/gemini-2.5-flash-image | Gemini 2.5 Flash with image generation |
| google/gemini-3-pro-image-preview | Gemini 3 Pro image generation preview |
| google/gemini-3.1-flash-image-preview | Gemini 3.1 Flash image generation preview |

The google/ prefix is optional in the URL. For example, both /v1beta/models/gemini-2.5-flash-image:generateContent and /v1beta/models/google/gemini-2.5-flash-image:generateContent work.
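As an illustration of the two equivalent URL forms, the sketch below builds the endpoint from a model ID. The helper `generate_content_url` and its `keep_prefix` flag are made up for this example; the host is the one used in the examples on this page.

```python
BASE_URL = "https://api.pomex.ai"


def generate_content_url(model: str, keep_prefix: bool = False) -> str:
    """Build the :generateContent URL for a model ID given with or without google/."""
    short_name = model.removeprefix("google/")  # requires Python 3.9+
    name = f"google/{short_name}" if keep_prefix else short_name
    return f"{BASE_URL}/v1beta/models/{name}:generateContent"


# Both forms address the same model.
print(generate_content_url("google/gemini-2.5-flash-image"))
print(generate_content_url("gemini-2.5-flash-image", keep_prefix=True))
```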


Authentication

This endpoint supports two authentication methods:

| Method | Header | Example |
| --- | --- | --- |
| Bearer token | Authorization | Authorization: Bearer rh_your_api_key |
| Google-style API key | x-goog-api-key | x-goog-api-key: rh_your_api_key |
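Either header can be produced from the same key. A small sketch, assuming the key is stored in a POMEX_API_KEY environment variable as in the cURL examples; `auth_headers` is a hypothetical helper.

```python
import os


def auth_headers(api_key: str, google_style: bool = False) -> dict[str, str]:
    """Return request headers using one of the two supported auth methods."""
    headers = {"Content-Type": "application/json"}
    if google_style:
        headers["x-goog-api-key"] = api_key
    else:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


api_key = os.environ.get("POMEX_API_KEY", "YOUR_API_KEY")
print(sorted(auth_headers(api_key)))
print(sorted(auth_headers(api_key, google_style=True)))
```

The Google-style header may be handy when reusing client code written for Google's API; send only one of the two headers per request.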

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| contents | array | Required | Array of Content objects with role and parts. See Content Object. |
| generationConfig | object | Optional | Generation parameters (temperature, responseModalities, etc.). Defaults to responseModalities: ["TEXT", "IMAGE"] if omitted. |
| safetySettings | array | Optional | Safety filter thresholds for content categories. |
| systemInstruction | object | Optional | System-level instruction as a Content object. |
| cachedContent | string | Optional | Resource name of cached content (e.g., projects/my-project/cachedContents/abc123). |

Content Object

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| role | string | Required | "user" or "model" |
| parts | array | Required | Array of parts. Each part contains one of: text (string) or inlineData (object with mimeType and base64-encoded data). |
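The Content object can be assembled from a few small builders. This is a sketch only; `text_part`, `image_part`, and `user_content` are illustrative names, and the 8-byte PNG signature stands in for real image data.

```python
import base64
import json


def text_part(text: str) -> dict:
    """A part carrying plain text."""
    return {"text": text}


def image_part(data: bytes, mime_type: str) -> dict:
    """A part carrying base64-encoded binary data."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"inlineData": {"mimeType": mime_type, "data": encoded}}


def user_content(*parts: dict) -> dict:
    """Wrap parts in a Content object with the user role."""
    return {"role": "user", "parts": list(parts)}


body = {
    "contents": [
        user_content(
            image_part(b"\x89PNG\r\n\x1a\n", "image/png"),  # placeholder bytes
            text_part("Make the sky purple"),
        )
    ]
}
print(json.dumps(body, indent=2))
```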

Response Body

| Field | Type | Description |
| --- | --- | --- |
| candidates | array | Array of candidate responses. Each has content (with parts), finishReason, and optional safetyRatings. |
| modelVersion | string | The model version used. |
| usageMetadata | object | Token usage: promptTokenCount, candidatesTokenCount, totalTokenCount. |
| promptFeedback | object | Present if the prompt was blocked. Contains blockReason and safetyRatings. |

Generated images appear as inlineData parts in the candidate's content, with mimeType (e.g., image/png) and base64-encoded data.
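A response can be walked defensively by checking promptFeedback before reading candidates. The sketch below assumes the response is already a parsed dict; `split_parts` is a hypothetical helper, and raising RuntimeError for a blocked prompt is a design choice of this example, not documented behavior.

```python
import base64


def split_parts(response: dict) -> tuple[list[tuple[str, bytes]], list[str]]:
    """Return (images, texts) from a generateContent response dict.

    images is a list of (mime_type, raw_bytes); texts is a list of strings.
    """
    feedback = response.get("promptFeedback", {})
    if feedback.get("blockReason"):
        raise RuntimeError(f"prompt blocked: {feedback['blockReason']}")

    images, texts = [], []
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            if "inlineData" in part:
                inline = part["inlineData"]
                images.append((inline["mimeType"], base64.b64decode(inline["data"])))
            elif "text" in part:
                texts.append(part["text"])
    return images, texts


# Stand-in response; the image payload is only the PNG signature.
sample = {
    "candidates": [
        {
            "content": {
                "role": "model",
                "parts": [
                    {"inlineData": {"mimeType": "image/png", "data": "iVBORw0KGgo="}},
                    {"text": "Here is your otter wearing a top hat!"},
                ],
            },
            "finishReason": "STOP",
        }
    ]
}
images, texts = split_parts(sample)
print(len(images), texts)
```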


Examples

Text-to-Image

curl https://api.pomex.ai/v1beta/models/gemini-2.5-flash-image:generateContent \
  -H "Authorization: Bearer $POMEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{"text": "A cute baby otter wearing a tiny top hat, watercolor style"}]
      }
    ]
  }'

import requests
import base64

response = requests.post(
    "https://api.pomex.ai/v1beta/models/gemini-2.5-flash-image:generateContent",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "contents": [
            {
                "role": "user",
                "parts": [{"text": "A cute baby otter wearing a tiny top hat, watercolor style"}],
            }
        ],
    },
)

response.raise_for_status()  # fail early on HTTP errors instead of a KeyError below
data = response.json()
for part in data["candidates"][0]["content"]["parts"]:
    if "inlineData" in part:
        img_bytes = base64.b64decode(part["inlineData"]["data"])
        with open("otter.png", "wb") as f:
            f.write(img_bytes)
        print("Saved otter.png")
    elif "text" in part:
        print(part["text"])

Image Editing

Send an existing image along with an editing instruction:

# Base64-encode your image first (-w0 is GNU coreutils; on macOS use: base64 -i photo.png)
IMG_B64=$(base64 -w0 photo.png)

curl https://api.pomex.ai/v1beta/models/gemini-3-pro-image-preview:generateContent \
  -H "Authorization: Bearer $POMEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          {"inlineData": {"mimeType": "image/png", "data": "'${IMG_B64}'"}},
          {"text": "Remove the background and replace it with a beach scene"}
        ]
      }
    ]
  }'

import requests
import base64

with open("photo.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "https://api.pomex.ai/v1beta/models/gemini-3-pro-image-preview:generateContent",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "contents": [
            {
                "role": "user",
                "parts": [
                    {"inlineData": {"mimeType": "image/png", "data": img_b64}},
                    {"text": "Remove the background and replace it with a beach scene"},
                ],
            }
        ],
    },
)

response.raise_for_status()  # fail early on HTTP errors instead of a KeyError below
data = response.json()
for part in data["candidates"][0]["content"]["parts"]:
    if "inlineData" in part:
        img_bytes = base64.b64decode(part["inlineData"]["data"])
        with open("edited.png", "wb") as f:
            f.write(img_bytes)
        print("Saved edited.png")

With Generation Config

curl https://api.pomex.ai/v1beta/models/gemini-3.1-flash-image-preview:generateContent \
  -H "Authorization: Bearer $POMEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{"text": "A serene mountain landscape at sunset"}]
      }
    ],
    "generationConfig": {
      "temperature": 1.0,
      "responseModalities": ["TEXT", "IMAGE"]
    }
  }'

Sample Response

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "inlineData": {
              "mimeType": "image/png",
              "data": "iVBORw0KGgo..."
            }
          },
          {
            "text": "Here is your otter wearing a top hat!"
          }
        ]
      },
      "finishReason": "STOP"
    }
  ],
  "modelVersion": "gemini-2.5-flash-image",
  "usageMetadata": {
    "promptTokenCount": 12,
    "candidatesTokenCount": 1290,
    "totalTokenCount": 1302
  }
}

Gemini Error Format

This endpoint returns errors in Google API format, different from the OpenAI and Anthropic error formats used by other endpoints.

{
  "error": {
    "code": 400,
    "message": "body must be valid JSON",
    "status": "INVALID_ARGUMENT"
  }
}

See Errors for the full list of HTTP status codes and retry guidance.


OpenAI Images Interface

POST /v1/images/generations

Generate images using OpenAI-compatible request and response shapes. This interface is separate from the Gemini native :generateContent endpoint; send requests to /v1/images/generations.

Available OpenAI Image Models

| Model ID | Description |
| --- | --- |
| openai/gpt-image-1 | OpenAI-compatible image generation model |
| openai/gpt-image-1.5 | OpenAI-compatible image generation model (v1.5) |
| openai/gpt-image-2 | OpenAI-compatible image generation model (v2) |

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Required | One of openai/gpt-image-1, openai/gpt-image-1.5, or openai/gpt-image-2. |
| prompt | string | Required | Text prompt describing the image to generate. |
| size | string | Optional | Output resolution, such as 1024x1024. |
| quality | string | Optional | Quality tier, such as medium. |
| output_compression | number | Optional | Output compression level (for supported formats). |
| output_format | string | Optional | Output image format, such as png or jpeg. |
| n | number | Optional | Number of images to generate. Commonly 1. |
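One way to keep optional fields out of the request unless they are set is a small dataclass. This is a sketch under assumptions: `ImageRequest` is a hypothetical name, and omitting unset fields (rather than sending null) is a choice made here, not a documented requirement.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ImageRequest:
    """Request body for POST /v1/images/generations (fields from the table above)."""

    model: str
    prompt: str
    size: Optional[str] = None
    quality: Optional[str] = None
    output_compression: Optional[int] = None
    output_format: Optional[str] = None
    n: Optional[int] = None

    def to_payload(self) -> dict:
        # Leave out optional fields that were never set.
        return {key: value for key, value in asdict(self).items() if value is not None}


request = ImageRequest(
    model="openai/gpt-image-2",
    prompt="A cute baby otter wearing a tiny top hat, watercolor style",
    size="1024x1024",
)
print(request.to_payload())
```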

Examples

This example uses openai/gpt-image-2 and returns base64 image data at data[0].b64_json.

curl -v https://api.pomex.ai/v1/images/generations \
  -H "Authorization: Bearer $POMEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-image-2",
    "prompt": "A cute baby otter wearing a tiny top hat, watercolor style",
    "size": "1024x1024",
    "quality": "medium",
    "output_compression": 100,
    "output_format": "png",
    "n": 1
  }' | jq -r '.data[0].b64_json' | base64 --decode > generated_image_gpt_image_2.png

import base64
import requests

response = requests.post(
    "https://api.pomex.ai/v1/images/generations",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "openai/gpt-image-2",
        "prompt": "A cute baby otter wearing a tiny top hat, watercolor style",
        "size": "1024x1024",
        "quality": "medium",
        "output_compression": 100,
        "output_format": "png",
        "n": 1,
    },
)

response.raise_for_status()
payload = response.json()

img_b64 = payload["data"][0]["b64_json"]
img_bytes = base64.b64decode(img_b64)
with open("generated_image_gpt_image_2.png", "wb") as f:
    f.write(img_bytes)

print("Saved generated_image_gpt_image_2.png")

Sample Response

{
  "created": 1735689600,
  "data": [
    {
      "b64_json": "iVBORw0KGgoAAAANSUhEUgAA..."
    }
  ]
}
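When n is greater than 1, the data array can be expected to hold one entry per image. The sketch below writes each entry to its own file; `save_images` and the `<stem>_<index>.<ext>` naming scheme are assumptions for illustration, and the file extension should match the output_format that was requested.

```python
import base64
import tempfile
from pathlib import Path


def save_images(payload: dict, stem: str, ext: str = "png", out_dir: str = ".") -> list[Path]:
    """Decode every b64_json entry in an images response and write it to disk."""
    paths = []
    for index, item in enumerate(payload.get("data", [])):
        path = Path(out_dir) / f"{stem}_{index}.{ext}"
        path.write_bytes(base64.b64decode(item["b64_json"]))
        paths.append(path)
    return paths


# Stand-in payload with two tiny entries (just the PNG signature).
payload = {
    "created": 1735689600,
    "data": [{"b64_json": "iVBORw0KGgo="}, {"b64_json": "iVBORw0KGgo="}],
}
with tempfile.TemporaryDirectory() as tmp:
    saved = save_images(payload, "otter", out_dir=tmp)
    print([p.name for p in saved])
```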

OpenAI Images Error Format

This interface returns OpenAI-style errors:

{
  "error": {
    "message": "Invalid value for 'model'",
    "type": "invalid_request_error",
    "param": "model",
    "code": "invalid_model"
  }
}
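Because the Gemini native endpoint reports errors in Google API format while this interface uses OpenAI-style errors, client code that calls both may want a single representation. A minimal sketch, assuming the presence of a status field is enough to tell the two shapes apart; that heuristic and the `ApiError` / `parse_error` names are inventions of this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    style: str            # "google" or "openai"
    message: str
    code: Optional[str]   # HTTP-like code (google) or error code string (openai)
    kind: Optional[str]   # status (google) or type (openai)
    param: Optional[str]  # only set for openai-style errors


def parse_error(payload: dict) -> Optional[ApiError]:
    """Return an ApiError if the payload carries an error object, else None."""
    error = payload.get("error")
    if not error:
        return None
    if "status" in error:  # heuristic: Google-format errors carry a status field
        code = error.get("code")
        return ApiError(
            style="google",
            message=error.get("message", ""),
            code=None if code is None else str(code),
            kind=error.get("status"),
            param=None,
        )
    return ApiError(
        style="openai",
        message=error.get("message", ""),
        code=error.get("code"),
        kind=error.get("type"),
        param=error.get("param"),
    )


openai_error = {
    "error": {
        "message": "Invalid value for 'model'",
        "type": "invalid_request_error",
        "param": "model",
        "code": "invalid_model",
    }
}
print(parse_error(openai_error))
```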