Manual Test Suite (Browser Mode + Live API)

These checks validate the real Chrome automation path and the optional live Responses API smoke suite. Run the browser steps whenever you touch Chrome automation (lifecycle, cookie sync, prompt injection, Markdown capture, etc.), and run the live API suite before shipping major transport changes.

#Prerequisites

macOS with Chrome installed (default profile signed in to ChatGPT Pro).
Node 24+ and pnpm install already completed.
Headful display access (no --browser-headless).
When debugging, add --browser-keep-browser so Chrome stays open after Oracle exits, then connect with pnpm exec tsx scripts/browser-tools.ts ... (screenshot, eval, DOM picker, etc.).
Ensure no Chrome instances are force-terminated mid-run; let Oracle clean up once you’re done capturing state.
Clipboard checks (browser-tools.ts eval "navigator.clipboard.readText()") trigger a permission dialog in Chrome—approve it for debugging, but remember that we can’t rely on readText in unattended runs.

#Test Cases

#Quick browser port smoke

pnpm test:browser — launches headful Chrome and checks the DevTools endpoint is reachable. Set ORACLE_BROWSER_PORT (or ORACLE_BROWSER_DEBUG_PORT) to reuse a fixed port when you’ve already opened a firewall rule.

#Gemini browser mode (Gemini web / cookies)

Run this whenever you touch the Gemini web client or the --generate-image / --edit-image plumbing.

Prereqs:

Chrome profile is signed into gemini.google.com.

Generate an image:

pnpm run oracle -- --engine browser --model gemini-3-pro --prompt "a cute robot holding a banana" --generate-image /tmp/gemini-gen.jpg --aspect 1:1 --wait --verbose

Confirm the output file exists and is a real image (file /tmp/gemini-gen.jpg).

Edit an image:

pnpm run oracle -- --engine browser --model gemini-3-pro --prompt "add sunglasses" --edit-image /tmp/gemini-gen.jpg --output /tmp/gemini-edit.jpg --wait --verbose

Confirm /tmp/gemini-edit.jpg exists.

#Multi-Model CLI fan-out

Run this whenever you touch the session store, CLI session views, or TUI wiring for multi-model runs.

Kick off an API multi-run:

pnpm run oracle -- --models "gpt-5.1-pro,gemini-3-pro" --prompt "Compare the moon & sun."

Expect stdout to print sequential sections, one per model ([gpt-5.1-pro] … followed by [gemini-3-pro] …). No interleaved tokens.

Capture the session ID from the summary line. Run oracle session --status --model gpt-5.1-pro.

Table should collapse to sessions that include GPT-5.1 Pro and show status icons (✓/⌛/✖) per model.

Inspect detailed logs: oracle session <id>

The metadata header now includes a Models: block with one line per model plus token counts.
When prompted, pick View gemini-3-pro log and confirm only that model’s stream renders. Refresh should keep completed models intact even if others still run.

Model filter path: oracle session <id> --model gemini-3-pro

Attach mode should error if that model is missing (double-check by filtering for a bogus model), otherwise it should render the prompt + single-model log only.

#Write-output export (API)

Run this when touching session serialization, file IO helpers, or CLI flag plumbing.

ORACLE_LIVE_TEST=1 OPENAI_API_KEY=<real key> pnpm vitest run tests/live/write-output-live.test.ts --runInBand

Expect the test to create a temp write-output-live.md file containing write-output e2e.

Manual spot-check: oracle --prompt "answer file smoke" --write-output /tmp/out.md --wait

Confirm /tmp/out.md exists with the answer text and a trailing newline.

Multi-model spot-check: oracle --models "gpt-5.1-pro,gemini-3-pro" --prompt "two files" --write-output /tmp/out.md --wait

Confirm /tmp/out.gpt-5.1-pro.md and /tmp/out.gemini-3-pro.md exist with distinct content.

#Lightweight Browser CLI (manual exploration)

Before running any agent-driven debugging, you can rely on the TypeScript CLI in scripts/browser-tools.ts:

# Show help / available commands
pnpm tsx scripts/browser-tools.ts --help

# Launch Chrome with your normal profile so you stay logged in
pnpm tsx scripts/browser-tools.ts start --profile

# Drive the active tab
pnpm tsx scripts/browser-tools.ts nav https://example.com
pnpm tsx scripts/browser-tools.ts eval 'document.title'
pnpm tsx scripts/browser-tools.ts screenshot
pnpm tsx scripts/browser-tools.ts pick "Select checkout button"
pnpm tsx scripts/browser-tools.ts cookies
pnpm tsx scripts/browser-tools.ts inspect   # show DevTools-enabled Chrome PIDs/ports/tabs
pnpm tsx scripts/browser-tools.ts kill --all --force   # tear down straggler DevTools sessions

This mirrors Mario Zechner’s “What if you don’t need MCP?” technique and is handy when you just need a few quick interactions without spinning up additional tooling.

Debug note: when you have a live ChatGPT tab open under a DevTools port and need a quick DOM dump of the last assistant turn, run pnpm tsx scripts/debug/extract-chatgpt-response.ts <port>.

Prompt Submission & Model Switching

With Chrome signed in and cookie sync enabled, run

``bash pnpm run oracle -- --engine browser --model "GPT-5.2" \ --prompt "Line 1\nLine 2\nLine 3" ``

Observe logs for:
Prompt textarea ready (xxx chars queued) (twice: initial + after model switch).
Model picker: ... 5.2 ....
Clicked send button (or Enter fallback).
In the attached Chrome window, verify the multi-line prompt appears exactly as sent.

Markdown Capture

Prompt:

``bash pnpm run oracle -- --engine browser --model "GPT-5.2" \ --prompt "Produce a short bullet list with code fencing." ``

Expected CLI output:
Answer: section containing bullet list with Markdown preserved (e.g., - item, fenced code).
Session log (oracle session <id>) should show the assistant markdown (confirm via grep -n '`' ~/.oracle/sessions/<id>/output.log).

Stop Button Handling

Start a long prompt ("Write a detailed essay about browsers") and once ChatGPT responds, manually click “Stop generating” inside Chrome.
Oracle should detect the assistant message (partial) and still store the markdown.

Override Flag

Run with --browser-allow-cookie-errors while intentionally breaking bindings.
Confirm log shows Cookie sync failed (continuing with override) and the run proceeds headless/logged-out.
Remember: the browser composer now pastes only the user prompt (plus any inline file blocks). If you see the default “You are Oracle…” text or other system-prefixed content in the ChatGPT composer, something regressed in assembleBrowserPrompt and you should stop and file a bug.
Heartbeats: Browser runs emit --heartbeat status while waiting. Long Thinking/Pro runs should show [browser] ChatGPT thinking ... or [browser] Waiting for ChatGPT response ...; the log must not include reasoning text from the side panel.

#Post-Run Validation

oracle session <id> should replay the transcript with markdown.
~/.oracle/sessions/<id>/meta.json must include browser.config metadata (model label, cookie settings) and browser.runtime (PID/port).

Document results (pass/fail, session IDs) in PR descriptions so reviewers can audit real-world behavior.

#Recent Smoke Runs

2025-11-18 — API gpt-5.1 (api-smoke-give-two-words): returned “blue sky” in 2.5s.
2025-11-18 — API gpt-5.1-pro (api-smoke-pro-three-words): completed in 3m08s with “Fast API verification”.
2025-11-18 — Browser gpt-5.1 Instant (browser-smoke-instant-two-words): completed in ~10s; replied with a clarification prompt.
2025-11-18 — Browser gpt-5.1-pro (browser-smoke-pro-three-words): completed in ~1m33s; response noted “Search tool used.”.
2025-11-18 (rerun) — API gpt-5.1 (api-smoke-give-two-words): reconfirmed OK; same answer + cost bracket.
2025-11-18 (rerun) — Browser gpt-5.1-pro (browser-smoke-pro-three-words): reconfirmed OK; included heartbeat progress and search tool note.
2025-11-20 — Browser gpt-5.1 via oracle serve (remote host on same Mac): fetched https://example.com; title “Example Domain”; first sentence “This domain is for use in documentation examples without needing permission.” (ran via tmux sessions oracle-serve and oracle-client).

#Browser Regression Checklist (manual)

Run these four smoke tests whenever we touch browser automation:

GPT-5.2 simple prompt

pnpm run oracle -- --engine browser --model "GPT-5.2" --prompt "Give me two short markdown bullet points about tables" Expect two markdown bullets, no files/search referenced. Note the session ID (e.g., give-me-two-short-markdown).

GPT-5.2 simple prompt

pnpm run oracle -- --engine browser --model gpt-5.2 --prompt "List two reasons Markdown is handy" Confirm the answer arrives (and only once) even if it takes ~2–3 minutes.

GPT-5.2 + attachment

Prepare /tmp/browser-md.txt with a short note, then run pnpm run oracle -- --engine browser --model "GPT-5.2" --prompt "Summarize the key idea from the attached note" --file /tmp/browser-md.txt Ensure upload logs show “Attachment queued” and the answer references the file contents explicitly.

GPT-5.2 + attachment (verbose)

Prepare /tmp/browser-report.txt with faux metrics, then run pnpm run oracle -- --engine browser --model gpt-5.2 --prompt "Use the attachment to report current CPU and memory figures" --file /tmp/browser-report.txt --verbose Verify verbose logs show attachment upload and the final answer matches the file data.

Deep Research smoke

pnpm run oracle -- --engine browser --browser-manual-login --browser-research deep --prompt "Research one current public source about WebGPU browser support and cite it" Confirm the logs show Deep Research activation/progress and the final report includes citations or source links. Do not use connected apps or private data.

Multi-turn browser consult smoke

pnpm run oracle -- --engine browser --browser-manual-login --model gpt-5.5-pro --browser-thinking-time extended --prompt "Give one architectural recommendation for a tiny CLI cache." --browser-follow-up "Challenge your previous recommendation with one concrete failure mode." --browser-follow-up "Now return the final recommendation in one sentence, starting with CHECK_MULTI_TURN_OK." Confirm the output contains all captured turns, includes CHECK_MULTI_TURN_OK, and the saved transcript.md records both follow-up prompts.

Multi-turn value check

Run the same initial prompt once without follow-ups and once with the challenge/final-decision follow-ups above. In the PR notes, record concrete differences such as extra failure modes, sharper rollback steps, or test cases. Do not claim a fixed quality percentage.

Auto-archive smoke

pnpm run oracle -- --engine browser --browser-manual-login --model gpt-5.5-pro --browser-thinking-time extended --browser-archive always --prompt "Reply exactly CHECK_ARCHIVE_OK." Confirm the output contains CHECK_ARCHIVE_OK, oracle session <id> --render still shows the transcript, and ChatGPT shows the conversation under archived chats rather than the active sidebar. Also confirm a default --browser-archive auto run with Deep Research or follow-ups is not archived.

Record session IDs and outcomes in the PR description (pass/fail, notable delays). This ensures reviewers can audit real runs.

#Remote Chrome smoke test (CDP)

Run this whenever you touch CDP connection logic (remote chrome lifecycle, attachment transfer) or before executing remote sessions in CI.

Launch a throwaway Chrome instance with remote debugging enabled (adjust the path per OS):

``bash REMOTE_PROFILE=/tmp/oracle-remote-test-profile rm -rf "$REMOTE_PROFILE" "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \ --headless=new \ --disable-gpu \ --remote-debugging-port=9333 \ --remote-allow-origins=* \ --user-data-dir="$REMOTE_PROFILE" \ >/tmp/oracle-remote-chrome.log 2>&1 & export REMOTE_CHROME_PID=$! sleep 3 ``

Run the helper to verify CDP connectivity:

``bash pnpm tsx scripts/test-remote-chrome.ts localhost 9333 `` Expect ✓ logs for connection, protocol info, navigation to https://chatgpt.com/, and the final “POC successful!” line.

Tear down the temporary browser:

``bash kill "$REMOTE_CHROME_PID" rm -rf "$REMOTE_PROFILE" ` Use pkill -f oracle-remote-test-profile` if Chrome refuses to exit cleanly.

Capture the pass/fail result (include the helper’s log snippet) in your PR description alongside other manual browser tests.

#Attach-running smoke test

Run this whenever you touch the local attach path (--browser-attach-running) or the direct browser websocket bootstrap.

Start or reuse a local signed-in Chrome with DevTools access available. If you want an explicit local endpoint, launch Chrome with --remote-debugging-port=9222.
Run Oracle against the running browser:

``bash pnpm run oracle -- --engine browser \ --browser-attach-running \ --model "GPT-5.2" \ --prompt "Give me two short markdown bullets about browser tabs" ` If the browser’s remote-debugging UI shows a different local port, rerun with --remote-chrome <host:port> in addition to --browser-attach-running`.

Verify Oracle opens a fresh tab in the existing browser, returns the answer, and closes only that Oracle-owned tab afterward.
Reattach sanity check: repeat with a very short timeout if needed, then run oracle session <id> and confirm Oracle can reconnect to the saved tab/conversation.

#Chrome DevTools / MCP Debugging

Use this when you need to inspect the live ChatGPT composer (DOM state, markdown text, screenshots, etc.). For smaller ad‑hoc pokes, you can often rely on pnpm tsx scripts/browser-tools.ts … instead.

Launch within tmux

``bash tmux new -d -s oracle-browser \\ "pnpm run oracle -- --engine browser --browser-keep-browser \\ --model 'GPT-5.5 Pro' --prompt 'Debug via DevTools.'" ``

Keeping the run in tmux prevents your shell from blocking and ensures Chrome stays open afterward.

Grab the DevTools port

tmux capture-pane -pt oracle-browser to read the logs (Launched Chrome … on port 56663).
Verify the endpoint:

``bash curl http://127.0.0.1:<PORT>/json/version ` Note the webSocketDebuggerUrl` for reference.

Attach Chrome DevTools MCP

One-off: CHROME_DEVTOOLS_URL=http://127.0.0.1:<PORT> npx -y chrome-devtools-mcp@latest
mcporter config snippet:

``json { "chrome-devtools": { "command": "npx", "args": ["-y", "chrome-devtools-mcp@latest", "--browserUrl", "http://127.0.0.1:<PORT>"] } } ``

Once the server prints chrome-devtools-mcp exposes…, you can list/call tools via mcporter.
Oracle’s attach-running mode no longer depends on MCP at runtime; mcporter remains useful here for manual inspection only.

Interact & capture

Use MCP tools (click, evaluate_js, screenshot, etc.) to debug the composer contents.
Record any manual actions you take (e.g., “fired evaluate_js to dump #prompt-textarea.innerText”).

Cleanup

tmux kill-session -t oracle-browser
pkill -f oracle-browser-<slug> if Chrome is still running.

Tip: Running npx chrome-devtools-mcp@latest --help lists additional switches (custom Chrome binary, headless, viewport, etc.).

#Responses API Live Smoke Tests

These Vitest cases hit the real OpenAI API to exercise both transports:

Export a real key and explicitly opt in (default runs stay fast):

``bash export OPENAI_API_KEY=sk-... export ORACLE_LIVE_TEST=1 pnpm vitest run tests/live/openai-live.test.ts ``

The first two tests target the standard GPT-5 (gpt-5.1 / gpt-5.2) foreground

streaming paths. The later background tests send gpt-5.5-pro and gpt-5.2-pro prompts and expect the CLI to stay in background mode until OpenAI finishes (up to 30 minutes).

Watch the console for Reconnected to OpenAI background response... if

you're debugging transport flakiness; the test will fail if the response status isn't completed or if the text doesn't contain the hard-coded smoke strings.

Skip these unless you're intentionally validating the production API; they are fully gated behind ORACLE_LIVE_TEST=1 to avoid accidental CI runs.