# Manual Test Suite (Browser Mode + Live API)
These checks validate the real Chrome automation path and the optional live Responses API smoke suite. Run the browser steps whenever you touch Chrome automation (lifecycle, cookie sync, prompt injection, Markdown capture, etc.), and run the live API suite before shipping major transport changes.
## Prerequisites
- macOS with Chrome installed (default profile signed in to ChatGPT Pro).
- Node 24+ and `pnpm install` already completed.
- Headful display access (no `--browser-headless`).
- When debugging, add `--browser-keep-browser` so Chrome stays open after Oracle exits, then connect with `pnpm exec tsx scripts/browser-tools.ts ...` (screenshot, eval, DOM picker, etc.).
- Ensure no Chrome instances are force-terminated mid-run; let Oracle clean up once you’re done capturing state.
- Clipboard checks (`browser-tools.ts eval "navigator.clipboard.readText()"`) trigger a permission dialog in Chrome; approve it for debugging, but remember that we can’t rely on `readText` in unattended runs.
## Test Cases

### Quick browser port smoke

`pnpm test:browser` launches headful Chrome and checks that the DevTools endpoint is reachable. Set `ORACLE_BROWSER_PORT` (or `ORACLE_BROWSER_DEBUG_PORT`) to reuse a fixed port when you’ve already opened a firewall rule.
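For reference, the reachability check boils down to an HTTP GET against Chrome’s `/json/version` endpoint on the DevTools port. A minimal sketch (Node 24 has a global `fetch`), assuming `ORACLE_BROWSER_PORT` takes precedence when both variables are set; `resolveDevToolsPort` and `probeDevTools` are hypothetical helpers, not Oracle’s code:

```typescript
// Hypothetical helpers for the port smoke; not Oracle's implementation.
type Env = Record<string, string | undefined>;

// Assumption: ORACLE_BROWSER_PORT wins when both variables are set.
export function resolveDevToolsPort(env: Env): number | undefined {
  const raw = env.ORACLE_BROWSER_PORT ?? env.ORACLE_BROWSER_DEBUG_PORT;
  if (raw === undefined || raw.trim() === "") return undefined;
  const port = Number(raw);
  // Reject non-integers and values outside the TCP port range.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid DevTools port: ${raw}`);
  }
  return port;
}

// GET /json/version; a live browser answers with its webSocketDebuggerUrl.
export async function probeDevTools(port: number, host = "127.0.0.1"): Promise<string> {
  const res = await fetch(`http://${host}:${port}/json/version`);
  if (!res.ok) throw new Error(`DevTools endpoint returned HTTP ${res.status}`);
  const info = (await res.json()) as { webSocketDebuggerUrl?: string };
  if (!info.webSocketDebuggerUrl) throw new Error("No webSocketDebuggerUrl in /json/version");
  return info.webSocketDebuggerUrl;
}
```

Usage, from an ES module: `const port = resolveDevToolsPort(process.env); if (port) console.log(await probeDevTools(port));`.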
### Gemini browser mode (Gemini web / cookies)

Run this whenever you touch the Gemini web client or the `--generate-image` / `--edit-image` plumbing.

Prereqs:
- Chrome profile is signed into `gemini.google.com`.
- Generate an image:
  `pnpm run oracle -- --engine browser --model gemini-3-pro --prompt "a cute robot holding a banana" --generate-image /tmp/gemini-gen.jpg --aspect 1:1 --wait --verbose`
- Confirm the output file exists and is a real image (`file /tmp/gemini-gen.jpg`).
- Edit an image:
  `pnpm run oracle -- --engine browser --model gemini-3-pro --prompt "add sunglasses" --edit-image /tmp/gemini-gen.jpg --output /tmp/gemini-edit.jpg --wait --verbose`
- Confirm `/tmp/gemini-edit.jpg` exists.
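A file extension doesn’t prove the bytes are an image (an HTML error page saved as `.jpg` would still pass an existence check). If you want a scripted version of the `file` check, sniffing the leading magic bytes is enough. A sketch; `sniffImage` and `assertRealImage` are hypothetical names, and only JPEG, PNG, and WebP signatures are recognized:

```typescript
// A rough stand-in for `file <path>`: sniff the leading "magic" bytes.
import { readFileSync } from "node:fs";

export type ImageKind = "jpeg" | "png" | "webp" | "unknown";

export function sniffImage(bytes: Uint8Array): ImageKind {
  // JPEG files start with FF D8 FF.
  if (bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff) return "jpeg";
  // PNG signature: 89 50 4E 47 0D 0A 1A 0A.
  const png = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (bytes.length >= 8 && png.every((b, i) => bytes[i] === b)) return "png";
  // WebP: "RIFF" <4-byte size> "WEBP".
  const ascii = (from: number, to: number) => String.fromCharCode(...bytes.subarray(from, to));
  if (bytes.length >= 12 && ascii(0, 4) === "RIFF" && ascii(8, 12) === "WEBP") return "webp";
  return "unknown";
}

// Throws if the file on disk does not carry a known image signature.
export function assertRealImage(path: string): ImageKind {
  const kind = sniffImage(readFileSync(path));
  if (kind === "unknown") throw new Error(`${path} does not look like an image`);
  return kind;
}
```

For example, `assertRealImage("/tmp/gemini-gen.jpg")` after the generate step, and the same call on `/tmp/gemini-edit.jpg` after the edit step.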
### Multi-Model CLI fan-out

Run this whenever you touch the session store, CLI session views, or TUI wiring for multi-model runs.

- Kick off an API multi-run:
  `pnpm run oracle -- --models "gpt-5.1-pro,gemini-3-pro" --prompt "Compare the moon & sun."`
  - Expect stdout to print sequential sections, one per model (`[gpt-5.1-pro] …` followed by `[gemini-3-pro] …`), with no interleaved tokens.
- Capture the session ID from the summary line, then run `oracle session --status --model gpt-5.1-pro`.
  - The table should collapse to sessions that include GPT-5.1 Pro and show status icons (✓/⌛/✖) per model.
- Inspect detailed logs:
  `oracle session <id>`
  - The metadata header now includes a `Models:` block with one line per model plus token counts.
  - When prompted, pick `View gemini-3-pro log` and confirm only that model’s stream renders. Refreshing should keep completed models intact even if others are still running.
- Model filter path:
  `oracle session <id> --model gemini-3-pro`
  - Attach mode should error if that model is missing (double-check by filtering for a bogus model); otherwise it should render the prompt plus the single-model log only.
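To check the “no interleaved tokens” rule on a captured stdout (e.g., redirected to a file), you can collapse consecutive `[model]` prefixes and compare the resulting sequence with the requested model order. A sketch assuming each section starts with a line beginning `[<model>]`, as in the example above; `checkSequentialSections` is a hypothetical helper:

```typescript
// Assumption: each model's section begins with a line starting "[<model>]".
export function checkSequentialSections(stdout: string, models: string[]): void {
  const order: string[] = [];
  for (const line of stdout.split(/\r?\n/)) {
    const match = /^\[([^\]]+)\]/.exec(line);
    if (match && models.includes(match[1])) order.push(match[1]);
  }
  // Collapse consecutive repeats so a section with several prefixed lines counts once.
  const runs = order.filter((m, i) => i === 0 || order[i - 1] !== m);
  // Interleaving shows up as extra runs (A, B, A); a missing model as too few.
  if (runs.length !== models.length || runs.some((m, i) => m !== models[i])) {
    throw new Error(`Expected sections ${models.join(" -> ")}, saw ${runs.join(" -> ")}`);
  }
}
```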
### Write-output export (API)

Run this when touching session serialization, file IO helpers, or CLI flag plumbing.

`ORACLE_LIVE_TEST=1 OPENAI_API_KEY=<real key> pnpm vitest run tests/live/write-output-live.test.ts --runInBand`

- Expect the test to create a temp `write-output-live.md` file containing `write-output e2e`.
- Manual spot-check:
  `oracle --prompt "answer file smoke" --write-output /tmp/out.md --wait`
  - Confirm `/tmp/out.md` exists with the answer text and a trailing newline.
- Multi-model spot-check:
  `oracle --models "gpt-5.1-pro,gemini-3-pro" --prompt "two files" --write-output /tmp/out.md --wait`
  - Confirm `/tmp/out.gpt-5.1-pro.md` and `/tmp/out.gemini-3-pro.md` exist with distinct content.
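The multi-model spot-check expects the model ID to be spliced in before the extension (`out.md` becomes `out.<model>.md`). A sketch that mirrors that naming and runs the file checks above; `perModelPath` and the other helpers are hypothetical, and Oracle’s real naming may also sanitize model IDs:

```typescript
import { existsSync, readFileSync } from "node:fs";
import { basename, dirname, extname, join } from "node:path";

// Mirrors the naming in the spot-check above: out.md -> out.<model>.md.
export function perModelPath(base: string, model: string): string {
  const ext = extname(base); // ".md"
  const stem = basename(base, ext); // "out"
  return join(dirname(base), `${stem}.${model}${ext}`);
}

// One answer file: exists, is non-empty, and ends with a newline.
export function checkAnswerFile(path: string): string {
  if (!existsSync(path)) throw new Error(`missing ${path}`);
  const text = readFileSync(path, "utf8");
  if (text.trim() === "") throw new Error(`${path} is empty`);
  if (!text.endsWith("\n")) throw new Error(`${path} lacks a trailing newline`);
  return text;
}

// Multi-model run: every per-model file passes, and no two have identical content.
export function checkMultiModelOutputs(base: string, models: string[]): void {
  const texts = models.map((m) => checkAnswerFile(perModelPath(base, m)));
  if (new Set(texts).size !== texts.length) throw new Error("per-model files are not distinct");
}
```

For example, `checkMultiModelOutputs("/tmp/out.md", ["gpt-5.1-pro", "gemini-3-pro"])` after the multi-model spot-check.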
### Lightweight Browser CLI (manual exploration)

Before running any agent-driven debugging, you can rely on the TypeScript CLI in `scripts/browser-tools.ts`:

```bash
# Show help / available commands
pnpm tsx scripts/browser-tools.ts --help

# Launch Chrome with your normal profile so you stay logged in
pnpm tsx scripts/browser-tools.ts start --profile

# Drive the active tab
pnpm tsx scripts/browser-tools.ts nav https://example.com
pnpm tsx scripts/browser-tools.ts eval 'document.title'
pnpm tsx scripts/browser-tools.ts screenshot
pnpm tsx scripts/browser-tools.ts pick "Select checkout button"
pnpm tsx scripts/browser-tools.ts cookies
pnpm tsx scripts/browser-tools.ts inspect            # show DevTools-enabled Chrome PIDs/ports/tabs
pnpm tsx scripts/browser-tools.ts kill --all --force # tear down straggler DevTools sessions
```
This mirrors Mario Zechner’s “What if you don’t need MCP?” technique and is handy when you just need a few quick interactions without spinning up additional tooling.
Debug note: when you have a live ChatGPT tab open under a DevTools port and need a quick DOM dump of the last assistant turn, run `pnpm tsx scripts/debug/extract-chatgpt-response.ts <port>`.
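Both `inspect` and the debug script work off Chrome’s DevTools HTTP endpoints. If you need to find the ChatGPT tab on a port yourself, `GET /json/list` returns one entry per target; the sketch below keeps only `page` targets on `chatgpt.com`. `chatgptTabs` and `listChatgptTabs` are hypothetical helpers, and the interface lists only a subset of the fields Chrome reports:

```typescript
// Subset of the entries Chrome returns from GET /json/list.
interface DevToolsTarget {
  id: string;
  type: string; // "page", "service_worker", "iframe", ...
  title: string;
  url: string;
  webSocketDebuggerUrl?: string;
}

// Keep only top-level pages whose host is chatgpt.com (or a subdomain).
export function chatgptTabs(targets: DevToolsTarget[]): DevToolsTarget[] {
  return targets.filter((t) => {
    if (t.type !== "page") return false;
    try {
      const host = new URL(t.url).hostname; // "" for about:blank
      return host === "chatgpt.com" || host.endsWith(".chatgpt.com");
    } catch {
      return false; // malformed URL
    }
  });
}

export async function listChatgptTabs(port: number): Promise<DevToolsTarget[]> {
  const res = await fetch(`http://127.0.0.1:${port}/json/list`);
  return chatgptTabs((await res.json()) as DevToolsTarget[]);
}
```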
- Prompt Submission & Model Switching
  - With Chrome signed in and cookie sync enabled, run:
    ```bash
    pnpm run oracle -- --engine browser --model "GPT-5.2" \
      --prompt "Line 1\nLine 2\nLine 3"
    ```
  - Observe logs for:
    - `Prompt textarea ready (xxx chars queued)` (twice: initial + after model switch).
    - `Model picker: ... 5.2 ...`.
    - `Clicked send button` (or Enter fallback).
  - In the attached Chrome window, verify the multi-line prompt appears exactly as sent.
- Markdown Capture
  - Prompt:
    ```bash
    pnpm run oracle -- --engine browser --model "GPT-5.2" \
      --prompt "Produce a short bullet list with code fencing."
    ```
  - Expected CLI output: an `Answer:` section containing a bullet list with Markdown preserved (e.g., `- item`, fenced code).
  - The session log (`oracle session <id>`) should show the assistant markdown (confirm via ``grep -n '`' ~/.oracle/sessions/<id>/output.log``).
- Stop Button Handling
  - Start a long prompt (`"Write a detailed essay about browsers"`) and, once ChatGPT responds, manually click “Stop generating” inside Chrome.
  - Oracle should detect the (partial) assistant message and still store the markdown.
- Override Flag
  - Run with `--browser-allow-cookie-errors` while intentionally breaking bindings.
  - Confirm the log shows `Cookie sync failed (continuing with override)` and the run proceeds headless/logged-out.
- Remember: the browser composer now pastes only the user prompt (plus any inline file blocks). If you see the default “You are Oracle…” text or other system-prefixed content in the ChatGPT composer, something regressed in `assembleBrowserPrompt`; stop and file a bug.
- Heartbeats: browser runs emit `--heartbeat` status while waiting. Long Thinking/Pro runs should show `[browser] ChatGPT thinking ...` or `[browser] Waiting for ChatGPT response ...`; the log must not include reasoning text from the side panel.
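Most of the checks in the list above are string matches on a captured verbose log, the session’s `output.log`, or the composer text (for example, grabbed via `browser-tools.ts eval` on `#prompt-textarea`). A sketch that bundles them; the helper names are hypothetical, the patterns are copied from the bullets, `xxx` is assumed to be a number, and the Enter-fallback log wording isn’t specified here, so it is matched loosely on the word `Enter`:

```typescript
// Hypothetical checks for the browser cases above.

// Prompt submission: two "textarea ready" lines, a 5.2 model pick, and a send.
export function checkPromptSubmissionLog(log: string): string[] {
  const problems: string[] = [];
  const ready = log.match(/Prompt textarea ready \(\d+ chars queued\)/g) ?? [];
  if (ready.length < 2) problems.push(`expected 2 "Prompt textarea ready" lines, saw ${ready.length}`);
  if (!/Model picker:.*5\.2/.test(log)) problems.push("no Model picker line mentioning 5.2");
  // Assumption: the Enter fallback logs a line containing "Enter".
  if (!/Clicked send button|Enter/.test(log)) problems.push("no send confirmation");
  return problems;
}

// Markdown capture: at least one closed code fence and one "- " or "* " bullet survive.
export function hasPreservedMarkdown(text: string): boolean {
  const fences = text.split(/\r?\n/).filter((line) => /^\s*```/.test(line)).length;
  return fences >= 2 && fences % 2 === 0 && /^\s*[-*] \S/m.test(text);
}

// Composer regression: the system prompt must never be pasted into ChatGPT.
export function composerLeaksSystemPrompt(composerText: string): boolean {
  return composerText.includes("You are Oracle");
}

// Heartbeats: count the thinking/waiting status lines in a long run's log.
const HEARTBEAT = /\[browser\] (ChatGPT thinking|Waiting for ChatGPT response)/;
export function heartbeatCount(log: string): number {
  return log.split(/\r?\n/).filter((line) => HEARTBEAT.test(line)).length;
}
```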
## Post-Run Validation

- `oracle session <id>` should replay the transcript with markdown.
- `~/.oracle/sessions/<id>/meta.json` must include `browser.config` metadata (model label, cookie settings) and `browser.runtime` (PID/port).
Document results (pass/fail, session IDs) in PR descriptions so reviewers can audit real-world behavior.
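A scripted version of the `meta.json` check might look like the sketch below. It only asserts that `browser.config` and `browser.runtime` exist as objects, because the field names inside them aren’t spelled out here; the helper names are hypothetical:

```typescript
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

function isObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

// Checks only the two blocks named above; inner field names are not assumed.
export function checkBrowserMeta(meta: unknown): string[] {
  if (!isObject(meta) || !isObject(meta.browser)) return ["missing browser block"];
  const problems: string[] = [];
  if (!isObject(meta.browser.config)) problems.push("missing browser.config");
  if (!isObject(meta.browser.runtime)) problems.push("missing browser.runtime");
  return problems;
}

// Reads ~/.oracle/sessions/<id>/meta.json and runs the check.
export function checkSessionMeta(sessionId: string): string[] {
  const path = join(homedir(), ".oracle", "sessions", sessionId, "meta.json");
  return checkBrowserMeta(JSON.parse(readFileSync(path, "utf8")));
}
```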
## Recent Smoke Runs

- 2025-11-18: API gpt-5.1 (`api-smoke-give-two-words`): returned “blue sky” in 2.5s.
- 2025-11-18: API gpt-5.1-pro (`api-smoke-pro-three-words`): completed in 3m08s with “Fast API verification”.
- 2025-11-18: Browser gpt-5.1 Instant (`browser-smoke-instant-two-words`): completed in ~10s; replied with a clarification prompt.
- 2025-11-18: Browser gpt-5.1-pro (`browser-smoke-pro-three-words`): completed in ~1m33s; response noted “Search tool used.”
- 2025-11-18 (rerun): API gpt-5.1 (`api-smoke-give-two-words`): reconfirmed OK; same answer + cost bracket.
- 2025-11-18 (rerun): Browser gpt-5.1-pro (`browser-smoke-pro-three-words`): reconfirmed OK; included heartbeat progress and search tool note.
- 2025-11-20: Browser gpt-5.1 via `oracle serve` (remote host on same Mac): fetched https://example.com; title “Example Domain”; first sentence “This domain is for use in documentation examples without needing permission.” (ran via tmux sessions `oracle-serve` and `oracle-client`).
## Browser Regression Checklist (manual)

Run these smoke tests whenever we touch browser automation:
- GPT-5.2 simple prompt
  `pnpm run oracle -- --engine browser --model "GPT-5.2" --prompt "Give me two short markdown bullet points about tables"`
  Expect two markdown bullets, no files/search referenced. Note the session ID (e.g., `give-me-two-short-markdown`).
- GPT-5.2 simple prompt
  `pnpm run oracle -- --engine browser --model gpt-5.2 --prompt "List two reasons Markdown is handy"`
  Confirm the answer arrives (and only once) even if it takes ~2–3 minutes.
- GPT-5.2 + attachment
  Prepare `/tmp/browser-md.txt` with a short note, then run `pnpm run oracle -- --engine browser --model "GPT-5.2" --prompt "Summarize the key idea from the attached note" --file /tmp/browser-md.txt`
  Ensure upload logs show “Attachment queued” and the answer references the file contents explicitly.
- GPT-5.2 + attachment (verbose)
  Prepare `/tmp/browser-report.txt` with faux metrics, then run `pnpm run oracle -- --engine browser --model gpt-5.2 --prompt "Use the attachment to report current CPU and memory figures" --file /tmp/browser-report.txt --verbose`
  Verify verbose logs show the attachment upload and the final answer matches the file data.
- Deep Research smoke
  `pnpm run oracle -- --engine browser --browser-manual-login --browser-research deep --prompt "Research one current public source about WebGPU browser support and cite it"`
  Confirm the logs show Deep Research activation/progress and the final report includes citations or source links. Do not use connected apps or private data.
- Multi-turn browser consult smoke
  `pnpm run oracle -- --engine browser --browser-manual-login --model gpt-5.5-pro --browser-thinking-time extended --prompt "Give one architectural recommendation for a tiny CLI cache." --browser-follow-up "Challenge your previous recommendation with one concrete failure mode." --browser-follow-up "Now return the final recommendation in one sentence, starting with CHECK_MULTI_TURN_OK."`
  Confirm the output contains all captured turns, includes `CHECK_MULTI_TURN_OK`, and the saved `transcript.md` records both follow-up prompts.
- Multi-turn value check
  Run the same initial prompt once without follow-ups and once with the challenge/final-decision follow-ups above. In the PR notes, record concrete differences such as extra failure modes, sharper rollback steps, or test cases. Do not claim a fixed quality percentage.
- Auto-archive smoke
  `pnpm run oracle -- --engine browser --browser-manual-login --model gpt-5.5-pro --browser-thinking-time extended --browser-archive always --prompt "Reply exactly CHECK_ARCHIVE_OK."`
  Confirm the output contains `CHECK_ARCHIVE_OK`, `oracle session <id> --render` still shows the transcript, and ChatGPT shows the conversation under archived chats rather than the active sidebar. Also confirm that a default `--browser-archive auto` run with Deep Research or follow-ups is not archived.
Record session IDs and outcomes in the PR description (pass/fail, notable delays). This ensures reviewers can audit real runs.
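For the multi-turn consult smoke, the pass criteria reduce to string checks on the captured CLI output and the saved `transcript.md`. A sketch with a hypothetical helper; where `transcript.md` lives isn’t stated here, so you pass its contents in, and it assumes each follow-up prompt is recorded verbatim:

```typescript
// Hypothetical checker for the multi-turn browser consult smoke.
export function checkMultiTurn(output: string, transcript: string, followUps: string[]): string[] {
  const problems: string[] = [];
  if (!output.includes("CHECK_MULTI_TURN_OK")) problems.push("CHECK_MULTI_TURN_OK missing from output");
  // Assumption: each --browser-follow-up prompt appears verbatim in transcript.md.
  for (const prompt of followUps) {
    if (!transcript.includes(prompt)) problems.push(`transcript lacks follow-up: ${prompt}`);
  }
  return problems;
}
```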
## Remote Chrome smoke test (CDP)

Run this whenever you touch CDP connection logic (remote Chrome lifecycle, attachment transfer) or before executing remote sessions in CI.

- Launch a throwaway Chrome instance with remote debugging enabled (adjust the path per OS). The `*` is quoted so shells like zsh don't try to glob it:
  ```bash
  REMOTE_PROFILE=/tmp/oracle-remote-test-profile
  rm -rf "$REMOTE_PROFILE"
  "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
    --headless=new \
    --disable-gpu \
    --remote-debugging-port=9333 \
    --remote-allow-origins='*' \
    --user-data-dir="$REMOTE_PROFILE" \
    >/tmp/oracle-remote-chrome.log 2>&1 &
  export REMOTE_CHROME_PID=$!
  sleep 3
  ```
- Run the helper to verify CDP connectivity:
  ```bash
  pnpm tsx scripts/test-remote-chrome.ts localhost 9333
  ```
  Expect ✓ logs for connection, protocol info, navigation to https://chatgpt.com/, and the final “POC successful!” line.
- Tear down the temporary browser:
  ```bash
  kill "$REMOTE_CHROME_PID"
  rm -rf "$REMOTE_PROFILE"
  ```
  Use `pkill -f oracle-remote-test-profile` if Chrome refuses to exit cleanly.
Capture the pass/fail result (include the helper’s log snippet) in your PR description alongside other manual browser tests.
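For context on what the “protocol info” step involves, a CDP round trip is small: fetch `webSocketDebuggerUrl` from `/json/version`, open it, and send a JSON message with an `id` and a `method` such as `Browser.getVersion`; the reply carries the same `id`. The sketch below uses Node’s global `WebSocket` (Node 22+). It is an illustration with hypothetical helper names, not the code in `scripts/test-remote-chrome.ts`:

```typescript
// Minimal CDP request/response over the browser-level websocket.
interface CdpResponse { id: number; result?: unknown; error?: { message: string } }

let nextId = 0;
// CDP messages are JSON objects with a caller-chosen, unique id.
export function cdpMessage(method: string, params: object = {}): { id: number; text: string } {
  const id = ++nextId;
  return { id, text: JSON.stringify({ id, method, params }) };
}

export function sendCdp(ws: WebSocket, method: string, params: object = {}): Promise<unknown> {
  const { id, text } = cdpMessage(method, params);
  return new Promise((resolve, reject) => {
    const onMessage = (event: MessageEvent) => {
      const msg = JSON.parse(String(event.data)) as CdpResponse;
      if (msg.id !== id) return; // ignore events and replies to other requests
      ws.removeEventListener("message", onMessage);
      if (msg.error) reject(new Error(msg.error.message));
      else resolve(msg.result);
    };
    ws.addEventListener("message", onMessage);
    ws.send(text);
  });
}

export async function remoteBrowserVersion(host: string, port: number): Promise<unknown> {
  const res = await fetch(`http://${host}:${port}/json/version`);
  const info = (await res.json()) as { webSocketDebuggerUrl: string };
  const ws = new WebSocket(info.webSocketDebuggerUrl);
  await new Promise<void>((resolve, reject) => {
    ws.addEventListener("open", () => resolve(), { once: true });
    ws.addEventListener("error", () => reject(new Error("websocket error")), { once: true });
  });
  try {
    return await sendCdp(ws, "Browser.getVersion"); // { protocolVersion, product, ... }
  } finally {
    ws.close();
  }
}
```

With the throwaway instance running, `await remoteBrowserVersion("localhost", 9333)` should resolve with the browser’s version info.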
## Attach-running smoke test

Run this whenever you touch the local attach path (`--browser-attach-running`) or the direct browser websocket bootstrap.

- Start or reuse a local signed-in Chrome with DevTools access available. If you want an explicit local endpoint, launch Chrome with `--remote-debugging-port=9222`.
- Run Oracle against the running browser:
  ```bash
  pnpm run oracle -- --engine browser \
    --browser-attach-running \
    --model "GPT-5.2" \
    --prompt "Give me two short markdown bullets about browser tabs"
  ```
  If the browser’s remote-debugging UI shows a different local port, rerun with `--remote-chrome <host:port>` in addition to `--browser-attach-running`.
- Verify Oracle opens a fresh tab in the existing browser, returns the answer, and closes only that Oracle-owned tab afterward.
- Reattach sanity check: repeat with a very short timeout if needed, then run `oracle session <id>` and confirm Oracle can reconnect to the saved tab/conversation.
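One way to verify the “closes only that Oracle-owned tab” step, assuming Chrome was launched with `--remote-debugging-port=9222`: save the output of `curl -s http://127.0.0.1:9222/json/list` before and after the run and diff the page targets. Every pre-existing tab should survive and no new tab should linger. A sketch with a hypothetical helper:

```typescript
// Subset of a /json/list entry.
interface Target { id: string; type: string; url: string }

// Compares page targets before and after an attach-running smoke.
export function checkTabHygiene(before: Target[], after: Target[]): string[] {
  const problems: string[] = [];
  const beforeIds = new Set(before.map((t) => t.id));
  const afterIds = new Set(after.map((t) => t.id));
  // A user tab that disappeared means Oracle closed something it didn't own.
  for (const t of before) {
    if (t.type === "page" && !afterIds.has(t.id)) problems.push(`user tab closed: ${t.url}`);
  }
  // A new tab that is still open means Oracle didn't clean up its own tab.
  for (const t of after) {
    if (t.type === "page" && !beforeIds.has(t.id)) problems.push(`tab left open: ${t.url}`);
  }
  return problems;
}
```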
## Chrome DevTools / MCP Debugging

Use this when you need to inspect the live ChatGPT composer (DOM state, markdown text, screenshots, etc.). For smaller ad‑hoc pokes, you can often rely on `pnpm tsx scripts/browser-tools.ts …` instead.

- Launch within tmux
  ```bash
  tmux new -d -s oracle-browser \
    "pnpm run oracle -- --engine browser --browser-keep-browser \
    --model 'GPT-5.5 Pro' --prompt 'Debug via DevTools.'"
  ```
  Keeping the run in tmux prevents your shell from blocking, and `--browser-keep-browser` keeps Chrome open afterward.
- Grab the DevTools port
  - Run `tmux capture-pane -pt oracle-browser` to read the logs (`Launched Chrome … on port 56663`).
  - Verify the endpoint:
    ```bash
    curl http://127.0.0.1:<PORT>/json/version
    ```
    Note the `webSocketDebuggerUrl` for reference.
- Attach Chrome DevTools MCP
  - One-off: `CHROME_DEVTOOLS_URL=http://127.0.0.1:<PORT> npx -y chrome-devtools-mcp@latest`
  - `mcporter` config snippet:
    ```json
    {
      "chrome-devtools": {
        "command": "npx",
        "args": ["-y", "chrome-devtools-mcp@latest", "--browserUrl", "http://127.0.0.1:<PORT>"]
      }
    }
    ```
  - Once the server prints `chrome-devtools-mcp exposes…`, you can list/call tools via `mcporter`.
  - Oracle’s attach-running mode no longer depends on MCP at runtime; `mcporter` remains useful here for manual inspection only.
- Interact & capture
  - Use MCP tools (`click`, `evaluate_js`, `screenshot`, etc.) to debug the composer contents.
  - Record any manual actions you take (e.g., “fired `evaluate_js` to dump `#prompt-textarea.innerText`”).
- Cleanup
  - `tmux kill-session -t oracle-browser`
  - `pkill -f oracle-browser-<slug>` if Chrome is still running.

Tip: Running `npx chrome-devtools-mcp@latest --help` lists additional switches (custom Chrome binary, headless, viewport, etc.).
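If you switch DevTools ports often, generating the `mcporter` snippet saves hand-editing JSON. A sketch that reproduces the snippet above for a given port; `chromeDevtoolsMcpConfig` is a hypothetical helper:

```typescript
// Builds the "chrome-devtools" mcporter entry shown above for a given port.
export function chromeDevtoolsMcpConfig(port: number, host = "127.0.0.1") {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`Invalid DevTools port: ${port}`);
  }
  return {
    "chrome-devtools": {
      command: "npx",
      args: ["-y", "chrome-devtools-mcp@latest", "--browserUrl", `http://${host}:${port}`],
    },
  };
}
```

For example, `console.log(JSON.stringify(chromeDevtoolsMcpConfig(56663), null, 2))` prints the snippet for the port read from `tmux capture-pane`.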
## Responses API Live Smoke Tests

These Vitest cases hit the real OpenAI API to exercise both transports:

- Export a real key and explicitly opt in (default runs stay fast):
  ```bash
  export OPENAI_API_KEY=sk-...
  export ORACLE_LIVE_TEST=1
  pnpm vitest run tests/live/openai-live.test.ts
  ```
- The first two tests target the standard GPT-5 (`gpt-5.1`/`gpt-5.2`) foreground streaming paths. The later background tests send `gpt-5.5-pro` and `gpt-5.2-pro` prompts and expect the CLI to stay in background mode until OpenAI finishes (up to 30 minutes).
- Watch the console for `Reconnected to OpenAI background response...` if you're debugging transport flakiness; the test will fail if the response status isn't `completed` or if the text doesn't contain the hard-coded smoke strings.

Skip these unless you're intentionally validating the production API; they are fully gated behind `ORACLE_LIVE_TEST=1` to avoid accidental CI runs.
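The two gates described above (opt-in env var, then `completed` status plus smoke strings) look roughly like this. It is a sketch, not the contents of `tests/live/openai-live.test.ts`: also requiring `OPENAI_API_KEY` and matching case-insensitively are choices made here, and `output_text` is the aggregated-text convenience field the official OpenAI SDKs expose on a response:

```typescript
// Sketch of the live-test gates; helper names are hypothetical.
type Env = Record<string, string | undefined>;

// Live tests run only with an explicit opt-in and a key present.
export function liveTestsEnabled(env: Env): boolean {
  return env.ORACLE_LIVE_TEST === "1" && Boolean(env.OPENAI_API_KEY);
}

// Minimal view of a Responses API result.
interface ResponseLike { status: string; output_text?: string }

// Fails unless the response completed and contains every smoke string.
export function checkSmokeResponse(res: ResponseLike, smokeStrings: string[]): void {
  if (res.status !== "completed") throw new Error(`status was ${res.status}`);
  const text = (res.output_text ?? "").toLowerCase();
  const missing = smokeStrings.filter((s) => !text.includes(s.toLowerCase()));
  if (missing.length > 0) throw new Error(`missing smoke strings: ${missing.join(", ")}`);
}
```

In Vitest, `describe.skipIf(!liveTestsEnabled(process.env))(...)` would keep such a suite skipped by default.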