The Problem
Claude Code is excellent at software engineering. But Gemini 3 Pro (released November 18, 2025) has state-of-the-art multimodal capabilities: it can analyze screenshots, generate UI from sketches, and see what Claude cannot.
Why not use both?
The Solution
A /gemini slash command that spawns Gemini CLI from Claude Code. Same pattern as my Codex post: spawn a specialized model, get results back, apply them.
Workflow patterns:
Quick fixes (screenshot → applied):
- Copy screenshot to clipboard
- Run /gemini fix the spacing issues in this screenshot
- Gemini analyzes and returns code
- Changes applied directly (slash command detects “fix” intent)
Deep analysis (like redesigning a landing page):
- Pass full page code to Gemini
- Gemini analyzes for modernization, target market fit, visual hierarchy
- Returns comprehensive restructured components
- Claude applies and you iterate with more screenshots
Research/architecture (no visual input):
- Run /gemini compare microservices vs modular monolith for our scale
- Gemini returns structured analysis with trade-offs
- Use the insights to inform your implementation
Which model for what:
- Claude Code: Backend, logic, refactoring, debugging
- Gemini: Visual analysis, UI work, research, deep analysis, creative solutions
- Codex: Architecture decisions, systems thinking, strategic planning
Gemini and Codex overlap on research/analysis. The difference: Gemini has multimodal capabilities and a 1M token context window. Codex tends toward more deliberate, senior-engineer style thinking.
Setup
1. Install Gemini CLI
npm install -g @google/gemini-cli
Set your API key:
export GEMINI_API_KEY=your_key_here
For Gemini 3 Pro, you need a paid API key or Google AI Ultra subscription. Run /settings in Gemini CLI and enable “Preview features”.
2. Install pngpaste (for clipboard screenshots)
brew install pngpaste
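Before moving on, it can help to confirm that everything is on your PATH. The following is a minimal sketch, not part of the original setup: the function names `missing_tools` and `preflight` are hypothetical, it assumes API-key auth as in step 1, and it also checks for `jq` because the wrapper script in the next step pipes output through it.

```shell
#!/bin/bash
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Report what the /gemini workflow still needs.
# Returns non-zero if a tool is missing or GEMINI_API_KEY is unset.
preflight() {
  local missing
  missing=$(missing_tools gemini pngpaste jq)
  if [ -n "$missing" ]; then
    printf 'Missing tools: %s\n' "$(printf '%s' "$missing" | tr '\n' ' ')" >&2
  fi
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    printf 'GEMINI_API_KEY is not set\n' >&2
  fi
  [ -z "$missing" ] && [ -n "${GEMINI_API_KEY:-}" ]
}
```

Running `preflight && echo ready` in a terminal prints `ready` only when all three tools and the key are in place.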
3. Create the wrapper script
~/.claude/bin/gemini-clean
#!/bin/bash
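# Run Gemini CLI, capturing stdout and stderr together so errors are not lost.
output=$(gemini "$@" 2>&1)
# Print the .response field if the output parses as JSON; otherwise print it unchanged.
printf '%s\n' "$output" | jq -r '.response' 2>/dev/null || printf '%s\n' "$output"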
Make it executable (the script also needs jq on your PATH; without it, the raw output is printed as-is):
chmod +x ~/.claude/bin/gemini-clean
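To see what the wrapper does without calling the API, here is a small sketch that applies the same extraction to a string. The function name `extract_response` and the sample payload are illustrative assumptions; the only field relied on is `response`, which the wrapper itself reads.

```shell
#!/bin/bash
# Same extraction logic as gemini-clean, applied to a string instead of a live CLI call.
extract_response() {
  printf '%s\n' "$1" | jq -r '.response' 2>/dev/null || printf '%s\n' "$1"
}

# Hand-written sample payload (an assumption, not captured CLI output).
sample='{"response":"Use gap: 1rem on the flex container."}'

extract_response "$sample"      # with jq installed, prints only the response text
extract_response "plain text"   # not JSON, so it is printed unchanged
```

The fallback matters because error messages and warnings are not JSON: when parsing fails, you still see whatever the CLI printed.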
4. Create the system prompt
~/.claude/gemini-prompt.md
# Gemini 3 Pro Agent
You are a senior engineer with strengths in visual understanding,
UI/UX design, multimodal analysis, and creative problem-solving.
## Core Strengths
1. **Visual/Multimodal**: Analyze screenshots, mockups, diagrams, and visual artifacts
2. **UI/UX Generation**: Convert sketches to code, generate components, improve visual design
3. **Creative Solutions**: Think outside the box, generate novel approaches
4. **Large Context**: Handle large codebases and documents with 1M token context
5. **Research & Analysis**: Deep dive into topics, compare approaches
## Approach
### For Visual/UI Tasks:
- Analyze screenshots for issues (alignment, hierarchy, accessibility)
- Generate complete, runnable HTML/CSS/JS
- Suggest specific improvements with code
- Focus on polish and user experience
### For Analysis/Research Tasks:
- Break down the problem systematically
- Consider multiple angles and trade-offs
- Provide actionable recommendations
- Be thorough but concise
### For Code Tasks:
- Write clean, production-ready code
- Follow existing patterns in the codebase
- Include brief explanations of key decisions
## Output Format
- **When generating code**: Provide complete, runnable code with brief explanation
- **When analyzing**: Structure as Problem → Analysis → Recommendations
- **When reviewing**: Be specific and actionable, not vague
## What You're NOT
- Not verbose - get to the point
- Not a yes-machine - challenge bad ideas
- Not Claude - you're Gemini, different perspective and strengths
## Context
You are invoked from Claude Code to provide specialized assistance.
Return actionable output that can be immediately used.
5. Create the slash command
~/.claude/commands/gemini.md
---
allowed-tools:
- Bash(gemini:*)
- Bash(~/.claude/bin/gemini-clean:*)
- Bash(rm /tmp/gemini-*:*)
- Bash(pngpaste:*)
- Glob
- Grep
- Edit
description: Launch Gemini 3 Pro for visual analysis, UI/UX, or research
---
Invoke Gemini 3 Pro for visual/multimodal tasks.
## Steps:
1. **Check for visual input:**
- If user mentions screenshot/image/clipboard:
```bash
TIMESTAMP=$(date +%s) && pngpaste /tmp/gemini-${TIMESTAMP}.png
```
2. **Gather context** from relevant project files
3. **Execute:**
```bash
~/.claude/bin/gemini-clean \
--model gemini-3-pro-preview \
-p "$(cat ~/.claude/gemini-prompt.md)
PROJECT: [project name]
CONTEXT: [relevant context]
[If image:]
IMAGE: @/tmp/gemini-xxx.png
USER_REQUEST: [request]" \
--output-format json
```
4. **Determine action:**
- Apply directly if: "fix", "change", "update", "make"
- Return analysis if: "review", "analyze", "suggest"
5. **Apply or return results**
6. **Cleanup temp files**
USER REQUEST: $ARGUMENTS
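In the slash command, Claude fills in the bracketed placeholders itself. If you wanted to script the same template by hand, the prompt body could be assembled roughly like this. The function name `build_prompt` and its argument order are hypothetical; the field labels come from the template above.

```shell
#!/bin/bash
# Assemble the prompt body from the template in step 3.
# Arguments: project, context, request, and an optional image path.
build_prompt() {
  local project="$1"
  local context="$2"
  local request="$3"
  local image="${4:-}"
  printf 'PROJECT: %s\n' "$project"
  printf 'CONTEXT: %s\n' "$context"
  # Only reference an image when one was captured; @path is the file syntax used above.
  if [ -n "$image" ]; then
    printf 'IMAGE: @%s\n' "$image"
  fi
  printf 'USER_REQUEST: %s\n' "$request"
}

# Example call (values are placeholders):
# build_prompt my-app "landing page styles" "fix the spacing" /tmp/gemini-123.png
```

Leaving the IMAGE line out entirely for text-only requests keeps research prompts free of a dangling file reference.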
Usage
Visual analysis with clipboard:
# Copy screenshot, then:
/gemini fix the alignment issues in this screenshot
UI generation:
/gemini create a pricing table component matching our design system
Architecture/research:
/gemini analyze the trade-offs between SSR and SSG for this project
/gemini compare event sourcing vs CRUD for our audit requirements
Deep analysis:
/gemini review the error handling patterns across this codebase
/gemini what are the security implications of this auth flow
How It Works
The key is Gemini CLI’s @ syntax for images:
gemini -p "Analyze this: @/tmp/screenshot.png" --output-format json
The pngpaste command grabs images from your clipboard:
pngpaste /tmp/screenshot.png
Combined with Claude Code’s slash commands, you get a seamless workflow where tasks are routed to the right model automatically.
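The clipboard step can be made a little more defensive. The sketch below is one way to do it, not what the slash command runs: `capture_clipboard` and the `PNGPASTE_BIN` override are hypothetical names, and it assumes pngpaste exits non-zero when the clipboard holds no image.

```shell
#!/bin/bash
# Save the clipboard image to a unique temp file and print its path.
# PNGPASTE_BIN is a hypothetical override so the function can be tried without pngpaste.
capture_clipboard() {
  local bin="${PNGPASTE_BIN:-pngpaste}"
  local file
  file="/tmp/gemini-$(date +%s)-$$.png"
  if ! command -v "$bin" >/dev/null 2>&1; then
    echo "$bin not found (brew install pngpaste)" >&2
    return 1
  fi
  # Assumption: a non-zero exit means there was no image on the clipboard.
  if ! "$bin" "$file"; then
    rm -f "$file"
    return 1
  fi
  printf '%s\n' "$file"
}

# Example use, removing the temp file afterwards:
# img=$(capture_clipboard) && gemini -p "Analyze this: @$img" --output-format json
# rm -f "$img"
```

Adding the shell's process ID to the timestamp avoids collisions when two captures happen within the same second.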
The hybrid approach: Use Claude for what Claude does best. Use Gemini for what Gemini does best. The slash command handles the routing.
Lessons Learned
- Gemini 3 Pro’s visual understanding is genuinely good: It rebuilt an entire landing page and measurably improved the UX, design, and aesthetic in one shot. It also caught spacing issues and suggested specific CSS fixes from screenshots alone.
- The non-MCP approach is simpler: Just spawn the CLI and pipe back results. No server configuration needed.
- Clipboard integration is essential: Without pngpaste, you’d have to manually save screenshots to files. The extra friction kills the workflow.
- Bias toward action: The slash command applies fixes directly when the intent is clear. Asking for approval on every change slows you down.
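In the slash command, Claude makes the apply-or-analyze call from the prompt text. Purely as an illustration of that rule, here is a sketch that classifies a request by its first word. The function name `classify_intent`, the first-word heuristic, and the default to analysis when intent is unclear are all assumptions.

```shell
#!/bin/bash
# Classify a request as "apply" or "analyze" from its first word.
# The keyword lists mirror step 4 of the slash command.
classify_intent() {
  local first
  # Take the first word of the request and lowercase it.
  first=$(printf '%s' "$1" | awk '{print tolower($1)}')
  case "$first" in
    fix|change|update|make) echo apply ;;
    review|analyze|suggest) echo analyze ;;
    *) echo analyze ;;  # unclear intent: default to the non-destructive path (assumption)
  esac
}

classify_intent "fix the spacing issues in this screenshot"   # prints: apply
classify_intent "compare SSR and SSG for this project"        # prints: analyze
```

Matching only the first word avoids false positives such as "review the recent changes", where "change" appears but the intent is analysis.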
Try It
- Install the dependencies (Gemini CLI, pngpaste)
- Create the three files above
- Copy a screenshot and run /gemini fix the visual issues in this screenshot
You could implement this as a skill that auto-activates when you mention visual keywords or edit UI files. But slash commands give explicit control over when to spawn an external model, which makes sense for this workflow. See my post on the skills controllability problem for when each pattern fits.
Different models, different strengths, one workflow.


