Hybrid AI Workflows: Spawning Gemini from Claude Code

The Problem

Claude Code is excellent at software engineering. But Gemini 3 Pro (released November 18, 2025) has state-of-the-art multimodal capabilities: it can analyze screenshots, generate UI from sketches, and catch visual details that Claude can miss.

Why not use both?

The Solution

A /gemini slash command that spawns Gemini CLI from Claude Code. Same pattern as my Codex post: spawn a specialized model, get results back, apply them.

Workflow patterns:

Quick fixes (screenshot → applied):

Copy screenshot to clipboard
Run /gemini fix the spacing issues in this screenshot
Gemini analyzes and returns code
Changes applied directly (slash command detects “fix” intent)

Deep analysis (like redesigning a landing page):

Pass full page code to Gemini
Gemini analyzes for modernization, target market fit, visual hierarchy
Returns comprehensive restructured components
Claude applies and you iterate with more screenshots

Research/architecture (no visual input):

Run /gemini compare microservices vs modular monolith for our scale
Gemini returns structured analysis with trade-offs
Use the insights to inform your implementation

When to use which

Claude Code: Backend, logic, refactoring, debugging
Gemini: Visual analysis, UI work, research, deep analysis, creative solutions
Codex: Architecture decisions, systems thinking, strategic planning

Gemini and Codex overlap on research/analysis. The difference: Gemini has multimodal capabilities and a 1M token context window. Codex tends toward more deliberate, senior-engineer style thinking.

Setup

1. Install Gemini CLI

npm install -g @google/gemini-cli

Set your API key:

export GEMINI_API_KEY=your_key_here

For Gemini 3 Pro, you need a paid API key or a Google AI Ultra subscription. Run /settings in Gemini CLI and enable “Preview features”.

2. Install pngpaste (for clipboard screenshots)

brew install pngpaste
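Before going further, it can help to confirm that everything the workflow shells out to is on your PATH. The following is a minimal sketch, not part of the original setup: `missing_deps` is a hypothetical helper name, and `jq` is included because the wrapper script in the next step pipes Gemini's output through it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of each command that is not on PATH.
missing_deps() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Check the three tools this workflow relies on and report any gaps.
missing=$(missing_deps gemini pngpaste jq)
if [ -n "$missing" ]; then
  printf 'Missing dependencies:\n%s\n' "$missing" >&2
fi
```

If `jq` shows up as missing, `brew install jq` installs it.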

3. Create the wrapper script

~/.claude/bin/gemini-clean

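#!/bin/bash
# Print Gemini's .response text; fall back to the raw output if it is not JSON.
output=$(gemini "$@" 2>&1)
response=$(printf '%s\n' "$output" | jq -er '.response // empty' 2>/dev/null) \
  && printf '%s\n' "$response" || printf '%s\n' "$output"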

Make it executable:

chmod +x ~/.claude/bin/gemini-clean
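The parse-or-fall-back step in the wrapper can be tried without calling Gemini at all. The sketch below is a variant of that step wrapped in a hypothetical function, `extract_response`, fed with made-up sample payloads. It assumes `jq` is installed and that JSON output carries the text in a `response` field, as the wrapper expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the .response field when the input is JSON that
# has one; otherwise print the input unchanged (e.g. a plain-text error).
extract_response() {
  local raw="$1" parsed
  # -e gives a non-zero exit when nothing is produced; "// empty" produces
  # nothing for a missing or null field instead of printing "null".
  if parsed=$(printf '%s\n' "$raw" | jq -er '.response // empty' 2>/dev/null); then
    printf '%s\n' "$parsed"
  else
    printf '%s\n' "$raw"
  fi
}

# Made-up sample payloads, for illustration only.
extract_response '{"response": "Use gap: 16px on the card grid."}'
extract_response 'Error: API key not valid'
```

The first call should print only the response text. The second is not JSON, so it passes through untouched, which keeps error messages visible to Claude.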

4. Create the system prompt

~/.claude/gemini-prompt.md

# Gemini 3 Pro Agent

You are a senior engineer with strengths in visual understanding,
UI/UX design, multimodal analysis, and creative problem-solving.

## Core Strengths
1. **Visual/Multimodal**: Analyze screenshots, mockups, diagrams, and visual artifacts
2. **UI/UX Generation**: Convert sketches to code, generate components, improve visual design
3. **Creative Solutions**: Think outside the box, generate novel approaches
4. **Large Context**: Handle large codebases and documents with 1M token context
5. **Research & Analysis**: Deep dive into topics, compare approaches

## Approach

### For Visual/UI Tasks:
- Analyze screenshots for issues (alignment, hierarchy, accessibility)
- Generate complete, runnable HTML/CSS/JS
- Suggest specific improvements with code
- Focus on polish and user experience

### For Analysis/Research Tasks:
- Break down the problem systematically
- Consider multiple angles and trade-offs
- Provide actionable recommendations
- Be thorough but concise

### For Code Tasks:
- Write clean, production-ready code
- Follow existing patterns in the codebase
- Include brief explanations of key decisions

## Output Format
- **When generating code**: Provide complete, runnable code with brief explanation
- **When analyzing**: Structure as Problem → Analysis → Recommendations
- **When reviewing**: Be specific and actionable, not vague

## What You're NOT
- Not verbose - get to the point
- Not a yes-machine - challenge bad ideas
- Not Claude - you're Gemini, different perspective and strengths

## Context
You are invoked from Claude Code to provide specialized assistance.
Return actionable output that can be immediately used.

5. Create the slash command

~/.claude/commands/gemini.md

---
allowed-tools:
  - Bash(gemini:*)
  - Bash(~/.claude/bin/gemini-clean:*)
  - Bash(rm /tmp/gemini-*:*)
  - Bash(pngpaste:*)
  - Glob
  - Grep
  - Edit
description: Launch Gemini 3 Pro for visual analysis, UI/UX, or research
---

Invoke Gemini 3 Pro for visual/multimodal tasks.

## Steps:

1. **Check for visual input:**
   - If user mentions screenshot/image/clipboard:
     ```bash
     TIMESTAMP=$(date +%s) && pngpaste /tmp/gemini-${TIMESTAMP}.png
     ```

2. **Gather context** from relevant project files

3. **Execute:**
   ```bash
   ~/.claude/bin/gemini-clean \
     --model gemini-3-pro-preview \
     -p "$(cat ~/.claude/gemini-prompt.md)

   PROJECT: [project name]
   CONTEXT: [relevant context]

   [If image:]
   IMAGE: @/tmp/gemini-xxx.png

   USER_REQUEST: [request]" \
     --output-format json
   ```

4. **Determine action:**
   - Apply directly if: "fix", "change", "update", "make"
   - Return analysis if: "review", "analyze", "suggest"

5. **Apply or return results**

6. **Clean up temp files**

USER REQUEST: $*

Usage

Visual analysis with clipboard:

# Copy screenshot, then:
/gemini fix the alignment issues in this screenshot

UI generation:

/gemini create a pricing table component matching our design system

Architecture/research:

/gemini analyze the trade-offs between SSR and SSG for this project
/gemini compare event sourcing vs CRUD for our audit requirements

Deep analysis:

/gemini review the error handling patterns across this codebase
/gemini what are the security implications of this auth flow

How It Works

The key is Gemini CLI’s @ syntax for images:

gemini -p "Analyze this: @/tmp/screenshot.png" --output-format json
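To make the prompt layout from the slash command concrete, here is a minimal sketch of assembling it in one place. `build_gemini_prompt` is a hypothetical helper, not something the slash command defines; in the real workflow Claude fills in the bracketed placeholders itself. The field names (PROJECT, CONTEXT, IMAGE, USER_REQUEST) follow the template above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assemble the prompt text passed to `gemini -p`.
# Arguments: system prompt text, project name, context, user request,
# and an optional image path (omit it for text-only requests).
build_gemini_prompt() {
  local system_prompt="$1"
  local project="$2"
  local context="$3"
  local request="$4"
  local image="${5:-}"

  printf '%s\n\n' "$system_prompt"
  printf 'PROJECT: %s\n' "$project"
  printf 'CONTEXT: %s\n\n' "$context"
  if [ -n "$image" ]; then
    # The @ prefix is the syntax this post relies on to attach the file.
    printf 'IMAGE: @%s\n\n' "$image"
  fi
  printf 'USER_REQUEST: %s\n' "$request"
}

# Usage (commented out so that sourcing this file calls nothing):
# gemini --model gemini-3-pro-preview \
#   -p "$(build_gemini_prompt "$(cat ~/.claude/gemini-prompt.md)" \
#        "my-app" "landing page" "fix the spacing" /tmp/screenshot.png)" \
#   --output-format json
```

Leaving out the fifth argument drops the IMAGE line entirely, which matches the research and architecture requests that have no visual input.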

The pngpaste command grabs images from your clipboard:

pngpaste /tmp/screenshot.png
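A slightly more defensive version of the capture step might look like the sketch below. It is an illustration rather than what the slash command runs: `capture_clipboard_image` is a hypothetical name, and it assumes `pngpaste` exits non-zero when the clipboard holds no image. It uses a unique temp directory under /tmp so the file keeps a .png extension and cleanup is a single `rm -rf`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save the clipboard image to a unique temp file.
# Prints the file path on success; cleans up and returns 1 on failure.
capture_clipboard_image() {
  local dir file
  dir=$(mktemp -d /tmp/gemini-XXXXXX) || return 1
  file="$dir/screenshot.png"
  # Require both a successful exit and a non-empty file.
  if pngpaste "$file" 2>/dev/null && [ -s "$file" ]; then
    printf '%s\n' "$file"
  else
    rm -rf "$dir"
    return 1
  fi
}

# Usage (commented out so that sourcing this file touches nothing):
# if img=$(capture_clipboard_image); then
#   echo "IMAGE: @$img"
#   rm -rf "$(dirname "$img")"   # clean up once Gemini has responded
# else
#   echo "No image on the clipboard" >&2
# fi
```

Failing early here means a text-only prompt is never sent by accident when the screenshot did not make it onto the clipboard.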

Combined with Claude Code’s slash commands, you get a seamless workflow: one command captures the screenshot, hands the task to Gemini, and brings the result back into your session.

Use Claude for what Claude does best. Use Gemini for what Gemini does best. The slash command handles the routing.

— The hybrid approach

Lessons Learned

Gemini 3 Pro’s visual understanding is genuinely good: It rebuilt an entire landing page and noticeably improved the UX, design, and aesthetics in one shot. It also caught spacing issues and suggested specific CSS fixes from screenshots alone.
The non-MCP approach is simpler: Just spawn the CLI and pipe back results. No server configuration needed.
Clipboard integration is essential: Without pngpaste, you’d have to manually save screenshots to files. The extra friction kills the workflow.
Bias toward action: The slash command applies fixes directly when the intent is clear. Asking for approval on every change slows you down.
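The apply-versus-analyze rule is ordinary keyword matching, and in the slash command Claude does it by reading the instructions. As a rough illustration only, the same rule could be written as a shell function. `classify_intent` is a hypothetical name, and two choices here are assumptions: action verbs win when both kinds of keyword appear, and anything unrecognized falls back to analysis.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a /gemini request as "apply" or "analyze"
# using the keyword lists from the slash command.
classify_intent() {
  local words
  # Lowercase, then turn every character that is not a-z into a space so
  # that "Fix," and "fix" both become the whole word "fix".
  words=" $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z' ' ') "
  case "$words" in
    # Assumption: action verbs take priority (bias toward action).
    *" fix "*|*" change "*|*" update "*|*" make "*) echo apply ;;
    *" review "*|*" analyze "*|*" suggest "*) echo analyze ;;
    # Assumption: unknown requests are treated as analysis, the safer default.
    *) echo analyze ;;
  esac
}

classify_intent "fix the spacing issues in this screenshot"
classify_intent "review the error handling patterns across this codebase"
```

Matching whole words keeps a request like "what makes this slow" from triggering the "make" rule.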

Try It

Install the dependencies (Gemini CLI, pngpaste)
Create the three files above
Copy a screenshot and run /gemini fix the visual issues in this screenshot

Skills vs slash commands

You could implement this as a skill that auto-activates when you mention visual keywords or edit UI files. But slash commands give explicit control over when to spawn an external model, which makes sense for this workflow. See my post on the skills controllability problem for when each pattern fits.

Different models, different strengths, one workflow.