Nano Banana Pro + Midjourney + ChatGPT: The Architect's Hybrid Render Workflow Explained (ArchiGen AI)

Image: Epic cinematic architectural visualization of a futuristic skyscraper with organic green terraces. Credit: Futuristic organic skyscraper render / Midjourney V8 / ArchiGen AI Original Render

If you've been on architecture YouTube in the last three weeks, you've seen a version of this workflow. The video titles vary ("the best AI architecture render workflow," "the ultimate concept design pipeline," "Midjourney + Nano Banana Pro is changing architecture"), but the structure underneath them is consistent. Three tools, three roles, one loop.

I was skeptical when the first video crossed my feed. Stacking three different AI tools usually means three different sets of artifacts, three different prompt languages, and a workflow that looks neat in a tutorial and breaks in practice. So we ran it on real billable work: a boutique hotel concept package (four exterior moods and three interior scenes) due to the client at the end of last week. Here's what actually held up.

What the triangle is

Three tools, three jobs:

Midjourney: the ideation pass. You're not trying to produce final renders here. You're trying to generate atmosphere, mood, material palette, and unexpected directions you wouldn't have thought of unprompted. Loose, exploratory, fast.
ChatGPT: the translation layer. Midjourney produces something promising but verbally vague. You feed the Midjourney output (and your client brief) to ChatGPT and ask it to write a structured prompt that captures what's working: material specifics, lighting language, camera direction, architectural vocabulary. ChatGPT compresses inspiration into a deployable prompt.
Nano Banana Pro: the geometric finish. Google's NBP is genuinely good at preserving spatial logic and producing client-deliverable image quality. You feed it your modeled geometry (or an upgraded line drawing) and the ChatGPT-structured prompt, and you get a render that respects both the architecture and the inspiration.

The triangle structure exists because none of the three tools is good at all three jobs. Midjourney is wildly creative and architecturally illiterate. NBP is architecturally faithful but creatively conservative. ChatGPT writes structured language better than humans typing prompts under deadline. Each tool does what it's best at.

Image (generated with Gemini): A macro architectural detail shot through fluted glass showing raking daylight over concrete and brass. Caption: Translating Midjourney’s fever dreams into buildable sections takes real patience.

The workflow, step by step

Step 01 / Ideate

Midjourney as inspiration engine

For our hotel concept I ran six Midjourney passes against language pulled directly from the client brief: "warm, material-forward, references mid-century Mexican modernism, Pacific coast light, lobby reads as a courtyard." I prompted for atmosphere, not floor plans. Style references included specific architects (Barragán, Legorreta) and specific photographers (Iwan Baan for daylight quality).

Step 02 / Curate

Select the moves that survive the brief

From 24 Midjourney variations I kept four. Selection criteria: does the lighting read correctly for our site latitude? Does the material palette translate to construction we'd actually detail? Is there one specific move (a courtyard cut, a material adjacency, a window-to-wall ratio) worth carrying forward? If a Midjourney pass is beautiful but architecturally absurd, it's gone.
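The selection questions above behave like a pass/fail checklist. Here is a minimal sketch in Python, assuming each variation is scored by hand and that a variation must pass all three checks to survive (the text implies this but does not state it). The `Variation` class, its field names, and the sample entries are hypothetical and not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class Variation:
    # Hypothetical record for one Midjourney image, scored by a human reviewer.
    name: str
    lighting_fits_latitude: bool   # does the light read correctly for the site?
    palette_is_buildable: bool     # could we actually detail these materials?
    has_move_worth_keeping: bool   # e.g. a courtyard cut or a material adjacency


def curate(variations: list[Variation]) -> list[Variation]:
    """Keep only the variations that pass all three checks."""
    return [
        v for v in variations
        if v.lighting_fits_latitude
        and v.palette_is_buildable
        and v.has_move_worth_keeping
    ]


# Illustrative entries only; real scoring happens by eye.
candidates = [
    Variation("pass1_a", True, True, True),
    Variation("pass1_b", True, False, True),   # beautiful but not buildable
    Variation("pass2_a", False, True, True),   # light is wrong for the site
]
kept = curate(candidates)
print([v.name for v in kept])
```

Writing the criteria down this way mostly helps when more than one person is curating, since everyone answers the same three questions.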

Step 03 / Translate

ChatGPT writes the deployable prompt

I uploaded one Midjourney reference image to ChatGPT (GPT-5-class model with vision) along with the project brief and asked: "Describe this image as a structured prompt for an AI render tool that respects architectural geometry. Specify materials, lighting language, camera position, and atmosphere. Use the language of architectural rendering, not marketing." What came back was usable in about 70% of cases without editing.
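If you run this step often, it helps to keep the instruction text fixed and only swap the brief. The sketch below assembles the message in Python. The function name is hypothetical, the reference image is assumed to be attached separately in the chat, and no API call is made.

```python
def build_translation_request(brief: str) -> str:
    """Combine the fixed instruction with the project brief into one message."""
    instruction = (
        "Describe this image as a structured prompt for an AI render tool "
        "that respects architectural geometry. Specify materials, lighting "
        "language, camera position, and atmosphere. Use the language of "
        "architectural rendering, not marketing."
    )
    # The image itself is attached alongside this text, not embedded in it.
    return f"{instruction}\n\nProject brief:\n{brief.strip()}"


brief = (
    "warm, material-forward, references mid-century Mexican modernism, "
    "Pacific coast light, lobby reads as a courtyard"
)
message = build_translation_request(brief)
print(message)
```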

Step 04 / Render

Nano Banana Pro on actual geometry

For each scene I built a quick Rhino massing model (no detail, just the architectural moves we cared about), exported a view as a line drawing reference, and fed both the line drawing and the ChatGPT prompt into NBP. NBP held the geometry tightly while picking up the material and atmospheric direction from the prompt. Output: client-presentable concept rendering with architectural intent intact.

Step 05 / Refine

Targeted iteration, not regeneration

For each NBP output we ran 2–3 targeted refinement passes: adjust the lighting time of day, swap one material, modify the entourage. We didn't regenerate from scratch. The seed-and-prompt loop in NBP made these refinements deterministic enough that we knew what we were going to get.
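One way to keep refinements targeted is to record each render as a small set of settings and change exactly one per pass. The Python sketch below is a bookkeeping aid under stated assumptions: `RenderRequest` and its fields are hypothetical, it is not NBP's API, and how NBP actually exposes seeds is not confirmed here.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class RenderRequest:
    # Hypothetical container; field names are illustrative, not NBP's API.
    line_drawing: str      # path to the exported view
    prompt: str
    seed: int
    time_of_day: str
    primary_material: str


base = RenderRequest(
    line_drawing="lobby_view.png",
    prompt="courtyard lobby, warm plaster",
    seed=1234,
    time_of_day="late afternoon",
    primary_material="weathered cedar",
)

# Change exactly one setting per pass; everything else, seed included, is held.
dusk = replace(base, time_of_day="dusk")

changed = [f.name for f in fields(base) if getattr(base, f.name) != getattr(dusk, f.name)]
print(changed)
```

Keeping the record immutable (`frozen=True`) means every pass is a new object, so you can always diff a refinement against the render it came from.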

The whole loop, per scene, ran about 45 minutes including curation. Seven scenes for the hotel package: just over five hours of total render-and-refine time. Comparable static V-Ray output would have been roughly three full days.
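The arithmetic behind those figures, as a quick Python check. The 8-hour working day used to convert "three full days" into hours is an assumption, not a number from the project.

```python
MINUTES_PER_SCENE = 45   # from the text, includes curation
SCENES = 7               # four exterior moods + three interior scenes

total_minutes = MINUTES_PER_SCENE * SCENES
total_hours = total_minutes / 60
print(f"{total_minutes} min = {total_hours:.2f} h")   # 315 min = 5.25 h

# Assumption: a "full day" is 8 working hours.
WORKDAY_HOURS = 8
vray_hours = 3 * WORKDAY_HOURS
print(f"roughly {vray_hours / total_hours:.1f}x faster than the V-Ray estimate")
```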

Where the workflow actually works

The hotel concept package was a credible test case because it had two things that triangle-workflow tutorials usually skip: a real brief from a real client, and a real architectural geometry pass that constrained the AI rendering layer. When you watch the YouTube videos, you'll notice most demos are pure prompt-to-image without geometry. That works for portfolio shots; it doesn't survive a client meeting where someone asks "what does the south elevation actually look like?"

Three things this workflow does meaningfully better than the alternatives we've tested:

It expands the search space

Most AI render workflows for architecture start with the architect's existing geometry and stylize it. The triangle workflow starts with Midjourney's wild ideation, then translates back to architecture. We discovered two moves in the concept package (a stepped courtyard section and a specific material adjacency, cast-in-place concrete next to weathered cedar) that we wouldn't have proposed unprompted. Both survived to the client meeting and were enthusiastically received.

It separates inspiration from execution cleanly

The cleanest analogy is mood boarding versus rendering. Midjourney is the mood board you make in 30 minutes. NBP is the render you make once the design has settled. Old workflows blurred these stages: you'd start prompting a render with vague atmospheric language and try to reverse-engineer architecture out of the result. The triangle keeps them clean: ideate first, render second, never the two at once.

The prompt engineering compounds

The ChatGPT-written prompts have a structure that, we noticed, becomes reusable across projects. After three projects through this loop we have a prompt library ("Pacific coast afternoon," "Mediterranean material palette," "Brooklyn industrial loft retrofit"), each entry a six-paragraph structured prompt that ChatGPT helped us write from a reference image. These are now studio assets. We pull them into new projects with the geometry swapped.
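A prompt library like this can be as simple as a lookup keyed by name, paired with each new project's geometry export. Here is a minimal sketch in Python. The `LibraryPrompt` class, the `render_job` helper, and the file name are hypothetical, and the prompt bodies are placeholders, not the studio's actual text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryPrompt:
    name: str
    body: str   # the six-paragraph structured prompt; placeholder text here


LIBRARY = {
    p.name: p
    for p in [
        LibraryPrompt("Pacific coast afternoon", "<structured prompt text>"),
        LibraryPrompt("Mediterranean material palette", "<structured prompt text>"),
        LibraryPrompt("Brooklyn industrial loft retrofit", "<structured prompt text>"),
    ]
}


def render_job(prompt_name: str, line_drawing_path: str) -> dict:
    """Pair a stored prompt with this project's geometry export."""
    prompt = LIBRARY[prompt_name]   # raises KeyError for an unknown name
    return {"prompt": prompt.body, "line_drawing": line_drawing_path}


# Same prompt, new project: only the geometry changes.
job = render_job("Pacific coast afternoon", "new_project_lobby.png")
print(job["line_drawing"])
```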

Image (generated with Gemini): A top-down drone photograph of an illuminated hotel swimming pool and stone terrace at night with swimming guests. Caption: Where the pipeline fails: getting three models to agree on hardware.

Where it doesn't work

Three real failure modes worth flagging:

It's bad for tightly-constrained design problems

If a client comes in with a very specific image in their head (they want "exactly this" and bring five reference photos), the triangle workflow's strength, expansion via Midjourney, becomes a liability. Skip Midjourney entirely, go straight to ChatGPT for prompt structuring against the references, and feed that to NBP. Forcing exploration when the design is locked is a waste of an hour.

It's not the right tool for technical deliverables

This is a concept workflow. It produces beautiful, moody renderings appropriate for concept packages, marketing materials, and competition entries. It does not produce technically accurate construction visualizations. For design development (DD) and construction documents (CD) phase work, where you need precise material specs, accurate window mullion geometry, and reliable shadow studies, you're going back to Veras, Rendair, or V-Ray. The triangle wins at convincing, not at specifying.
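The two routing rules so far (skip Midjourney when the design is locked, skip the triangle for DD/CD work) fit in a few lines of Python. This is a sketch; the function name and the phase labels are hypothetical conventions, not part of any tool.

```python
def plan_stages(design_locked: bool, phase: str) -> list[str]:
    """Return the tool sequence for a job; phase is 'concept', 'DD', or 'CD'."""
    if phase in ("DD", "CD"):
        # Technical deliverables go to a production renderer instead
        # (the text names Veras, Rendair, or V-Ray).
        return ["production renderer"]
    if design_locked:
        # The client already has references: skip the exploration pass.
        return ["ChatGPT", "Nano Banana Pro"]
    return ["Midjourney", "ChatGPT", "Nano Banana Pro"]


print(plan_stages(design_locked=False, phase="concept"))
print(plan_stages(design_locked=True, phase="concept"))
print(plan_stages(design_locked=False, phase="CD"))
```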

The prompt translation step still requires editing

About 70% of the prompts ChatGPT writes are usable without editing, not 100%. Common failures: it overstuffs prompts with adjectives (NBP responds badly to seventeen adjectives in a row), it hallucinates non-existent architectural styles, and it occasionally inserts photography terms that confuse rather than help. Expect to spend 5 minutes per prompt cleaning these up. Not nothing.
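Part of that cleanup can be flagged mechanically before a human reads the prompt. The Python sketch below is a rough heuristic only: the watch list of photography terms and the threshold for a "long descriptor run" are hypothetical values you would tune yourself, and it cannot catch invented architectural styles, which still need a human eye.

```python
import re

# Hypothetical watch list; a studio would maintain its own.
PHOTO_TERMS = {"bokeh", "35mm", "f/1.8"}
MAX_COMMA_ITEMS = 6   # arbitrary threshold for too many descriptors in a row


def lint_prompt(prompt: str) -> list[str]:
    """Return human-readable warnings for a draft render prompt."""
    warnings = []
    # Split into sentences, then count comma-separated chunks in each one.
    for sentence in re.split(r"(?<=[.!?])\s+", prompt.strip()):
        items = [chunk for chunk in sentence.split(",") if chunk.strip()]
        if len(items) > MAX_COMMA_ITEMS:
            warnings.append(f"long descriptor run ({len(items)} items)")
    lowered = prompt.lower()
    for term in sorted(PHOTO_TERMS):
        if term in lowered:
            warnings.append(f"photography term: {term}")
    return warnings


draft = (
    "Warm, soft, golden, hazy, textured, tactile, quiet, luminous plaster courtyard. "
    "Shallow bokeh on brass."
)
found = lint_prompt(draft)
print(found)
```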

The triangle vs the single-tool workflow

Stage	Triangle workflow	Single-tool (Veras / Rendair only)
Ideation / inspiration	Midjourney: wide creative search	Limited to the studio's existing references
Prompt construction	ChatGPT structures reusable prompts	Manual, ad-hoc per project
Architectural geometry preservation	NBP respects line drawings	Native to the tool
Material specification accuracy	Moderate; interpretation needed	Tight; material library is yours
Time per scene (concept)	~45 min including curation	~90 min with re-prompting cycles
Time per scene (DD/CD-grade)	Not the right tool	Veras / Rendair / V-Ray
Best use case	Concept packages, competitions, marketing	Production stills, DD/CD visualizations

What we'd change

After three projects through this loop, we've made one modification to the YouTube-tutorial version: drop a Rhino or SketchUp geometry pass into the middle. The tutorials skip this step because it's slow on camera. In practice it's the difference between a portfolio render and a client deliverable. We model a 30-minute massing pass between Midjourney curation and NBP rendering. The model is rough (no detail, no precise dimensions), but it constrains NBP to architectural geometry that respects the brief.

Without that step, NBP produces beautiful images that don't actually represent the building we're proposing. With it, NBP produces beautiful images of the building we're proposing, lit and finished in the direction the triangle workflow surfaced.

Should you adopt this

If your practice does any concept package or competition work, yes. The triangle workflow has earned a permanent place in our studio's pre-DD pipeline. The combined cost (Midjourney Pro at $60/mo, ChatGPT Plus at $20/mo, and Nano Banana Pro at usage-based pricing, roughly $50/mo for our usage) comes to about $130/mo, which is significantly less than the value of one junior staff hour reclaimed per scene.
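The cost side is simple enough to check in a few lines of Python. The subscription figures come from the paragraph above. The hourly rate in the break-even example is a placeholder for illustration only, not a figure from our studio, so substitute your own.

```python
monthly_costs = {
    "Midjourney Pro": 60,
    "ChatGPT Plus": 20,
    "Nano Banana Pro (usage-based, approx.)": 50,
}
total = sum(monthly_costs.values())
print(f"${total}/mo")   # $130/mo


def hours_to_break_even(monthly_cost: float, hourly_rate: float) -> float:
    """Reclaimed staff hours per month needed to cover the tool cost."""
    return monthly_cost / hourly_rate


# Placeholder rate, purely illustrative.
EXAMPLE_HOURLY_RATE = 40
print(hours_to_break_even(total, EXAMPLE_HOURLY_RATE))
```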

If your practice is BIM-first, DD/CD-focused, and produces only technical visualizations: skip it. You are not the audience. Stay on Veras or Rendair, keep your material library tight, and don't introduce three additional tools and prompt-writing labor into a workflow that's already working.

If you're a student or sole practitioner: this is the workflow you've been waiting for. Concept-quality renderings, three accessible tools, total cost around $130/mo at the prices above. The output ceiling is high enough for competition entries and portfolio work. Get fluent in this loop before you spend a dollar on Lumion or Enscape.

Want our actual ChatGPT prompts?

The six-prompt library Vista Studios reuses across concept projects: Mediterranean, Pacific coast, Brooklyn industrial, alpine, desert, and tropical material palettes.

Read the prompt engineering guide →

Tested by Vista Studios on a live boutique hotel concept package. No affiliate relationships with Midjourney, OpenAI, or Google. Geometry passes built in Rhino 8. All client information altered for privacy.
