If you spend any time in the ComfyUI groups, you've seen the question over and over: how do I get Midjourney-quality results in ComfyUI? In one such Facebook thread this past week, the most upvoted answer wasn't a prompt trick. It was a redirect: "Use Qwen, Flux, Zimage instead. The license primarily applies to image generation for third parties, not to the images themselves." That single comment captures a shift we've watched gather pace all year. Architects who used to default to a Midjourney subscription are quietly moving to open-weight models they run locally, and they're doing it for reasons that have nothing to do with chasing a prettier render.

Over on r/comfyui, an architecture student asking how to turn sketches into realistic images gets the same chorus: build a local ControlNet workflow, pick an open base model, stop paying per month. And on LinkedIn, the "AI for Architect | ComfyUI Specialist" crowd has been posting Qwen-driven archviz for a while now. So this isn't fringe. It's a working pattern. Let's break down why, and then get honest about where Midjourney still wins.


Why architects are going local: licensing, cost, control, privacy

Four reasons keep coming up, and they stack. Any one of them might not move you off a tool you know. Together they're why the open stack has gone from hobbyist curiosity to something studios actually run on deadline work.

Licensing clarity. This is the one that started the thread. Midjourney's commercial-use terms are tied to your subscription tier and have moved around over the years; the practical anxiety for a firm is always "am I allowed to put this in a fee proposal?" Open-weight models reframe the question. As the commenter noted, with these models the license concern is largely about reselling image generation as a service to third parties, not about the images you generate for your own projects. That's a meaningfully cleaner story for a practice. The important caveat: the exact terms differ by model and version, and Flux in particular ships in multiple license tiers, so you still have to read the specific license that comes with the weights you download. Don't take a forum comment as legal advice. But the direction is real.

Cost. A subscription is a recurring line item. A local model is a one-time download run on hardware you may already own. If you're rendering at volume (say, ten studies a day across a project), the math tilts hard toward local: the cost moves from per-seat-per-month to electricity and the GPU you bought anyway.
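If you want to check where your own break-even sits, here's a minimal Python sketch of that arithmetic. Every figure is an input you supply. The function names and the example numbers (seat price, wattage, hours, electricity tariff, GPU price) are hypothetical placeholders, not quoted prices or measured costs, and the sketch ignores depreciation and setup time.

```python
from typing import Optional


def monthly_power_cost(gpu_watts: float, hours_per_day: float,
                       days_per_month: float, price_per_kwh: float) -> float:
    """Electricity cost of running the GPU for rendering, per month."""
    kwh = gpu_watts / 1000 * hours_per_day * days_per_month
    return kwh * price_per_kwh


def breakeven_months(gpu_cost: float, seats: int, price_per_seat: float,
                     power_per_month: float) -> Optional[float]:
    """Months until a one-time GPU purchase beats a per-seat subscription.

    Pass gpu_cost=0 for a card you already own. Returns None when the
    electricity costs at least as much as the subscription it replaces,
    i.e. there is no break-even point.
    """
    monthly_saving = seats * price_per_seat - power_per_month
    if monthly_saving <= 0:
        return None
    return gpu_cost / monthly_saving


# Hypothetical studio: three seats, a 350 W card rendering about 4 h a day.
power = monthly_power_cost(gpu_watts=350, hours_per_day=4,
                           days_per_month=22, price_per_kwh=0.30)
months = breakeven_months(gpu_cost=1600, seats=3, price_per_seat=30,
                          power_per_month=power)
print(f"power ~{power:.2f}/month, break-even ~{months:.1f} months")
```

If the card is already on your desk, the break-even is immediate. At that point the only question is whether electricity undercuts the subscription at all.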

Control. This is the technical heart of it. In ComfyUI you wire the pipeline yourself: base model, ControlNet for structure, IPAdapter for style reference, upscalers, the lot. You can lock a render to the exact massing of your model export. Midjourney gives you a prompt box and, increasingly, reference tools, but nothing approaching node-level control over geometry.

Privacy. Client work under NDA shouldn't be uploaded to a third-party service if you can avoid it. A local model never sends the image off your machine. For confidential competition entries or unannounced developments, that alone can be the deciding factor.

The pitch isn't "open models look better." It's "open models are yours" — your license, your machine, your geometry, your client's confidentiality intact.

The three contenders, honestly compared

Flux, Qwen-Image and Z-Image are three different model families with three different temperaments. Here's how the community generally characterizes each for architectural work. A note before the specifics: these models iterate fast and exact version strings and benchmark numbers move week to week, so we're describing reputations and tendencies, not quoting spec sheets. Check the current model card before you commit a pipeline.

Flux (Black Forest Labs)
★ Photoreal exteriors
Pricing: Free / open weights · License varies by variant (dev vs commercial tiers) · Runs locally in ComfyUI

The community's default for photoreal architectural exteriors. Flux is widely regarded as having strong prompt adherence and clean, believable light, which is exactly what archviz needs. The catch is licensing nuance: Black Forest Labs releases Flux in multiple tiers, including a non-commercial dev license, so confirm you're using a variant whose terms fit billable work.

Photoreal · Exteriors · Prompt adherence · Check license tier

Qwen-Image (Alibaba)
★ Text & editing
Pricing: Free / open weights · Typically permissive open license · Larger / more VRAM-hungry

The model the LinkedIn archviz specialists keep reaching for. Qwen-Image is generally noted for unusually good text rendering and diagram handling — signage, labelled plans, callouts that don't dissolve into gibberish — plus capable image editing. That makes it strong for presentation boards and annotated visuals, not just pretty hero shots. It's a heavier model, so expect higher VRAM demand than a quantized Flux build.

Text rendering · Editing · Boards · Heavier model

Z-Image (Tongyi / Alibaba)
★ Efficiency
Pricing: Free / open weights · Newer release · Positioned as efficient

The newest of the three and the one we'd hedge on hardest. Z-Image, from Alibaba's Tongyi lab, is generally positioned by the community as an efficiency play — competitive quality at lower compute, attractive if your GPU is modest. It has less of a track record in archviz than Flux, so treat it as promising and worth testing rather than proven. Watch the threads before you rebuild your pipeline around it.

Efficient · Newer · Lower compute · Less proven for archviz

When to reach for which

Your task | Reach for | Why
Photoreal exterior hero shot | Flux | Strong photorealism and prompt adherence; the community default for buildings
Presentation board with text / labels | Qwen-Image | Better at legible text, signage and diagram elements
Editing or restyling an existing render | Qwen-Image | Capable image-editing workflows beyond pure generation
Modest GPU, fast iteration | Z-Image | Positioned as the efficient option; test before committing
Out-of-the-box polish, minimal setup | Midjourney | Still wins when you want a beautiful image from one prompt

The workflow that makes it work: sketch to render via ControlNet

None of this matters for architecture if the model invents a different building than the one you designed. The thing that makes local diffusion genuinely useful for us, and the reason that r/comfyui student was pointed at it, is structural conditioning. In ComfyUI you don't just prompt a base model; you pair it with a ControlNet (or equivalent) that constrains the output to a structure you provide.

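The practical chain looks like this:

1. Export a depth pass (or an edge/line render) of the approved model from your design software, or scan a hand sketch to use as the edge input.
2. In ComfyUI, load that image and feed it through a ControlNet paired with your base model and a prompt describing materials, context and light.
3. Optionally add an IPAdapter with a reference image to steer the style.
4. Sample at a resolution your card handles comfortably, then upscale.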

That depth- or edge-locked pass is the capability Midjourney simply doesn't offer at the same level. It's the difference between "give me a building like this" and "render this building." For design development, where the client already approved the massing, that distinction is the whole game.
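To make the depth-locked pass concrete, here's a minimal sketch that builds the graph in ComfyUI's API (JSON) format from Python: a depth image loaded from disk, a ControlNet applied to the prompt conditioning, then sampling, decoding and saving. It assumes a single all-in-one checkpoint and core ComfyUI node names (CheckpointLoaderSimple, ControlNetApplyAdvanced, KSampler and so on). The filenames, prompts and sampler settings are hypothetical placeholders, not tuned for any model. Flux builds in particular often load through different nodes and use different guidance settings, so compare this against a workflow exported from your own install before relying on it.

```python
import json


def sketch_to_render_graph(depth_png: str, prompt: str,
                           checkpoint: str = "your_base_model.safetensors",
                           controlnet: str = "your_depth_controlnet.safetensors",
                           strength: float = 0.8, width: int = 1024,
                           height: int = 768, seed: int = 0) -> dict:
    """Build a ComfyUI API-format graph: depth pass -> ControlNet -> sampler.

    Node ids are arbitrary strings; an input like ["1", 0] means
    "output slot 0 of node 1". Filenames are hypothetical placeholders.
    """
    return {
        "1": {"class_type": "CheckpointLoaderSimple",   # outputs: MODEL, CLIP, VAE
              "inputs": {"ckpt_name": checkpoint}},
        "2": {"class_type": "CLIPTextEncode",           # positive prompt
              "inputs": {"text": prompt, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",           # negative prompt
              "inputs": {"text": "blurry, distorted geometry", "clip": ["1", 1]}},
        "4": {"class_type": "LoadImage",                # depth pass from your model export
              "inputs": {"image": depth_png}},
        "5": {"class_type": "ControlNetLoader",
              "inputs": {"control_net_name": controlnet}},
        "6": {"class_type": "ControlNetApplyAdvanced",  # constrains output to the depth pass
              "inputs": {"positive": ["2", 0], "negative": ["3", 0],
                         "control_net": ["5", 0], "image": ["4", 0],
                         "strength": strength,
                         "start_percent": 0.0, "end_percent": 1.0}},
        "7": {"class_type": "EmptyLatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "8": {"class_type": "KSampler",                 # samples with the controlled conditioning
              "inputs": {"model": ["1", 0], "positive": ["6", 0],
                         "negative": ["6", 1], "latent_image": ["7", 0],
                         "seed": seed, "steps": 25, "cfg": 6.0,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": 1.0}},
        "9": {"class_type": "VAEDecode",
              "inputs": {"samples": ["8", 0], "vae": ["1", 2]}},
        "10": {"class_type": "SaveImage",
               "inputs": {"images": ["9", 0], "filename_prefix": "archviz"}},
    }


if __name__ == "__main__":
    graph = sketch_to_render_graph("approved_massing_depth.png",
                                   "photoreal exterior, timber facade, overcast light")
    print(json.dumps(graph, indent=2))
```

The key wiring is node 6: the sampler never sees the raw prompt conditioning, only the version the ControlNet has tied to your depth image. That is what keeps the massing fixed.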

The hardware reality

Here's where honesty matters most, because the forums are full of optimistic VRAM claims. The community generally reports that full-size Flux is comfortable at around 24GB of VRAM, with quantized GGUF and FP8 builds bringing it down to 12GB or even 8GB cards at some cost to speed and quality. Qwen-Image is larger and hungrier. Z-Image is pitched as lighter. But treat every one of those numbers as a starting point, not a guarantee: they shift with each quantization and each ComfyUI update. If you're on a laptop GPU, start with a quantized build at modest resolutions and upscale afterwards. If you have a 24GB desktop card, the door is wide open.
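If you want to turn that advice into a first guess for a given card, here's a small Python sketch. The thresholds simply restate the community figures above, so they are the same rough starting points and not a compatibility check. The function names are hypothetical, and the 64-pixel rounding is a conservative assumption rather than a hard requirement of any model.

```python
from typing import Tuple


def suggest_starting_build(vram_gb: float) -> str:
    """First-guess Flux build for a card, from community-reported VRAM figures."""
    if vram_gb >= 24:
        return "full-size Flux"
    if vram_gb >= 12:
        return "quantized Flux (FP8 or GGUF)"
    if vram_gb >= 8:
        return "quantized Flux (FP8 or GGUF), modest resolution, expect slower renders"
    return "below reported Flux minimums; test a lighter model such as Z-Image"


def base_resolution(target_w: int, target_h: int, upscale: float,
                    multiple: int = 64) -> Tuple[int, int]:
    """Size to render at before upscaling, rounded down to a multiple of `multiple`."""
    w = int(target_w / upscale) // multiple * multiple
    h = int(target_h / upscale) // multiple * multiple
    return w, h


print(suggest_starting_build(12))
print(base_resolution(3840, 2560, upscale=2))
```

Rendering at half size and upscaling 2x is the "modest resolution, upscale after" pattern in numeric form. Raise the base size only once the smaller render completes reliably on your card.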


Our take: open wins on everything except the first impression

We'll say the unglamorous thing plainly: Midjourney still wins on out-of-the-box aesthetic polish. Type a prompt, get a gorgeous image, no node graph, no driver headaches, no quantization roulette. For early mood-boarding and concept exploration, that frictionlessness is a real feature, and pretending otherwise to sell a local workflow would be dishonest.

But for the work that actually gets billed, the calculus flips. Open models in ComfyUI win on the things that matter to a practice over a project's life: licensing clarity, recurring cost, geometric control, and client privacy. The ControlNet sketch-to-render pipeline does something Midjourney structurally can't: hold your design fixed while it renders it. And the license story, the one that started the thread, eases a quiet anxiety that has hung over commercial AI image use from the beginning, provided you read the terms for the specific weights you run.

Our recommendation for a studio testing the water: start with Flux for exteriors, add Qwen-Image when you need legible text or want to edit a render, and keep an eye on Z-Image as the efficiency option matures. Don't rip out Midjourney on day one. Use it where its polish earns its keep, and move your structured, confidential, high-volume work to the local stack. That's not a religious choice between camps; it's a practitioner picking the right tool for each task.

If you set this up this week

Install ComfyUI, pull a quantized Flux build that fits your card, and wire one ControlNet from a depth pass of a project you've already approved. Render it once. Then read the license file that came with your weights, actually read it, before you put the output anywhere near a fee proposal. That single end-to-end test will tell you more than any forum thread about whether this belongs in your pipeline.
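Once that test graph runs in the ComfyUI interface, you may want to queue renders from a script. Here's a minimal Python sketch. It assumes a local ComfyUI server on its default 127.0.0.1:8188 and a workflow saved in ComfyUI's API JSON format (not the regular UI save format). It posts the workflow to the server's /prompt endpoint and refuses any non-loopback host, a small guard in the spirit of the NDA point made earlier. The helper names are hypothetical.

```python
import ipaddress
import json
import urllib.request


def is_local_host(host: str) -> bool:
    """True for 'localhost' or any loopback IP (127.0.0.0/8, ::1)."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:  # any other hostname: treat as non-local
        return False


def build_prompt_request(workflow: dict, host: str = "127.0.0.1",
                         port: int = 8188) -> urllib.request.Request:
    """Build (but don't send) a POST to ComfyUI's /prompt endpoint."""
    if not is_local_host(host):
        raise ValueError(f"{host!r} is not a loopback address; "
                         "refusing to send the job off this machine")
    netloc = f"[{host}]" if ":" in host else host  # bracket IPv6 literals in URLs
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"http://{netloc}:{port}/prompt", data=body,
        headers={"Content-Type": "application/json"}, method="POST")


def queue_prompt(workflow: dict, host: str = "127.0.0.1",
                 port: int = 8188) -> dict:
    """Send the workflow to a running local ComfyUI and return its JSON reply."""
    request = build_prompt_request(workflow, host, port)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, load an API-format export with json.load and pass the resulting dict to queue_prompt. The reply typically includes a prompt_id you can use to find the job in ComfyUI's queue and history. Splitting request building from sending keeps the loopback check testable without a running server.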

We build and test local AI rendering pipelines on real project work and publish the honest version, including where they fall short. Join the studio newsletter for weekly field notes, see our advanced ComfyUI tutorial for the node-level workflow, or read our roundup of Midjourney alternatives for architects for the wider landscape.


Reported from a ComfyUI community thread on achieving Midjourney-quality results, an r/comfyui sketch-to-render discussion among architecture students, and LinkedIn posts from ComfyUI archviz specialists. Model characterizations reflect general community consensus, not benchmarked specs; version strings, VRAM figures and license terms change frequently — confirm against the current model card. No affiliate relationship with Black Forest Labs, Alibaba, Tongyi, ComfyUI or Midjourney.