Can I use images from Flux, Qwen-Image or Z-Image commercially on client projects?

Generally, yes, but the exact terms depend on the specific model and version, so always read the license that ships with the weights you download. Black Forest Labs releases Flux under several tiers, including a non-commercial dev license and separate commercial terms, so the variant matters. Alibaba's Qwen-Image and the Tongyi Z-Image releases have typically used permissive open licenses. The community signal that prompted this piece put it bluntly: with these open models the licensing concern is largely about offering image generation as a service to third parties, not about owning and using the images you generate yourself. Confirm the current terms before you bill a client.

What hardware and VRAM do I need to run these models in ComfyUI?

It varies by model and by the quantized build you use. The community generally reports that full-size Flux runs comfortably on cards with around 24GB of VRAM, while quantized GGUF and FP8 builds bring it down to 12GB or even 8GB cards at some speed and quality cost. Qwen-Image is a larger model and is more demanding, again with smaller quantized builds available. Z-Image is positioned by the community as a more efficient option. Treat any single VRAM number as a starting point and check the model card and ComfyUI release notes for your exact build.

Is local diffusion in ComfyUI better than Midjourney for architecture?

It depends on what you value. Midjourney still tends to win on out-of-the-box aesthetic polish: type a prompt, get a beautiful image. Local models in ComfyUI win on control, cost, licensing clarity and client privacy. With ControlNet you can lock a render to your actual massing, edges or depth from a sketch or model export, which Midjourney does not match. If you need a fast mood image, Midjourney is hard to beat. If you need a render that respects your geometry and never leaves your machine, the open stack wins.

How does the sketch-to-render workflow work with these models in ComfyUI?

The standard approach is to pair a base diffusion model with a ControlNet or similar structural conditioning. You feed a line drawing, a Canny or scribble edge map, or a depth pass exported from your 3D model, and the ControlNet constrains the generation to that structure while the base model handles materials, light and atmosphere. This is what lets an architecture student turn a hand sketch into a photoreal render without the model inventing a different building. The exact node setup differs per model family, so follow a ComfyUI workflow built for the specific model you are using.

Qwen, Flux and Z-Image: The License-Safe ComfyUI Stack Architects Use Instead of Midjourney, ArchiGen AI

Image: creative parametric facade detail render using Flux and Qwen AI models in ComfyUI (ArchiGen AI Curation)

If you spend any time in the ComfyUI groups, you've seen the question over and over: how do I get Midjourney-quality results in ComfyUI? In one such Facebook thread this past week, the most upvoted answer wasn't a prompt trick. It was a redirect: "Use Qwen, Flux, Zimage instead. The license primarily applies to image generation for third parties, not to the images themselves." That single comment captures a shift we've watched gather pace all year. Architects who used to default to a Midjourney subscription are quietly moving to open-weight models they run locally, and they're doing it for reasons that have nothing to do with chasing a prettier render.

Over on r/comfyui, an architecture student asking how to turn sketches into realistic images gets the same chorus: build a local ControlNet workflow, pick an open base model, stop paying per month. And on LinkedIn, the "AI for Architect | ComfyUI Specialist" crowd has been posting Qwen-driven archviz for a while now. So this isn't fringe. It's a working pattern. Let's break down why, and then get honest about where Midjourney still wins.

Why architects are going local: licensing, cost, control, privacy

Four reasons keep coming up, and they stack. Any one of them might not move you off a tool you know. Together they're why the open stack has gone from hobbyist curiosity to something studios actually run on deadline work.

Licensing clarity. This is the one that started the thread. Midjourney's commercial-use terms are tied to your subscription tier and have moved around over the years; the practical anxiety for a firm is always "am I allowed to put this in a fee proposal?" Open-weight models reframe the question. As the commenter noted, with these models the license concern is largely about reselling image generation as a service to third parties, not about the images you generate for your own projects. That's a meaningfully cleaner story for a practice. The important caveat: the exact terms differ by model and version, and Flux in particular ships in multiple license tiers, so you still have to read the specific license that comes with the weights you download. Don't take a forum comment as legal advice. But the direction is real.

Cost. A subscription is a recurring line item. A local model is a one-time download against hardware you may already own. If you're rendering at volume, ten studies a day across a project, the math tilts hard toward local. The cost moves from per-seat-per-month to electricity and the GPU you bought anyway.
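The cost argument can be put as a quick break-even sum. The sketch below is only an illustration: the function name and every figure in it (GPU price, subscription total, electricity) are hypothetical placeholders, not quoted prices, so substitute your own numbers.

```python
import math


def break_even_months(gpu_cost, subscription_per_month, electricity_per_month=0.0):
    """Months until a one-time GPU purchase is paid back by dropped subscriptions.

    All three inputs are in the same currency. Returns None when running
    locally saves nothing per month, so the purchase never pays back.
    """
    monthly_saving = subscription_per_month - electricity_per_month
    if monthly_saving <= 0:
        return None
    return math.ceil(gpu_cost / monthly_saving)


# Hypothetical studio: a 1600 GPU, three seats totalling 180 per month,
# and roughly 20 per month of extra electricity for rendering.
months = break_even_months(gpu_cost=1600, subscription_per_month=180,
                           electricity_per_month=20)
print(f"Break-even after about {months} months")
```

If the GPU is one you already own, pass `gpu_cost=0` and the break-even is immediate, which is the "GPU you bought anyway" case.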

Control. This is the technical heart of it. In ComfyUI you wire the pipeline yourself: base model, ControlNet for structure, IPAdapter for style reference, upscalers, the lot. You can lock a render to the exact massing of your model export. Midjourney gives you a prompt box and, increasingly, reference tools, but nothing approaching node-level control over geometry.

Privacy. Client work under NDA shouldn't be uploaded to a third-party service if you can avoid it. A local model never sends the image off your machine. For confidential competition entries or unannounced developments, that alone can be the deciding factor.

The pitch isn't "open models look better." It's "open models are yours" — your license, your machine, your geometry, your client's confidentiality intact.

Image (generated with Gemini): a night view through wet glass showing a projection-mapped facade in vibrant colors and architects inside a studio. Caption: Nothing says local control quite like burning midnight oil in peace.

The three contenders, honestly compared

Flux, Qwen-Image and Z-Image are three different model families with three different temperaments. Here's how the community generally characterizes each for architectural work. A note before the specifics: these models iterate fast, and exact version strings and benchmark numbers move week to week, so we're describing reputations and tendencies, not quoting spec sheets. Check the current model card before you commit a pipeline.

Flux (Black Forest Labs)

★ Photoreal exteriors

Pricing: Free / open weights · License varies by variant (dev vs commercial tiers) · Runs locally in ComfyUI

The community's default for photoreal architectural exteriors. Flux is widely regarded as having strong prompt adherence and clean, believable light, which is exactly what archviz needs. The catch is licensing nuance: Black Forest Labs releases Flux in multiple tiers, including a non-commercial dev license, so confirm you're using a variant whose terms fit billable work.

Photoreal · Exteriors · Prompt adherence · Check license tier

Qwen-Image (Alibaba)

★ Text & editing

Pricing: Free / open weights · Typically permissive open license · Larger / more VRAM-hungry

The model the LinkedIn archviz specialists keep reaching for. Qwen-Image is generally noted for unusually good text rendering and diagram handling — signage, labelled plans, callouts that don't dissolve into gibberish — plus capable image editing. That makes it strong for presentation boards and annotated visuals, not just pretty hero shots. It's a heavier model, so expect higher VRAM demand than a quantized Flux build.

Text rendering · Editing · Boards · Heavier model

Z-Image (Tongyi / Alibaba)

★ Efficiency

Pricing: Free / open weights · Newer release · Positioned as efficient

The newest of the three and the one we'd hedge on hardest. Z-Image, from Alibaba's Tongyi lab, is generally positioned by the community as an efficiency play — competitive quality at lower compute, attractive if your GPU is modest. It has less of a track record in archviz than Flux, so treat it as promising and worth testing rather than proven. Watch the threads before you rebuild your pipeline around it.

Efficient · Newer · Lower compute · Less proven for archviz

When to reach for which

Your task	Reach for	Why
Photoreal exterior hero shot	Flux	Strong photorealism and prompt adherence; the community default for buildings
Presentation board with text / labels	Qwen-Image	Better at legible text, signage and diagram elements
Editing or restyling an existing render	Qwen-Image	Capable image-editing workflows beyond pure generation
Modest GPU, fast iteration	Z-Image	Positioned as the efficient option; test before committing
Out-of-the-box polish, minimal setup	Midjourney	Still wins when you want a beautiful image from one prompt

Image (generated with Gemini): a close-up texture shot of a carved limestone and timber facade illuminated by low morning sunlight. Caption: Precise edge detection hits differently when your license permits commercial use.

The workflow that makes it work: sketch to render via ControlNet

None of this matters for architecture if the model invents a different building than the one you designed. The thing that makes local diffusion genuinely useful for us, and the reason that r/comfyui student was pointed at it, is structural conditioning. In ComfyUI you don't just prompt a base model; you pair it with a ControlNet (or equivalent) that constrains the output to a structure you provide.

The practical chain looks like this:

Input: a hand sketch, a line export from SketchUp or Rhino, or a depth pass rendered from your 3D model.
Preprocess: convert it to an edge map (Canny or scribble) or feed the depth map directly.
Condition: a ControlNet locks generation to that structure so massing, openings and edges hold.
Generate: the base model (Flux, Qwen-Image, Z-Image) fills in materials, light and atmosphere within those bounds.
Refine: upscale, then optionally use Qwen-Image's editing to fix one element without re-rolling the whole frame.
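The chain above can be sketched as a ComfyUI graph in API format, where each node is an entry with a `class_type` and `inputs`, and a link is written as `[source_node_id, output_index]`. Treat this as a rough illustration, not a tested workflow. The node class names are ones we understand to be in ComfyUI's core set; the model filenames are hypothetical; the sampler settings are placeholders; and the control image is assumed to be an already-prepared depth or edge map. Flux, Qwen-Image and Z-Image may each need different loader nodes, so compare against a workflow built for your model.

```python
def build_sketch_to_render_graph(control_image, prompt, strength=0.8, seed=0):
    """Return an API-format graph: node id -> {"class_type", "inputs"}."""
    return {
        # Base model; filename is a hypothetical placeholder.
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": "base_model.safetensors"}},
        # Structural conditioning model; filename is hypothetical.
        "2": {"class_type": "ControlNetLoader",
              "inputs": {"control_net_name": "depth_controlnet.safetensors"}},
        # Input: the depth pass or edge map exported from your sketch or model.
        "3": {"class_type": "LoadImage", "inputs": {"image": control_image}},
        # Positive and (empty) negative prompts, encoded with the checkpoint's CLIP.
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt, "clip": ["1", 1]}},
        "5": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "", "clip": ["1", 1]}},
        # Condition: lock generation to the structure in the control image.
        "6": {"class_type": "ControlNetApplyAdvanced",
              "inputs": {"positive": ["4", 0], "negative": ["5", 0],
                         "control_net": ["2", 0], "image": ["3", 0],
                         "strength": strength,
                         "start_percent": 0.0, "end_percent": 1.0}},
        "7": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 768, "batch_size": 1}},
        # Generate: the base model fills in materials, light and atmosphere.
        "8": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["6", 0],
                         "negative": ["6", 1], "latent_image": ["7", 0],
                         "seed": seed, "steps": 20, "cfg": 7.0,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": 1.0}},
        "9": {"class_type": "VAEDecode",
              "inputs": {"samples": ["8", 0], "vae": ["1", 2]}},
        "10": {"class_type": "SaveImage",
               "inputs": {"images": ["9", 0],
                          "filename_prefix": "sketch_to_render"}},
    }


def dangling_links(graph):
    """List (node_id, input_name) pairs whose link points at a missing node."""
    missing = []
    for node_id, node in graph.items():
        for name, value in node["inputs"].items():
            if isinstance(value, list) and len(value) == 2 and value[0] not in graph:
                missing.append((node_id, name))
    return missing


graph = build_sketch_to_render_graph("depth_pass.png",
                                     "limestone facade, low morning sun")
print(dangling_links(graph))
```

The point of the shape is node "6": both prompts pass through the ControlNet before they reach the sampler, which is what holds massing, openings and edges in place. Lowering `strength` gives the base model more freedom to drift from the input.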

That depth- or edge-locked pass is the capability Midjourney simply doesn't offer at the same level. It's the difference between "give me a building like this" and "render this building." For design development, where the client already approved the massing, that distinction is the whole game.

The hardware reality

Here's where honesty matters most, because the forums are full of optimistic VRAM claims. The community generally reports that full-size Flux is comfortable around 24GB of VRAM, with quantized GGUF and FP8 builds dropping it to 12GB or even 8GB cards at some cost to speed and quality. Qwen-Image is larger and hungrier. Z-Image is pitched as lighter. But treat every one of those numbers as a starting point, not a guarantee: they shift with each quantization and each ComfyUI update. If you're on a laptop GPU, start with a quantized build and modest resolutions, and upscale after. If you have a 24GB desktop card, the door is wide open.
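A back-of-envelope sum shows why quantization moves a model between card sizes: the weights alone take roughly the parameter count times the bytes per weight. The sketch below assumes a 12-billion-parameter model, which is roughly the size commonly cited for Flux dev. That figure is an assumption to confirm on the model card, and the function name is ours. It also ignores text encoders, the VAE, activations and ComfyUI's own overhead, all of which add to the total.

```python
def weights_gib(params_billion, bits_per_weight):
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Assumed 12B-parameter model at three common precisions. Real GGUF builds
# use slightly more than their nominal bits per weight, so read "4-bit" loosely.
for label, bits in [("16-bit", 16), ("8-bit (FP8)", 8), ("about 4-bit (GGUF)", 4)]:
    print(f"{label}: {weights_gib(12, bits):.1f} GiB")
```

Under those assumptions the weights come to about 22.4, 11.2 and 5.6 GiB, which lines up with the 24GB, 12GB and 8GB card tiers the community reports. Any build that sits close to your card's limit will likely need offloading or lower resolutions.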

Our take: open wins on everything except the first impression

We'll say the unglamorous thing plainly: Midjourney still wins on out-of-the-box aesthetic polish. Type a prompt, get a gorgeous image, no node graph, no driver headaches, no quantization roulette. For early mood-boarding and concept exploration, that frictionlessness is a real feature, and pretending otherwise to sell a local workflow would be dishonest.

But for the work that actually gets billed, the calculus flips. Open models in ComfyUI win on the things that matter to a practice over a project's life: licensing clarity, recurring cost, geometric control, and client privacy. The ControlNet sketch-to-render pipeline does something Midjourney structurally can't: hold your design while it renders it. And the license story, the one that started the thread, removes a quiet anxiety that's been hanging over commercial AI image use since the beginning.

Our recommendation for a studio testing the water: start with Flux for exteriors, add Qwen-Image when you need legible text or to edit a render, and keep an eye on Z-Image as the efficiency option matures. Don't rip out Midjourney on day one. Use it where its polish earns its keep, and move your structured, confidential, high-volume work to the local stack. That's not a religious choice between camps. It's a practitioner picking the right tool per task.

If you set this up this week

Install ComfyUI, pull a quantized Flux build that fits your card, and wire one ControlNet from a depth pass of a project you've already approved. Render it once. Then read the license file that came with your weights, actually read it, before you put the output anywhere near a fee proposal. That single end-to-end test will tell you more than any forum thread about whether this belongs in your pipeline.
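Once that first render works in the browser, the same graph can be queued from a script, which is handy for repeatable tests. As far as we know, a local ComfyUI server accepts a graph exported in API format via an HTTP POST to `/prompt`, on port 8188 by default. Treat the address, the `client_id` value and the `workflow_api.json` filename below as assumptions to check against your own install.

```python
import json
import urllib.request


def build_queue_request(workflow, server="http://127.0.0.1:8188",
                        client_id="archviz-test"):
    """Build (but do not send) the request that queues one render.

    `workflow` is the dict loaded from a graph exported in API format.
    Nothing here leaves your machine: the target is the local server.
    """
    payload = json.dumps({"prompt": workflow, "client_id": client_id})
    return urllib.request.Request(
        f"{server}/prompt",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_render(workflow_path):
    """Load an exported workflow file and submit it to the local server."""
    with open(workflow_path, encoding="utf-8") as handle:
        workflow = json.load(handle)
    with urllib.request.urlopen(build_queue_request(workflow)) as response:
        return json.loads(response.read())
```

With ComfyUI running, `queue_render("workflow_api.json")` should submit the job and return the server's JSON reply. Keeping request construction separate from sending makes it easy to inspect exactly what would be submitted before anything runs.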

We build and test local AI rendering pipelines on real project work and publish the honest version, including where they fall short.

Reported from a ComfyUI community thread on achieving Midjourney-quality results, an r/comfyui sketch-to-render discussion among architecture students, and LinkedIn posts from ComfyUI archviz specialists. Model characterizations reflect general community consensus, not benchmarked specs; version strings, VRAM figures and license terms change frequently — confirm against the current model card. No affiliate relationship with Black Forest Labs, Alibaba, Tongyi, ComfyUI or Midjourney.
