PhotoQuilt

Training-Free Arbitrary-Resolution Photomosaics via Bootstrapped Tiled Denoising

1 University of Toronto · 2 Vector Institute · 3 Samsung Research · 4 KITE Research Institute · 5 Queen's University

Equal contribution

The arXiv PDF uses compressed figures; see the “Paper (HQ figures)” version for full-resolution figures.

PhotoQuilt photomosaic teaser

Abstract

Photomosaics are large images whose local regions read as independent images (tiles) while their overall arrangement forms a coherent scene. Generating them at high resolution, with every tile convincing in its own right, is computationally expensive, since the canvas must hold many detailed tiles at once. We present PhotoQuilt, a training-free framework that generates photomosaics at arbitrary resolution. Diffusion models struggle to satisfy the global and tile scales at once: direct high-resolution generation is costly and tends toward one smooth image rather than a mosaic, while patch-based tiling keeps local detail but loses global structure. PhotoQuilt resolves this with a bootstrapped tiled denoising procedure. We first generate a global composition at low resolution to fix the layout, then upscale it in latent space and re-inject noise to restore generative capacity. Denoising then proceeds within fixed tiles, so each tile forms its own image while the shared global structure holds them in one layout. Because each tile is denoised separately, PhotoQuilt scales to large canvases without the quadratic attention cost of generating the full canvas at once. Experiments show that PhotoQuilt outperforms existing baselines in both global structure and local realism.
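
A back-of-the-envelope count illustrates the scaling argument. This sketch assumes a latent diffusion transformer in which each token covers a 16×16 px patch (8× VAE downsampling followed by 2×2 patchification); that token size and the helper names `attention_pairs` and `tiled_attention_pairs` are illustrative assumptions, not details taken from PhotoQuilt. It compares one full self-attention over the 14,336×14,336 px Toronto canvas with attention restricted to independent 256×256 px tiles.

```python
def attention_pairs(width_px: int, height_px: int, px_per_token: int = 16) -> int:
    """Query-key pairs in one full self-attention over an image.

    Assumption (hypothetical): each token covers px_per_token x px_per_token
    pixels, e.g. 8x VAE downsampling followed by 2x2 patchification.
    """
    tokens = (width_px // px_per_token) * (height_px // px_per_token)
    return tokens * tokens


def tiled_attention_pairs(width_px: int, height_px: int, tile_px: int,
                          px_per_token: int = 16) -> int:
    """Total query-key pairs when attention only runs inside non-overlapping tiles."""
    n_tiles = (width_px // tile_px) * (height_px // tile_px)
    return n_tiles * attention_pairs(tile_px, tile_px, px_per_token)


full = attention_pairs(14_336, 14_336)
tiled = tiled_attention_pairs(14_336, 14_336, tile_px=256)
print(f"full canvas: {full:.3e} pairs")
print(f"tiled:       {tiled:.3e} pairs")
print(f"ratio:       {full // tiled}")  # equals the number of tiles, 3136
```

For non-overlapping tiles of a fixed size, the total cost grows linearly with the number of tiles, so in this count the saving factor equals the tile count. Constant factors such as head dimension and layer count are ignored, and a real pipeline also pays for the low-resolution global pass.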

PhotoQuilt data flow: a low-resolution pass fixes the global layout; the latent canvas is upscaled and re-noised so that fine detail can be regenerated; tiled denoising then fills each region independently while all regions share the same composition.
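
The three stages in the figure can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: nearest-neighbour latent upscaling, a DDPM-style forward step `x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps` for noise re-injection, non-overlapping tiles, and a hypothetical `denoise_tile` callback standing in for the per-tile diffusion sampler. The random low-resolution latent is only a stand-in for the output of the global pass.

```python
import numpy as np


def upscale_latent(z: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upscaling of a (C, H, W) latent by an integer factor.

    Assumption: the interpolation PhotoQuilt uses is not stated; nearest
    neighbour is chosen here only for simplicity.
    """
    return z.repeat(factor, axis=1).repeat(factor, axis=2)


def renoise(z0: np.ndarray, alpha_bar: float, rng: np.random.Generator) -> np.ndarray:
    """Re-inject noise with a DDPM-style forward step:
    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    eps = rng.standard_normal(z0.shape)
    return np.sqrt(alpha_bar) * z0 + np.sqrt(1.0 - alpha_bar) * eps


def tiled_denoise(z_t: np.ndarray, tile: int, denoise_tile) -> np.ndarray:
    """Denoise non-overlapping tile x tile windows of a (C, H, W) latent independently.

    `denoise_tile(x, (row, col))` is a hypothetical callback standing in for
    a per-tile diffusion sampler; it must return an array shaped like x.
    """
    _, height, width = z_t.shape
    if height % tile or width % tile:
        raise ValueError("latent size must be a multiple of the tile size")
    out = np.empty_like(z_t)
    for i in range(0, height, tile):
        for j in range(0, width, tile):
            window = z_t[:, i:i + tile, j:j + tile]
            out[:, i:i + tile, j:j + tile] = denoise_tile(window, (i // tile, j // tile))
    return out


rng = np.random.default_rng(0)
z_low = rng.standard_normal((4, 16, 16))     # stand-in for the global low-res latent
z_up = upscale_latent(z_low, factor=4)       # shape (4, 64, 64)
z_t = renoise(z_up, alpha_bar=0.5, rng=rng)  # partially re-noised canvas

visited = []


def fake_denoise(x: np.ndarray, idx: tuple) -> np.ndarray:
    # Placeholder: records which tile was visited and returns it unchanged;
    # a real sampler would run the diffusion model on x here.
    visited.append(idx)
    return x


z_out = tiled_denoise(z_t, tile=8, denoise_tile=fake_denoise)
print(z_out.shape, len(visited))  # (4, 64, 64) 64
```

In this sketch, `alpha_bar` trades fidelity to the global layout (values near 1 leave the upscaled composition almost intact) against the freedom each tile has to form its own image (smaller values inject more noise).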
Ultra-high-resolution photomosaic of the Toronto skyline at 14,336×14,336 px with 256×256 px tiles. Each tile is generated independently with the prompt “A bird.”
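
The caption's numbers imply a 56 × 56 grid, since 14,336 / 256 = 56, for 3,136 tiles in total. The sketch below enumerates that grid with pixel boxes and a prompt per tile; `TileSpec` and `tile_grid` are hypothetical names used only for illustration. Here every tile shares the prompt “A bird.”; a mosaic such as the Lewis Hamilton example, where each tile is conditioned on a different photograph, would attach a different condition to each entry instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileSpec:
    """Hypothetical record describing one tile of the mosaic."""
    row: int
    col: int
    box: tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
    prompt: str


def tile_grid(width_px: int, height_px: int, tile_px: int, prompt: str) -> list[TileSpec]:
    """Enumerate non-overlapping tiles covering the canvas, row by row."""
    if width_px % tile_px or height_px % tile_px:
        raise ValueError("canvas dimensions must be multiples of the tile size")
    return [
        TileSpec(r, c, (c * tile_px, r * tile_px, (c + 1) * tile_px, (r + 1) * tile_px), prompt)
        for r in range(height_px // tile_px)
        for c in range(width_px // tile_px)
    ]


tiles = tile_grid(14_336, 14_336, 256, "A bird.")
print(len(tiles))     # 3136 (a 56 x 56 grid)
print(tiles[-1].box)  # (14080, 14080, 14336, 14336)
```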
Photomosaic of Lewis Hamilton at 12,288×6,144 px. The global layout follows his 2026 Spanish Grand Prix victory photo; each tile is conditioned on a real photograph from one of his podium celebrations.

Citation

BibTeX will be added when the paper is on arXiv.