Native and compact structured latents

TRELLIS.2 turns images into high-fidelity 3D assets at scale.

An open-source 4B-parameter image-to-3D model that generates PBR-textured assets at resolutions up to 1536 cubed, powered by native 3D VAEs, the O-Voxel representation, and 16x spatial compression.

O-Voxel · SC-VAE · Native 3D VAE · PBR Materials · Arbitrary Topology · Vanilla DiTs

Overview

A compact research stack for scalable, high-resolution 3D generation.

01

High quality, resolution, and efficiency

TRELLIS.2 generates fully textured, high-fidelity assets efficiently, with output resolutions of 512 cubed, 1024 cubed, and 1536 cubed.

02

Native structured latents

The model uses native and compact structured latents to preserve geometry and appearance while keeping the representation small enough for large-scale generative modeling.

03

Minimal asset processing

Data conversion for both training and inference is designed to be rendering-free and optimization-free, making the path between textured meshes and model-ready data more direct.

Key features

Built for complex 3D assets, not just closed surfaces.

TRELLIS.2 handles open surfaces, non-manifold geometry, and enclosed interior structures. It also supports rich material attributes including base color, roughness, metallic, and opacity for physically based rendering and photorealistic relighting.

512 cubed: 3s

Total shape and material generation time reported on NVIDIA H100.

1024 cubed: 17s

Higher-resolution textured asset generation with compact structured latents.

1536 cubed: 60s

Large PBR asset output while preserving shape and material detail.
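
As a rough sanity check on the three reported figures, the sketch below compares how generation time grows with how the cell count of a dense grid would grow at the same resolutions. The times and resolutions are the ones quoted above. The variable names and the dense-grid comparison are illustrative assumptions, not something the project reports.

```python
# Reported output resolution -> total generation time in seconds (NVIDIA H100).
reported = {512: 3.0, 1024: 17.0, 1536: 60.0}

base_res = 512
base_time = reported[base_res]

rows = []
for res, seconds in reported.items():
    dense_ratio = (res / base_res) ** 3   # growth of a dense res^3 grid
    time_ratio = seconds / base_time      # growth of the reported time
    rows.append((res, dense_ratio, time_ratio))
    print(f"{res} cubed: dense cells x{dense_ratio:.1f}, time x{time_ratio:.1f}")
```

Under these numbers, time grows by about 5.7x and 20x while a dense grid would grow by 8x and 27x, so the reported cost scales more slowly than dense volume.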

Tech innovations

From textured mesh to O-Voxel to sparse compressed latent space.

01

Instant bidirectional conversion

Meshes are transformed into O-Voxel, a field-free sparse voxel structure that encodes precise geometry and complex appearance together.
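
To make "sparse voxel structure" concrete, here is a minimal sketch that stores only the voxels touched by surface samples. It is not the project's conversion code: the function name `sparse_voxelize`, the unit-cube normalization, and the use of pre-sampled surface points are all assumptions made for illustration.

```python
def sparse_voxelize(points, resolution):
    """Return the set of integer voxel coordinates hit by surface points.

    Assumes `points` are (x, y, z) tuples normalized to the unit cube.
    Hypothetical helper, not part of the TRELLIS.2 codebase.
    """
    active = set()
    for x, y, z in points:
        # Clamp so a coordinate of exactly 1.0 still maps to the last cell.
        i = min(int(x * resolution), resolution - 1)
        j = min(int(y * resolution), resolution - 1)
        k = min(int(z * resolution), resolution - 1)
        active.add((i, j, k))
    return active


# Points sampled on a flat square at z = 0.5 (an open surface).
samples = [(u / 32, v / 32, 0.5) for u in range(32) for v in range(32)]
voxels = sparse_voxelize(samples, resolution=16)
occupancy = len(voxels) / 16**3
```

In this toy case only 256 of the 4096 cells are stored. Surface data occupies a thin shell, so sparse storage tends to scale with surface area rather than with volume.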

02

Omni-Voxel representation

Geometry uses flexible dual grids for arbitrary topology and sharp edges, while appearance stores PBR attributes for realistic material behavior.
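
One way to picture a voxel that carries both geometry and appearance is a record holding a vertex position inside the cell, as in dual-contouring-style methods, alongside the PBR attributes. The layout below is a hypothetical sketch: the class name `VoxelRecord`, its fields, and the value ranges are assumptions, and the actual O-Voxel format may store different or additional quantities.

```python
from dataclasses import dataclass


@dataclass
class VoxelRecord:
    """Hypothetical per-voxel payload; field names are illustrative."""
    dual_vertex: tuple   # (x, y, z) offset inside the cell, each in [0, 1]
    base_color: tuple    # RGB, each in [0, 1]
    roughness: float
    metallic: float
    opacity: float

    def validate(self):
        """Check that every stored value lies in [0, 1]."""
        values = [*self.dual_vertex, *self.base_color,
                  self.roughness, self.metallic, self.opacity]
        return all(0.0 <= v <= 1.0 for v in values)


# Sparse storage: only occupied cells get an entry.
grid = {
    (3, 7, 1): VoxelRecord((0.5, 0.25, 0.9), (0.8, 0.1, 0.1), 0.4, 0.0, 1.0),
    (3, 7, 2): VoxelRecord((0.1, 0.5, 0.5), (0.9, 0.9, 0.9), 0.2, 1.0, 0.5),
}


def world_position(coord, record, resolution):
    """Place the dual vertex in the unit cube from its cell index and offset."""
    return tuple((c + o) / resolution for c, o in zip(coord, record.dual_vertex))
```

Because the vertex can sit anywhere inside its cell rather than at a fixed corner or center, this kind of layout can follow sharp edges more closely than a grid of fixed sample points.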

03

Sparse Compression VAE

SC-VAE directly compresses voxel data with sparse residual autoencoding, reaching 16x downsampling and roughly 9.6K latent tokens for 1024 cubed assets.
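
The figures above can be unpacked with a little arithmetic. This sketch assumes the 16x factor applies along each spatial axis and takes "roughly 9.6K" as 9,600 tokens; both are assumptions for illustration.

```python
resolution = 1024
downsample = 16
latent_tokens = 9_600   # "roughly 9.6K" from the text

latent_side = resolution // downsample           # latent grid edge length
dense_latent_cells = latent_side ** 3            # cells in a dense latent grid
cell_reduction = downsample ** 3                 # dense cells merged per latent cell
occupancy = latent_tokens / dense_latent_cells   # share of latent cells in use

print(f"latent grid: {latent_side} cubed = {dense_latent_cells} cells")
print(f"each latent cell covers {cell_reduction} voxels")
print(f"occupied latent cells: {occupancy:.1%}")
```

Under these assumptions the latent grid is 64 cubed (262,144 cells), and the 9.6K tokens fill under 4% of it. The sequence length a generative model must handle is therefore set by sparsity as well as by the downsampling factor.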

04

Efficient mesh conversion

Converting a textured mesh to O-Voxel can run in under 10 seconds on a single CPU, while converting O-Voxel back to a textured mesh can complete in under 100 ms with CUDA acceleration.

Research snapshot

Why native and compact structured latents matter

TRELLIS.2

Generation target

High-resolution image-to-3D generation for fully textured assets that can preserve both geometry and material detail.

Representation

O-Voxel unifies geometry and appearance in a sparse structure, avoiding the constraints of iso-surface fields.

Compression strategy

SC-VAE makes the latent space compact enough for scalable modeling, with negligible perceptual degradation.

Responsible use

The project is presented for academic and research exploration of 3D generation technologies, with responsible AI considerations included in the research process.

FAQ

The fastest answers for understanding TRELLIS.2.

What is TRELLIS.2?

TRELLIS.2 is an open-source 4B-parameter image-to-3D model focused on native and compact structured latents for high-fidelity 3D generation.

What does O-Voxel encode?

O-Voxel encodes geometry and appearance together, including shape structure and PBR material attributes such as base color, metallic, roughness, and alpha (opacity).

Why is 16x compression important?

Spatial compression shrinks the latent representation so that large, textured 3D assets can be modeled efficiently without a meaningful loss in perceptual quality.

What kinds of topology does it handle?

The method is designed for arbitrary topology, including open surfaces, non-manifold geometry, and interior structures.
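
For readers less familiar with these terms, the standard edge-counting test below shows what makes a triangle mesh open or non-manifold. It is a general mesh-analysis sketch, not code from the project, and the function name `classify_edges` is made up for this example.

```python
from collections import Counter


def classify_edges(faces):
    """Count how many triangles share each undirected edge.

    Returns (boundary_edges, non_manifold_edges): edges used by exactly
    one face, and edges used by more than two faces.
    """
    counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    boundary = sorted(e for e, n in counts.items() if n == 1)
    non_manifold = sorted(e for e, n in counts.items() if n > 2)
    return boundary, non_manifold


# A single quad made of two triangles: an open surface.
quad = [(0, 1, 2), (0, 2, 3)]
# Three triangles fanning out from the shared edge (0, 1): non-manifold.
fan = [(0, 1, 2), (0, 1, 3), (0, 1, 4)]

quad_boundary, quad_non_manifold = classify_edges(quad)
fan_boundary, fan_non_manifold = classify_edges(fan)
```

The quad has four boundary edges and no non-manifold ones. The fan has one non-manifold edge, (0, 1), because three faces meet along it. Representations built on closed iso-surfaces typically cannot express either case directly.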