Technical Overview

From a single HTML file to a full project.

Paste or upload any HTML file. Uncluster parses it, understands its structure, and converts it into whatever you need — clean JSX, a full TypeScript project, an EJS server setup, or just extracted and separated assets. No configuration. No local tooling required.

Features

Six things it does with your HTML

Every feature runs on the same server endpoint model — send HTML in, get a transformed result back.

POST /api/format

Format HTML

Takes messy, minified, or inconsistently indented HTML and outputs it with clean, normalized indentation. Tags are properly nested, whitespace is trimmed, and the structure becomes easy to read at a glance.

→ returns formatted HTML string

POST /api/convert

Convert to JSX

Rewrites the HTML as a valid React JSX component. Attribute names are translated (class→className, for→htmlFor, all event handlers), inline styles are converted to JS objects, void elements are self-closed, and structural wrapper tags are stripped out.

→ returns ready-to-paste .jsx / .tsx component

POST /api/analyze

Analyze Components

Scans the DOM for elements that repeat with the same structure and class names. Patterns that appear three or more times and match known UI identifiers (card, button, badge, modal, etc.) are surfaced as component suggestions, each with a generated prop list and starter JSX body.

→ returns JSON array of component suggestions

POST /api/export

Export ZIP

Pulls all inline <style> and <script> blocks out of the HTML into separate files, then downloads any external CDN stylesheets and scripts referenced via <link> and <script src> tags. The HTML is rewritten to reference these local files, then everything is bundled into a ZIP.

→ returns .zip with separated HTML, CSS, JS files

POST /api/export-nodejs

Export TSX Project

Scaffolds a complete, runnable Express + Vite + TypeScript project around your HTML. Generates package.json, vite.config.js, tsconfig.json, ESLint and Prettier configs, a server.js, and a structured src/ directory with your converted component inside it.

→ returns .zip; unzip and run npm install

POST /api/export-nodejs-ejs

Export EJS Project

Scaffolds an Express + EJS server-rendered project. Your HTML is split into EJS partials — one for the header, one for the footer, and one per major content block. Each partial is wired up to an Express route that calls res.render(). No client-side framework needed.

→ returns .zip ready for server-side rendering

The Pipeline

What happens under the hood

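Every request runs through the same five-stage pipeline. Parsing (01), traversal (02), and output assembly (05) are shared by every feature. Attribute translation (03) applies only to JSX conversion, and pattern detection (04) applies only to component analysis.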

01

Parse the HTML into a tree

The raw HTML string is fed into html.Parse() from the golang.org/x/net/html package. This produces a linked node tree rather than an array or a flat list. Each node has a type (element, text, comment), a tag name, a list of attributes, and pointers to its first child and next sibling. Every subsequent stage reads this tree; nothing works directly on the raw string.

Input
<div class="card">
  <h2>Title</h2>
  <p>Body text here</p>
</div>
Parsed tree
Element: div
  attr: class="card"
  └─ Element: h2
       └─ Text: "Title"
  └─ Element: p
       └─ Text: "Body text here"
02

Walk every node depth-first

Every stage visits the tree the same way: process the current node, descend into FirstChild, then continue with NextSibling, recursing at each step. This depth-first walk visits every node in the document exactly once. The converter uses this walk to build JSX. The analyzer uses it to count repeating patterns. The formatter uses it to track indent depth.
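The walk can be sketched in a few lines of Go. This is a minimal illustration, not Uncluster's actual code: the Node type below is a stand-in that mirrors only the FirstChild and NextSibling fields of the golang.org/x/net/html node, and the tree is built by hand from the parsing example in stage 01 (a real parse would also contain whitespace text nodes and the html, head, and body wrappers).

```go
package main

import (
	"fmt"
	"strings"
)

// NodeType and Node are stand-ins for the golang.org/x/net/html types,
// reduced to the fields the walk relies on.
type NodeType int

const (
	TextNode NodeType = iota
	ElementNode
)

type Node struct {
	Type        NodeType
	Data        string // tag name for elements, content for text nodes
	FirstChild  *Node
	NextSibling *Node
}

// walk visits n and all of its descendants depth-first, passing the
// nesting depth of each node to visit.
func walk(n *Node, depth int, visit func(n *Node, depth int)) {
	if n == nil {
		return
	}
	visit(n, depth)
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		walk(c, depth+1, visit)
	}
}

// render returns one line per node, indented by depth, the way a
// formatter would track indentation during the walk.
func render(root *Node) string {
	var b strings.Builder
	walk(root, 0, func(n *Node, depth int) {
		b.WriteString(strings.Repeat("  ", depth))
		if n.Type == ElementNode {
			b.WriteString("Element: " + n.Data + "\n")
		} else {
			b.WriteString("Text: " + n.Data + "\n")
		}
	})
	return b.String()
}

// sampleTree builds the div > (h2, p) tree from the parsing example by hand.
func sampleTree() *Node {
	p := &Node{Type: ElementNode, Data: "p",
		FirstChild: &Node{Type: TextNode, Data: "Body text here"}}
	h2 := &Node{Type: ElementNode, Data: "h2",
		FirstChild: &Node{Type: TextNode, Data: "Title"}, NextSibling: p}
	return &Node{Type: ElementNode, Data: "div", FirstChild: h2}
}

func main() {
	fmt.Print(render(sampleTree()))
}
```

Passing the visitor as a function is one way to let the converter, analyzer, and formatter share a single traversal while doing different work per node.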

Diagram — depth-first traversal order

graph LR
    A["html.Parse()"] --> B["DOM root"]
    B --> C["visit FirstChild"]
    C --> D{"has children?"}
    D -->|yes| E["recurse into children"]
    D -->|no| F["move to NextSibling"]
    E --> F
    F --> G{"sibling exists?"}
    G -->|yes| C
    G -->|no| H["return up"]

    style A fill:#1565c0,color:#ffffff,stroke:#0d47a1
    style B fill:#1b5e20,color:#ffffff,stroke:#1b5e20
    style E fill:#e65100,color:#ffffff,stroke:#bf360c
            
03

Translate attributes (JSX path)

For JSX conversion, each element's attributes are run through a lookup table of 70+ entries. class becomes className. for becomes htmlFor. Event handlers are camel-cased the same way, so onclick becomes onClick. Inline style="color:red" strings become style={{ color: 'red' }} objects. Elements like <html>, <head>, and <body> are skipped entirely because they don't belong in a React component.

HTML
<label for="email"
  class="field"
  onclick="go()"
  style="color:red">
  Email
</label>
JSX
<label htmlFor="email"
  className="field"
  onClick={go}
  style={{ color: 'red' }}>
  Email
</label>
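The translation above can be approximated with a map lookup plus a small style parser. The sketch below is illustrative only: the table holds a handful of the 70+ entries, and the function names (jsxAttrName, styleToJSX) are hypothetical. It assumes simple declarations and does not handle CSS custom properties or values containing semicolons or single quotes.

```go
package main

import (
	"fmt"
	"strings"
)

// attrNames is a small, illustrative subset of the attribute lookup table.
var attrNames = map[string]string{
	"class":    "className",
	"for":      "htmlFor",
	"onclick":  "onClick",
	"tabindex": "tabIndex",
}

// jsxAttrName returns the JSX spelling of an HTML attribute name, or the
// name unchanged when the table has no entry for it.
func jsxAttrName(name string) string {
	if jsx, ok := attrNames[strings.ToLower(name)]; ok {
		return jsx
	}
	return name
}

// camelCase turns a CSS property such as "background-color" into
// "backgroundColor".
func camelCase(prop string) string {
	parts := strings.Split(prop, "-")
	for i := 1; i < len(parts); i++ {
		if parts[i] != "" {
			parts[i] = strings.ToUpper(parts[i][:1]) + parts[i][1:]
		}
	}
	return strings.Join(parts, "")
}

// styleToJSX converts an inline style string into a JSX style expression.
// Declarations without a colon or with an empty side are skipped.
func styleToJSX(style string) string {
	var pairs []string
	for _, decl := range strings.Split(style, ";") {
		prop, val, ok := strings.Cut(decl, ":")
		if !ok {
			continue
		}
		prop, val = strings.TrimSpace(prop), strings.TrimSpace(val)
		if prop == "" || val == "" {
			continue
		}
		pairs = append(pairs, fmt.Sprintf("%s: '%s'", camelCase(prop), val))
	}
	return "{{ " + strings.Join(pairs, ", ") + " }}"
}

func main() {
	fmt.Println(jsxAttrName("for"))      // htmlFor
	fmt.Println(styleToJSX("color:red")) // {{ color: 'red' }}
}
```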
04

Detect repeating patterns (Analyze path)

For component analysis, each element visited during the walk is fingerprinted by its tag name, CSS classes, and id, for example div.card#featured. Elements that share the same fingerprint are grouped and counted. Any pattern that appears three or more times and whose name matches a known UI keyword (card, button, badge, modal, dialog, avatar, toast, alert...) is flagged as a component candidate. Generic structural elements such as a bare div or section, with nothing in the fingerprint matching a keyword, are excluded even if they repeat.

Why the frequency threshold? A one-off element doesn't justify a component. Three or more occurrences is the signal that the developer already intended repetition — the component just hasn't been extracted yet.
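A rough sketch of the fingerprint-and-count step follows. The element struct, the function names, and the substring-based keyword match are assumptions made for illustration; the source only specifies the fingerprint format, the threshold of three, and the keyword list.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// element is a hypothetical flattened view of one visited DOM element.
type element struct {
	Tag     string
	Classes []string
	ID      string
}

// keywords is the subset of UI identifiers named in the text.
var keywords = []string{"card", "button", "badge", "modal", "dialog", "avatar", "toast", "alert"}

// minOccurrences is the frequency threshold for a component candidate.
const minOccurrences = 3

// fingerprint renders tag, classes, and id in selector form,
// for example "div.card#featured".
func fingerprint(e element) string {
	var b strings.Builder
	b.WriteString(e.Tag)
	for _, c := range e.Classes {
		b.WriteString("." + c)
	}
	if e.ID != "" {
		b.WriteString("#" + e.ID)
	}
	return b.String()
}

// matchesKeyword reports whether the fingerprint contains a known UI
// keyword. Substring matching is an assumption of this sketch.
func matchesKeyword(fp string) bool {
	for _, k := range keywords {
		if strings.Contains(fp, k) {
			return true
		}
	}
	return false
}

// candidates returns, in sorted order, every fingerprint that occurs at
// least minOccurrences times and matches a keyword.
func candidates(elems []element) []string {
	counts := map[string]int{}
	for _, e := range elems {
		counts[fingerprint(e)]++
	}
	var out []string
	for fp, n := range counts {
		if n >= minOccurrences && matchesKeyword(fp) {
			out = append(out, fp)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	var elems []element
	for i := 0; i < 3; i++ {
		elems = append(elems, element{Tag: "div", Classes: []string{"card"}})
	}
	elems = append(elems, element{Tag: "div", Classes: []string{"row"}})
	fmt.Println(candidates(elems)) // prints [div.card]
}
```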
05

Assemble and return the output

All string output is built with strings.Builder — append-only, no string concatenation, no intermediate copies. For JSX: imports at the top, the component function body in the middle, any extracted script logic at the bottom. For exports: the ZIP is assembled in memory using Go's archive/zip and streamed directly back in the HTTP response.

No temp files. Nothing is written to disk. Every operation runs entirely in memory and the result is returned directly in the HTTP response.

Internals

How asset extraction works

The Export ZIP and project scaffold features need to separate your HTML from its styles and scripts. Here's what the extractor actually does.

A

Inline styles and scripts → separate files

Every <style> block in the document is lifted out and written to its own numbered CSS file (style-0.css, style-1.css, ...). Every <script> block without a src attribute is lifted into a separate JS file. The original tags are then removed from the HTML.

B

External CDN links → downloaded locally

Any <link rel="stylesheet"> or <script src="..."> pointing to an external URL is fetched over HTTP and saved locally. CDN hostnames are replaced with short aliases in the filename, unsafe characters are stripped, and names are truncated to stay filesystem-safe. The HTML's href and src attributes are rewritten to point to the local file.

Example: A link to Bootstrap from cdn.jsdelivr.net/npm/bootstrap@5/dist/css/bootstrap.min.css becomes jsdelivr-bootstrap-min.css in the ZIP.
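One way to derive such a filename is sketched below. Only the jsdelivr alias and the expected result come from the example above; the other aliases, the 64-character limit, the "asset" fallback, and the localName function itself are assumptions. The sketch expects a full URL with a scheme and does not deal with two URLs that map to the same name.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"regexp"
	"strings"
)

// hostAliases maps CDN hostnames to short prefixes. Only the jsdelivr
// entry is taken from the example; the others are illustrative guesses.
var hostAliases = map[string]string{
	"cdn.jsdelivr.net":     "jsdelivr",
	"cdnjs.cloudflare.com": "cdnjs",
	"unpkg.com":            "unpkg",
}

// unsafeChars matches runs of characters not kept in a filename.
var unsafeChars = regexp.MustCompile(`[^A-Za-z0-9_-]+`)

// maxStemLen is an assumed cap on the name before the extension.
const maxStemLen = 64

// localName turns an asset URL into a filesystem-safe local filename of
// the form "<host alias>-<sanitized base name><extension>".
func localName(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	prefix, ok := hostAliases[u.Hostname()]
	if !ok {
		prefix = unsafeChars.ReplaceAllString(u.Hostname(), "-")
	}
	base := path.Base(u.Path)
	if base == "." || base == "/" {
		base = "asset" // URL had no usable path segment
	}
	ext := path.Ext(base)
	stem := strings.TrimSuffix(base, ext)
	stem = strings.Trim(unsafeChars.ReplaceAllString(stem, "-"), "-")
	if len(stem) > maxStemLen {
		stem = stem[:maxStemLen]
	}
	return prefix + "-" + stem + ext, nil
}

func main() {
	name, err := localName("https://cdn.jsdelivr.net/npm/bootstrap@5/dist/css/bootstrap.min.css")
	if err != nil {
		fmt.Println("bad URL:", err)
		return
	}
	fmt.Println(name) // jsdelivr-bootstrap-min.css
}
```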

Project Generation

How a full project gets built from HTML

The scaffold exporters don't just rename files. They generate a complete project: unzip it, run npm install, and start developing immediately.

TSX

Express + Vite + TypeScript

Your HTML is converted to JSX and placed in src/App.tsx. The scaffolder generates all the boilerplate around it using Go's text/template engine — inserting your project name, dependency versions, and converted component at the right places. The output is a ready-to-run SPA with hot module replacement via Vite and an Express server for API routes.

Generated files include: package.json · vite.config.js · server.js · tsconfig.json · .eslintrc.json · .prettierrc · .gitignore · src/ with your component

EJS

Express + EJS server rendering

The HTML is parsed a second time and split into named EJS partial files — one for the header, one for the footer, and one for each major content section found in the body. Each partial is placed in views/partials/ and included from a root views/index.ejs. Express route handlers call res.render() on each view. No client-side framework is involved — pages are rendered server-side on every request.

Codebase

Project structure

Each internal package owns exactly one concern. No package imports another — they communicate only through function arguments and return values.

uncluster/
├── main.go                  // Fiber server, routes, middleware
├── cmd/uncluster-split/
│   └── main.go              // CLI: run extraction without the server
└── internal/
    ├── analyzer/            // pattern counting, component suggestions
    ├── converter/           // HTML → JSX, 70+ attribute translations
    ├── extractor/           // splits inline CSS/JS, rewrites links
    ├── fetcher/             // downloads external CDN assets over HTTP
    ├── formatter/           // recursive indentation and normalization
    ├── nodejs/              // TSX + EJS project scaffolding, zip output
    └── zipper/              // generic ZIP creation for extracted assets

Go 1.21 · Fiber v2 · golang.org/x/net/html · text/template · archive/zip

API Reference

Endpoints

All POST endpoints accept a JSON body with an html field. Export endpoints return a binary ZIP. The GET health endpoint takes no request body. The server includes CORS, request logging, and panic recovery middleware.

Method  Path                    Request           Response
POST    /api/format             { html: string }  Formatted HTML string
POST    /api/convert            { html: string }  JSX component string
POST    /api/analyze            { html: string }  JSON array of component suggestions
POST    /api/export             { html: string }  ZIP: separated HTML, CSS, JS
POST    /api/export-nodejs      { html: string }  ZIP: full Express + Vite + TS project
POST    /api/export-nodejs-ejs  { html: string }  ZIP: full Express + EJS project
GET     /api/health             (none)            { status, service, version }

CLI alternative: The cmd/uncluster-split binary runs the full extraction pipeline locally without starting the HTTP server. Run it with go run ./cmd/uncluster-split -input file.html -output ./out. Pass -manifest true to also write a split-manifest.json listing every output file produced.
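Since every POST endpoint takes the same request shape, one small client helper can cover all of them. The sketch below is an assumption-laden illustration: postHTML is a hypothetical helper, the response is read as raw bytes because the response envelope is not specified here, and a local stub server stands in for a running Uncluster instance so the example runs offline. The stub only echoes the html field back; it does not format anything.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// postHTML sends {"html": ...} to one of the POST endpoints and returns
// the raw response body (text for format/convert, ZIP bytes for exports).
func postHTML(baseURL, endpoint, html string) ([]byte, error) {
	payload, err := json.Marshal(map[string]string{"html": html})
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(baseURL+endpoint, "application/json", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("%s returned %s: %s", endpoint, resp.Status, body)
	}
	return body, nil
}

// newStubServer stands in for a real Uncluster server. It accepts the
// documented request shape and echoes the html field back unchanged.
func newStubServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req struct {
			HTML string `json:"html"`
		}
		if r.Method != http.MethodPost || json.NewDecoder(r.Body).Decode(&req) != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		io.WriteString(w, req.HTML)
	}))
}

func main() {
	srv := newStubServer() // replace srv.URL with your server's address
	defer srv.Close()

	out, err := postHTML(srv.URL, "/api/format", "<div><p>hi</p></div>")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Printf("%s\n", out)
}
```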