TEXTv1.2 Specification

Pure Text Based Document Definition

This document defines the formal rules for the TEXT layer of the Text2Doc workflow. It guarantees deterministic parsing and strict separation between narrative content, semantic structure, and visual rendering.

Author: Gabor Soos
Version: v2026.04.11
Language: en
Status: draft
Comments

0. Core Principle

TEXT contains only human-readable content and structural cues derived from natural formatting.

TEXT is syntax-free but structured.

Renderer is intelligent but non-invasive.

This means:

TEXTv1.2 preserves v1.0 guarantees and clarifies paragraph line-breaks, section separation, and renderer baseline behavior.

1. Encoding & File Rules

1.1 File Extension

TEXT source files use the following extension:

The file extension does not affect parsing semantics, but it is the standard file naming convention for TEXTv1.2 and later.

2. Cover Block

The Cover Block defines the document’s identity.

It appears at the beginning of the file and ends immediately before the first numbered section heading.

A numbered section heading matches:

^(\d+(?:\.\d+)*\.?)\s+(.+)$
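A minimal sketch of how a parser might locate the end of the Cover Block using the heading pattern above. The function name `split_cover` and the choice to treat a file without any numbered heading as "all cover" are assumptions for illustration, not part of the specification.

```python
import re

# Heading pattern copied from the specification text.
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*\.?)\s+(.+)$")


def split_cover(source: str) -> tuple[list[str], list[str]]:
    """Return (cover_lines, body_lines); the body starts at the first heading."""
    lines = source.splitlines()
    for index, line in enumerate(lines):
        if HEADING_RE.match(line):
            return lines[:index], lines[index:]
    # Assumption: with no numbered heading, everything is treated as cover.
    return lines, []


cover, body = split_cover("Title\n\n  Author: A\n\n0. Core Principle\n\nText")
print(cover)  # lines before the first numbered heading
print(body)   # lines from the first numbered heading onward
```

Because the pattern is anchored at the start of the line, an indented numbered list item does not end the Cover Block in this sketch.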

2.1 Title Paragraph

The first paragraph in the file is the Title Paragraph.

Rules:

If more than 2 lines are present → validation error.

2.2 Description Paragraph (Optional)

The paragraph following the Title Paragraph may be a Description Paragraph.

Rules:

2.3 Cover Meta / Comment Lines

After Title (+ optional Description), any remaining lines before the first numbered section are parsed as Cover Meta / Comment Lines.

2.3.1 Indentation Rule (Cover-only Exception)

Inside the Cover Block only:

This is the only global exception to the non-semantic indentation rule. List-local indentation is defined separately in section 3.3.2.

2.3.2 Metadata Format

After removing leading spaces, a line matching:

Key: Value

is treated as metadata.

Allowed keys (fixed vocabulary v1.2):

Rules:

2.3.3 Comment Lines

If a stripped indented line does not match Key: Value, it is treated as a comment.
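The metadata/comment distinction can be sketched as follows. The regular expression `META_RE` is an assumed reading of "Key: Value" (a single-word key, a colon, whitespace, a non-empty value); the specification does not give an exact pattern. The sketch also does not enforce the indentation rule or the allowed-key vocabulary.

```python
import re

# Assumed pattern for "Key: Value"; the specification does not give a regex.
META_RE = re.compile(r"^([A-Za-z][A-Za-z0-9_-]*):\s+(.+)$")


def classify_cover_line(line: str) -> tuple:
    """Classify one cover line as ("meta", key, value) or ("comment", text)."""
    stripped = line.lstrip(" ")  # leading spaces are removed before matching
    match = META_RE.match(stripped)
    if match:
        return ("meta", match.group(1), match.group(2).rstrip())
    return ("comment", stripped.rstrip())


print(classify_cover_line("  Author: Gabor Soos"))
print(classify_cover_line("  internal review copy"))
```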

Comments:

2.4 Mandatory Cover Fields

The Cover Block must contain:

If missing → validation error.

2.5 Version Format

Version must match:

vYYYY.MM.DD

Optionally followed by:

-XXX
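A hedged validation sketch for the version format. Two points are assumptions rather than stated rules: that "-XXX" means a hyphen followed by exactly three alphanumeric characters, and that the date part must be a real calendar date. The name `is_valid_version` is hypothetical.

```python
import re
from datetime import date

# Assumption: "-XXX" is a hyphen plus exactly three alphanumeric characters.
VERSION_RE = re.compile(r"v(\d{4})\.(\d{2})\.(\d{2})(?:-([A-Za-z0-9]{3}))?")


def is_valid_version(value: str) -> bool:
    """Check the vYYYY.MM.DD shape, with an optional three-character suffix."""
    match = VERSION_RE.fullmatch(value)
    if match is None:
        return False
    year, month, day = (int(match.group(i)) for i in (1, 2, 3))
    try:
        # Assumption: the date part must be a real calendar date.
        date(year, month, day)
    except ValueError:
        return False
    return True


print(is_valid_version("v2026.04.11"))
print(is_valid_version("v2026.13.01"))
```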

Examples:

2.6 Language Format

Language (if present) should be a short lowercase code:

Examples:

Future versions may support extended locale formats.

3. Structural Rules

3.1 Sections

Sections are defined strictly by numbered headings.

Format examples:

0. Core Principle

1. Encoding & File Rules

2. Cover Block

2.1 Title Paragraph

2.1.1 Subsubtitle

Regex:

^(\d+(?:\.\d+)*\.?)\s+(.+)$

Rules:

The parser preserves the section number exactly as written in the source text.
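As an illustration, the regex can be applied so that the section number is kept verbatim while a nesting depth is derived alongside it. The function name `parse_heading` and the derived `depth` value are assumptions for this sketch; the specification only requires that the number be preserved as written.

```python
import re

# Heading pattern copied from the specification text.
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*\.?)\s+(.+)$")


def parse_heading(line: str):
    """Return (number, title, depth) for a heading line, or None otherwise."""
    match = HEADING_RE.match(line)
    if match is None:
        return None
    number, title = match.group(1), match.group(2)
    # Depth is derived for convenience; `number` itself is kept verbatim.
    depth = len(number.rstrip(".").split("."))
    return number, title, depth


print(parse_heading("0. Core Principle"))
print(parse_heading("2.1.1 Subsubtitle"))
print(parse_heading("  1. Step"))
```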

Examples:

Section separation rules:

3.2 Paragraphs

Definition:

A paragraph is a block of consecutive non-empty lines separated by blank lines.
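The definition above can be sketched as a simple line grouper. Treating whitespace-only lines as blank is an assumption of this sketch, and `split_paragraphs` is a hypothetical name.

```python
def split_paragraphs(lines: list[str]) -> list[list[str]]:
    """Group consecutive non-empty lines into paragraphs."""
    paragraphs: list[list[str]] = []
    current: list[str] = []
    for line in lines:
        if line.strip() == "":  # assumption: whitespace-only counts as blank
            if current:
                paragraphs.append(current)
                current = []
        else:
            current.append(line)
    if current:
        paragraphs.append(current)
    return paragraphs


print(split_paragraphs(["first", "second", "", "", "third"]))
```

Line boundaries inside each paragraph are kept, so a later stage can still tell a single-line paragraph from a multi-line one.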

Rules:

Line-break behavior:

Practical interpretation:

Renderer baseline behavior (without SIDECAR):

If a paragraph contains multiple lines:
The first line is rendered as visually emphasized. Remaining lines are rendered as continuation lines.

If a paragraph contains a single line: it is rendered as normal text.

This is a renderer heuristic, not a TEXT syntax rule.
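One possible way a renderer could apply this heuristic, shown only as a sketch. The style labels "emphasis", "continuation", and "normal" and the function name `render_paragraph` are hypothetical; a real renderer would map them to its own theme.

```python
def render_paragraph(lines: list[str]) -> list[tuple[str, str]]:
    """Pair each line of a paragraph with a hypothetical style label."""
    if len(lines) == 1:
        return [("normal", lines[0])]
    return [
        ("emphasis" if index == 0 else "continuation", line)
        for index, line in enumerate(lines)
    ]


print(render_paragraph(["Lead line", "More detail"]))
print(render_paragraph(["Just one line"]))
```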

3.3 Lists

3.3.1 Flat Lists

Bullet list:

Rules:

Numbered list:

  1. Step
  2. Step

Rules:

3.3.2 Nested Bullet Lists

TEXTv1.2 keeps nested bullet lists.

A nested bullet list item is still introduced only by:

-

The nesting level is defined only by leading spaces.

Fixed indentation rule:

Rules:

Examples:

Valid:

Valid:

3.3.3 Nested List Scope

A nested bullet list is a single list structure.

Rules:

3.3.4 AST Recommendation

A parser should preserve nested bullet structure explicitly.

Recommended shape:

Each ListItem may contain zero or more child ListItem entries through a nested child list.
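A minimal sketch of such a structure and of a builder that derives nesting from leading spaces. Several details are assumptions: the indentation width is passed in as the required parameter `indent_width` rather than fixed here, top-level items are assumed to start at column zero, the marker is assumed to be "- " followed by the item text, and skipping a nesting level is treated as an error. The field name `children` and the function `parse_bullets` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ListItem:
    text: str
    children: list["ListItem"] = field(default_factory=list)


def parse_bullets(lines: list[str], indent_width: int) -> list[ListItem]:
    """Build a tree from bullet lines; `indent_width` is spaces per level."""
    root: list[ListItem] = []
    # stack[d] is the child list that receives items at depth d.
    stack = [root]
    for line in lines:
        body = line.lstrip(" ")
        if not body.startswith("- "):
            raise ValueError(f"not a bullet line: {line!r}")
        spaces = len(line) - len(body)
        if spaces % indent_width != 0:
            raise ValueError(f"indentation is not a multiple of {indent_width}: {line!r}")
        depth = spaces // indent_width
        if depth >= len(stack):
            raise ValueError(f"nesting skips a level: {line!r}")
        del stack[depth + 1:]  # close any deeper levels that have ended
        item = ListItem(body[2:].strip())
        stack[depth].append(item)
        stack.append(item.children)
    return root


tree = parse_bullets(["- a", "  - b", "  - c", "- d"], indent_width=2)
print(tree)
```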

3.3.5 Renderer Baseline Behavior

Ordered list:

Bullet list:

This behavior reflects structural hierarchy, not explicit syntax.

3.4 Indentation

3.5 Page Breaks

TEXT has no explicit page breaks. Pagination is handled by the renderer based on section level.

3.6 URLs

URLs may appear as plain text inside paragraphs or list items.

Rules:

Examples:

4. Explicitly Forbidden

5. What TEXT Does NOT Contain

TEXT does not define:

Admonition content may exist in TEXT as normal paragraphs and lists. The semantic reclassification of those blocks into NOTE / WARNING / IMPORTANT belongs to SIDECAR.

6. What TEXT Does Guarantee

If rules are followed, the parser can deterministically produce:

7. Philosophy

  1. TEXT = narrative flow
  2. SIDECAR = semantic structure
  3. THEME = visual design

Admonition bodies should remain in TEXT whenever possible. SIDECAR should identify and reclassify existing TEXT blocks rather than duplicating long user-visible content.

No layer leaks into another.

TEXT remains syntax-free but structurally strict.