TEXTv1.2 Specification
Pure Text Based Document Definition
This document defines the formal rules for the TEXT layer of the Text2Doc workflow. It guarantees deterministic parsing and strict separation between narrative content, semantic structure, and visual rendering.
0. Core Principle
TEXT contains only human-readable content and structural cues derived from natural formatting.
- No styling.
- No layout hints.
- No HTML.
- No Markdown.
- No semantic directives.
- Everything else lives in SIDECAR or renderer behavior.
TEXT is syntax-free but structured.
Renderer is intelligent but non-invasive.
This means:
- TEXT does not encode visual intent.
- Renderer may apply minimal emphasis based on structure.
- No artificial syntax is introduced.
TEXTv1.2 preserves v1.0 guarantees and clarifies paragraph line-breaks, section separation, and renderer baseline behavior.
1. Encoding & File Rules
- UTF-8 encoding only
- Unix line endings (\n)
- No tab characters (spaces only)
- No trailing whitespace
- No maximum line length constraint
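The file rules above can be checked mechanically before any parsing starts. The following is a minimal sketch, not a reference implementation; the function name check_file_rules and the wording of the error messages are assumptions made for illustration.

```python
def check_file_rules(raw: bytes) -> list[str]:
    """Return a list of violations of the encoding and file rules."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc.reason}"]

    errors = []
    # Any carriage return means the file does not use Unix line endings.
    if "\r" in text:
        errors.append("non-Unix line ending (carriage return found)")
    for lineno, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")  # carriage returns were reported above
        if "\t" in line:
            errors.append(f"line {lineno}: tab character")
        if line != line.rstrip():
            errors.append(f"line {lineno}: trailing whitespace")
    return errors
```

An empty result means the raw bytes satisfy all four checkable rules; line length is deliberately not inspected.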
1.1 File Extension
TEXT source files use the following extension:
- .t2d
The file extension does not affect parsing semantics, but it is the standard file naming convention for TEXTv1.2 and later.
2. Cover Block
The Cover Block defines the document’s identity.
It appears at the beginning of the file and ends immediately before the first numbered section heading.
A numbered section heading matches:
^(\d+(?:\.\d+)*\.?)\s+(.+)$
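As an illustration of how the heading pattern delimits the Cover Block, the sketch below splits a file at the first matching line. The names HEADING_RE and split_cover are hypothetical, and the function assumes the file rules of section 1 already hold.

```python
import re

# The numbered section heading pattern quoted above.
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*\.?)\s+(.+)$")

def split_cover(text: str) -> tuple[list[str], list[str]]:
    """Split the file into (cover lines, body lines) at the first heading."""
    lines = text.split("\n")
    for index, line in enumerate(lines):
        if HEADING_RE.match(line):
            return lines[:index], lines[index:]
    # No numbered heading at all: the whole file is cover content.
    return lines, []
```

Note that the pattern is purely lexical: a cover line that begins with digits followed by a space (for example a title such as "2026 Roadmap") would also match, so a stricter parser may want to handle that case explicitly.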
2.1 Title Paragraph
The first paragraph in the file is the Title Paragraph.
Rules:
- Line 1 = Title (mandatory)
- Line 2 = Subtitle (optional)
- Maximum 2 lines allowed
- Lines must not start with whitespace
- No additional lines allowed in this paragraph
If more than 2 lines are present → validation error.
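The Title Paragraph rules translate into a short check. This is a sketch under the assumption that the paragraph has already been isolated as a list of lines; parse_title_paragraph is a hypothetical name and the sample titles are invented.

```python
from typing import Optional

def parse_title_paragraph(lines: list[str]) -> tuple[str, Optional[str]]:
    """Return (title, subtitle) or raise ValueError on a rule violation."""
    if not lines:
        raise ValueError("title is mandatory")
    if len(lines) > 2:
        raise ValueError("title paragraph has more than 2 lines")
    for line in lines:
        if line[:1].isspace():
            raise ValueError("title paragraph line starts with whitespace")
    subtitle = lines[1] if len(lines) == 2 else None
    return lines[0], subtitle
```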
2.2 Description Paragraph (Optional)
The paragraph following the Title Paragraph may be a Description Paragraph.
Rules:
- Optional
- Must follow normal paragraph rules
- Must not start with whitespace
- May contain multiple lines
- Ends at first blank line
2.3 Cover Meta / Comment Lines
After Title (+ optional Description), any remaining lines before the first numbered section are parsed as Cover Meta / Comment Lines.
2.3.1 Indentation Rule (Cover-only Exception)
Inside the Cover Block only:
- Any line starting with at least one leading space is interpreted as a Cover Meta / Comment Line.
- Outside the Cover Block, indentation remains non-semantic unless the line is part of a nested bullet list.
This is the only global exception to the non-semantic indentation rule. List-local indentation is defined separately in section 3.3.2.
2.3.2 Metadata Format
After removing leading spaces, a line matching:
Key: Value
is treated as metadata.
Allowed keys (fixed vocabulary v1.2):
- Author (mandatory)
- Version (mandatory)
- Language (optional)
- Comments (optional)
- Status (optional)
Rules:
- Keys are matched case-insensitively and normalized internally
- Keys must be unique (no duplicates)
- Unknown keys are forbidden in v1.2
- Metadata values are single-line only
2.3.3 Comment Lines
If a stripped indented line does not match Key: Value, it is treated as a comment.
Comments:
- Are ignored for semantic structure
- May optionally be preserved in AST
- Do not affect rendering
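The split between metadata and comment lines can be sketched as follows. Several details are assumptions, since the specification leaves them open: a key is taken to be a single alphabetic word, the colon must be followed by whitespace, and the normalized form of a key is its lowercase spelling. The names ALLOWED_KEYS, META_RE and parse_cover_meta are hypothetical.

```python
import re

ALLOWED_KEYS = {"author", "version", "language", "comments", "status"}
# Assumption: "Key: Value" means one alphabetic word, a colon, whitespace, a value.
META_RE = re.compile(r"^([A-Za-z]+):\s+(.+)$")

def parse_cover_meta(lines: list[str]) -> tuple[dict[str, str], list[str]]:
    """Classify indented cover lines into metadata and comments."""
    metadata: dict[str, str] = {}
    comments: list[str] = []
    for line in lines:
        if not line.startswith(" "):
            continue  # only indented cover lines are meta/comment lines
        stripped = line.strip()
        match = META_RE.match(stripped)
        if match is None:
            comments.append(stripped)
            continue
        key = match.group(1).lower()  # assumed normal form: lowercase
        if key not in ALLOWED_KEYS:
            raise ValueError(f"unknown metadata key: {match.group(1)}")
        if key in metadata:
            raise ValueError(f"duplicate metadata key: {match.group(1)}")
        metadata[key] = match.group(2)
    return metadata, comments
```

Under this reading, an indented line such as "Note: draft" is rejected as an unknown key rather than kept as a comment, which follows from the rule that unknown keys are forbidden.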
2.4 Mandatory Cover Fields
The Cover Block must contain:
- Title
- Author
- Version
If missing → validation error.
2.5 Version Format
Version must match:
vYYYY.MM.DD
Optionally followed by:
-XXX
Examples:
- v2026.02.20
- v2026.02.20-1230
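A possible check for the version format is sketched below. The specification writes the optional suffix as -XXX while the example uses four digits, so the sketch assumes the suffix is one or more ASCII letters or digits; that reading, and the names VERSION_RE and is_valid_version, are assumptions. The sketch does not verify that the date exists in the calendar.

```python
import re

# vYYYY.MM.DD with an optional "-suffix" (suffix charset is an assumption).
VERSION_RE = re.compile(r"v\d{4}\.\d{2}\.\d{2}(?:-[0-9A-Za-z]+)?")

def is_valid_version(value: str) -> bool:
    """Return True if the whole value matches the version format."""
    return VERSION_RE.fullmatch(value) is not None
```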
2.6 Language Format
Language (if present) should be a short lowercase code.
Examples:
- en
- hu
Future versions may support extended locale formats.
3. Structural Rules
3.1 Sections
Sections are defined strictly by numbered headings.
Format examples:
0. Core Principle
1. Encoding & File Rules
2. Cover Block
2.1 Title Paragraph
2.1.1 Subsubtitle
Regex:
^(\d+(?:\.\d+)*\.?)\s+(.+)$
Rules:
- Section numbers must be hierarchical (no jump from 1 to 1.3)
- Title must not be empty
- Section numbering must be strictly increasing
- No duplicate section numbers
- Top-level sections may start with 0 or 1
- Top-level sections may include a trailing dot
- Nested sections do not require a trailing dot
The parser preserves the section number exactly as written in the source text.
Examples:
- "0. Core Principle" -> number = "0."
- "1. Encoding & File Rules" -> number = "1."
- "2.1 Title Paragraph" -> number = "2.1"
Section separation rules:
- A section heading always starts a new block
- Sections are separated by a blank line
- A section heading must not belong to a paragraph
- Content between section headings belongs exclusively to the current section
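The numbering rules can be validated in a single pass over the headings. The sketch below takes a strict reading that the specification does not spell out: after a given section, the next number must be either its first child (suffix .1) or the next sibling at the same or any enclosing level, which rules out gaps as well as duplicates. The helper names number_path, allowed_next and validate_headings are hypothetical.

```python
import re

HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*\.?)\s+(.+)$")

def number_path(number: str) -> tuple[int, ...]:
    """Convert "2.1" or "2." into a tuple of integers such as (2, 1) or (2,)."""
    return tuple(int(part) for part in number.rstrip(".").split("."))

def allowed_next(prev):
    """Numbers that may follow prev under the strict (gap-free) reading."""
    if prev is None:
        return {(0,), (1,)}  # top-level sections may start with 0 or 1
    options = {prev + (1,)}  # first child
    for depth in range(len(prev)):
        # next sibling at this level or at an enclosing level
        options.add(prev[:depth] + (prev[depth] + 1,))
    return options

def validate_headings(headings: list[str]) -> list[str]:
    """Return a list of numbering errors; an empty list means valid."""
    errors = []
    prev = None
    for heading in headings:
        match = HEADING_RE.match(heading)
        if match is None:
            errors.append(f"not a section heading: {heading!r}")
            continue
        path = number_path(match.group(1))
        if path not in allowed_next(prev):
            errors.append(f"unexpected section number: {match.group(1)}")
        prev = path
    return errors
```

A parser that wants to permit gaps (for example 1. followed by 3.) would need a looser allowed_next; the specification only states that numbering is strictly increasing.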
3.2 Paragraphs
Definition:
A paragraph is a block of consecutive non-empty lines, separated from neighboring blocks by blank lines.
Rules:
- Paragraphs are separated by a blank line
- Consecutive non-empty lines belong to the same paragraph
- Paragraphs must not start with whitespace, except when the line belongs to a nested bullet list item
Line-break behavior:
- A line break inside a paragraph is preserved
- It does NOT create a new paragraph
- It does NOT introduce a new structural block
- It does NOT introduce explicit semantic structure
Practical interpretation:
- A paragraph is usually a single line
- Multi-line paragraphs are allowed but should be used intentionally
Renderer baseline behavior (without SIDECAR):
If a paragraph contains multiple lines:
- The first line is rendered as visually emphasized
- Remaining lines are rendered as continuation lines
If a paragraph contains a single line:
- It is rendered as normal text
This is a renderer heuristic, not a TEXT syntax rule.
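The paragraph rules and the baseline heuristic can be sketched together. The role labels "normal", "emphasis" and "continuation" are invented for this illustration, as are the function names; a real renderer would map such roles to its own output model.

```python
def split_paragraphs(text: str) -> list[list[str]]:
    """Group consecutive non-empty lines into paragraphs."""
    paragraphs: list[list[str]] = []
    current: list[str] = []
    for line in text.split("\n"):
        if line == "":  # blank line closes the current paragraph
            if current:
                paragraphs.append(current)
                current = []
        else:
            current.append(line)
    if current:
        paragraphs.append(current)
    return paragraphs

def baseline_roles(paragraph: list[str]) -> list[tuple[str, str]]:
    """Assign a baseline rendering role to each line of one paragraph."""
    if len(paragraph) == 1:
        return [("normal", paragraph[0])]
    first, rest = paragraph[0], paragraph[1:]
    return [("emphasis", first)] + [("continuation", line) for line in rest]
```

Line breaks inside a paragraph are kept as separate list entries, so they are preserved without creating a new block.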
3.3 Lists
3.3.1 Flat Lists
Bullet list:
- Item
- Item
Rules:
- Must start with "- " (dash + space)
- Bullet marker in source is always "- "
- Unicode en dash / em dash are not valid bullet markers in source
- List ends on first blank line or non-list line
Numbered list:
1) Step
2) Step
Rules:
- Must match pattern: ^\d+\)\s
- Numbers must be sequential inside list
- Numbered lists remain flat in v1.2
- Nested numbered lists are not supported in v1.2
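A numbered list can be checked against the pattern and the sequence rule as shown in this sketch. It assumes "sequential" means each number is exactly one greater than the previous one and does not constrain the first number; ORDERED_RE and parse_ordered_list are hypothetical names.

```python
import re

# The item pattern from the rules, with capture groups added for number and text.
ORDERED_RE = re.compile(r"^(\d+)\)\s(.*)$")

def parse_ordered_list(lines: list[str]) -> list[str]:
    """Return the item texts of a flat numbered list or raise ValueError."""
    items: list[str] = []
    previous = None
    for line in lines:
        match = ORDERED_RE.match(line)
        if match is None:
            raise ValueError(f"not a numbered list item: {line!r}")
        number = int(match.group(1))
        if previous is not None and number != previous + 1:
            raise ValueError(f"expected {previous + 1}, found {number}")
        items.append(match.group(2))
        previous = number
    return items
```

Indented items fail the pattern outright, which is consistent with numbered lists remaining flat.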
3.3.2 Nested Bullet Lists
TEXTv1.2 keeps nested bullet lists.
A nested bullet list item is still introduced only by "- " (dash + space).
The nesting level is defined only by leading spaces.
Fixed indentation rule:
- Level 0 = 0 leading spaces
- Level 1 = 2 leading spaces
- Level 2 = 4 leading spaces
- Level 3 = 6 leading spaces
Rules:
- Maximum depth is 3 nested levels below level 0
- Only exact multiples of 2 spaces are allowed
- Tab characters are forbidden
- A list item may increase nesting by at most one level at a time
- A jump from level 0 directly to level 2 is invalid
- A jump from level 1 directly to level 3 is invalid
- De-dentation to any earlier valid level is allowed
- Mixed bullet markers are forbidden
- Nested structure is semantic only inside bullet lists
Examples:
Valid:
- Camera configuration
  - Network settings
  - Video settings
    - Resolution
    - Bitrate
  - AI features
Valid:
- Item A
  - Item A.1
- Item B
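The indentation rules for nested bullet lists can be enforced line by line. The following sketch assumes the lines of one list have already been collected and that a list always starts at level 0; BULLET_RE, MAX_LEVEL and bullet_levels are names chosen for illustration.

```python
import re

BULLET_RE = re.compile(r"^( *)- (.+)$")
MAX_LEVEL = 3  # maximum depth: 3 nested levels below level 0

def bullet_levels(lines: list[str]) -> list[tuple[int, str]]:
    """Return (level, text) pairs or raise ValueError on a rule violation."""
    items: list[tuple[int, str]] = []
    prev_level = -1  # forces the first item to be level 0
    for line in lines:
        match = BULLET_RE.match(line)
        if match is None:
            raise ValueError(f"not a bullet item: {line!r}")
        indent = len(match.group(1))
        if indent % 2 != 0:
            raise ValueError(f"indentation is not a multiple of 2: {line!r}")
        level = indent // 2
        if level > MAX_LEVEL:
            raise ValueError(f"nesting deeper than level {MAX_LEVEL}: {line!r}")
        if level > prev_level + 1:
            raise ValueError(f"nesting increases by more than one level: {line!r}")
        items.append((level, match.group(2)))
        prev_level = level
    return items
```

Lines indented with tabs or introduced by a Unicode dash do not match the pattern and are therefore rejected as non-bullet lines.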
3.3.3 Nested List Scope
A nested bullet list is a single list structure.
Rules:
- A blank line ends the whole list
- A non-list line ends the whole list
- Paragraph continuation inside a list item is not supported in v1.2
- Child items belong to the nearest preceding item of the immediate parent level
3.3.4 AST Recommendation
A parser should preserve nested bullet structure explicitly.
Recommended shape:
- BulletList
  - ListItem(text, children=[])
Each ListItem may contain zero or more child ListItem entries through a nested child list.
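One way to realize the recommended shape is a pair of small data classes plus a stack-based builder. This is a sketch only: it assumes the input is a list of (level, text) pairs whose levels have already been validated (first item at level 0, no level jumps), and build_bullet_list is a hypothetical helper name.

```python
from dataclasses import dataclass, field

@dataclass
class ListItem:
    text: str
    children: list["ListItem"] = field(default_factory=list)

@dataclass
class BulletList:
    items: list[ListItem] = field(default_factory=list)

def build_bullet_list(items: list[tuple[int, str]]) -> BulletList:
    """Build the nested structure from validated (level, text) pairs."""
    root = BulletList()
    stack: list[ListItem] = []  # stack[i] is the most recent item at level i
    for level, text in items:
        node = ListItem(text)
        del stack[level:]  # close items at this level and deeper
        if level == 0:
            root.items.append(node)
        else:
            # attach to the nearest preceding item of the parent level
            stack[level - 1].children.append(node)
        stack.append(node)
    return root
```

The stack keeps exactly one open item per level, which matches the rule that child items belong to the nearest preceding item of the immediate parent level.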
3.3.5 Renderer Baseline Behavior
Ordered list:
- All items are rendered as normal text
- No implicit emphasis is applied
Bullet list:
- If the list contains only level 0 items:
  - All items are rendered as normal text
- If the list contains nested items:
  - Level 0 items that have child items are rendered as visually emphasized
  - Level 0 items without child items are rendered as normal text
  - Nested child items are rendered as normal text
This behavior reflects structural hierarchy, not explicit syntax.
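The bullet list baseline can be expressed as a small function over (level, text) pairs. The role labels "emphasis" and "normal" and the name bullet_roles are assumptions made for this sketch; the input is assumed to be an already validated list.

```python
def bullet_roles(items: list[tuple[int, str]]) -> list[str]:
    """Return one baseline rendering role per bullet item."""
    roles = []
    for index, (level, _text) in enumerate(items):
        # An item has children if the next item is nested more deeply.
        has_child = index + 1 < len(items) and items[index + 1][0] > level
        roles.append("emphasis" if level == 0 and has_child else "normal")
    return roles
```

A list with only level 0 items never satisfies has_child, so every item falls back to normal text without a separate code path.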
3.4 Indentation
- Leading spaces are not semantic in general text
- Indentation must not imply structure outside lists
- No tab characters
- Exceptions:
  - Cover Block Meta/Comment lines (section 2.3)
  - Nested bullet list indentation (section 3.3.2)
3.5 Page Breaks
No explicit page breaks in TEXT. Pagination is handled by the renderer based on section level.
3.6 URLs
URLs may appear as plain text inside paragraphs or list items.
Rules:
- TEXT does not require explicit link markup
- The parser may preserve URLs as plain text
- The renderer may detect and render URLs as external links
- URL detection must not change the source text content
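A renderer could detect URLs without touching the source by returning character spans instead of rewriting the text. The sketch below is one possible heuristic, not a rule of this specification: the pattern only covers http and https, trailing sentence punctuation is excluded from the link, and URL_RE and find_urls are hypothetical names.

```python
import re

# Assumption: only http/https URLs are detected; the pattern is a heuristic.
URL_RE = re.compile(r"https?://[^\s<>\"]+")

def find_urls(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, url) spans; the input text is never modified."""
    spans = []
    for match in URL_RE.finditer(text):
        # Exclude punctuation that most likely ends the sentence, not the URL.
        url = match.group(0).rstrip(".,;:!?)")
        spans.append((match.start(), match.start() + len(url), url))
    return spans
```

Because the function only reports offsets, the parser can keep the paragraph or list item as plain text and leave linking entirely to the renderer.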
Examples:
4. Explicitly Forbidden
- No HTML tags
- No Markdown syntax
- No inline styling hints
- No anchors or IDs
- No directives
- No embedded code fences
- No tab-based indentation
- No Unicode dash variants as bullet syntax
5. What TEXT Does NOT Contain
TEXT does not define:
- Images
- Code blocks
- Tables
- Warning / Note semantics
- Includes
- Cross-references
- Print rules
- Layout behavior
Admonition content may exist in TEXT as normal paragraphs and lists. The semantic reclassification of those blocks into NOTE / WARNING / IMPORTANT belongs to SIDECAR.
6. What TEXT Does Guarantee
If rules are followed, the parser can deterministically produce:
- Document (title, subtitle, description, metadata)
- Section (level, number, title)
- Paragraph (text)
- BulletList (items)
- Nested BulletList (items with children)
- OrderedList (flat items)
7. Philosophy
- TEXT = narrative flow
- SIDECAR = semantic structure
- THEME = visual design
Admonition bodies should remain in TEXT whenever possible. SIDECAR should identify and reclassify existing TEXT blocks rather than duplicating long user-visible content.
No layer leaks into another.
Additional clarification:
- TEXT defines structure, not presentation
- Renderer may apply minimal deterministic heuristics
- SIDECAR defines explicit semantics
- THEME defines visual appearance
- Renderer interpretation must remain:
  - deterministic
  - minimal
  - non-invasive
TEXT remains syntax-free but structurally strict.