The Complete Guide to Markdown
Markdown is the lingua franca of developer documentation. This guide covers where it came from, why there are so many flavors, what happens when you convert it to HTML, and the security implications of rendering it.
Markdown's origin
John Gruber and Aaron Swartz created Markdown in 2004 with a simple thesis: plain text should be readable as-is and convert cleanly to valid HTML. The syntax wasn't invented from scratch — it was modeled on how people already formatted plain-text email. Asterisks for emphasis, dashes for lists, blank lines for paragraphs. If you'd ever written a text-only email, you already knew most of Markdown.
Gruber published a Perl script that handled the conversion, plus a syntax description that served as an informal spec. It was intentionally loose — covering the common cases and leaving edge cases undefined. That decision would cause problems later.
CommonMark: the spec that brought order
The original Markdown description was ambiguous. What happens when you nest a blockquote inside a list? When a paragraph and a code block share the same indentation level? Different parsers gave different answers. The same Markdown could produce different HTML depending on which tool processed it.
CommonMark launched in 2014 to fix this. It's a formal specification with over 600 test cases that define exact behavior for every edge case. Most modern parsers — including marked, markdown-it, and remark — follow CommonMark or a close superset of it.
GitHub Flavored Markdown (GFM) extends CommonMark with features GitHub needed:
| Column A | Column B |
| -------- | -------- |
| Tables | work |
- [x] Task lists
- [ ] with checkboxes
~~Strikethrough~~ and autolinked URLs: https://example.com
If you're writing for GitHub, you're writing GFM whether you know it or not.
Markdown flavors
Not all Markdown is the same. Knowing which flavor your platform expects saves debugging time.
| Flavor | Notable additions | Used by |
|---|---|---|
| CommonMark | Base spec, strict parsing rules | Most modern parsers |
| GFM | Tables, task lists, strikethrough, autolinks | GitHub, GitLab |
| MDX | JSX components inline: <Chart data={sales} /> | Docusaurus, Next.js, Astro |
| MultiMarkdown | Footnotes, citations, metadata | Academic writing |
| Pandoc's Markdown | Citations, figure captions, TeX math | Academic papers, eBooks |
A task-list item such as - [x] Done renders as a checkbox on GitHub but shows up as literal text in a basic CommonMark parser. Always check what your target platform supports.
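One way to run that check before publishing is to scan a document for GFM-only constructs. The sketch below is a rough, line-based heuristic written with Python's standard library. The function name find_gfm_features and the regular expressions are illustrative assumptions, not part of any parser's API, and the scan will report false positives inside code blocks.

```python
import re

# Heuristic patterns for constructs GFM adds on top of CommonMark.
# These approximate the syntax; they are not the GFM grammar.
GFM_PATTERNS = {
    "task list": re.compile(r"^\s*[-*+]\s+\[[ xX]\]\s+"),
    "table delimiter row": re.compile(r"^\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)+\|?\s*$"),
    "strikethrough": re.compile(r"~~[^~\s][^~]*~~"),
}


def find_gfm_features(markdown: str) -> dict[str, list[int]]:
    """Return {feature name: [1-based line numbers]} for GFM-only constructs."""
    hits: dict[str, list[int]] = {}
    for number, line in enumerate(markdown.splitlines(), start=1):
        for name, pattern in GFM_PATTERNS.items():
            if pattern.search(line):
                hits.setdefault(name, []).append(number)
    return hits


if __name__ == "__main__":
    sample = "- [x] Done\n- [ ] Todo\n\n~~old~~ text\n"
    print(find_gfm_features(sample))
```

An empty result suggests the document should render the same way in a plain CommonMark parser, though a heuristic like this cannot guarantee it.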
Markdown to HTML: what's actually happening
When you convert Markdown to HTML, three things happen under the hood:
- Tokenization — the parser scans the input and identifies block-level elements (headings, paragraphs, lists, code blocks) and inline elements (bold, italic, links, code spans)
- AST construction — tokens are organized into an abstract syntax tree that represents the document structure
- Serialization — the AST is walked and each node is rendered as its HTML equivalent
This tool uses marked, a fast CommonMark-compatible parser written in JavaScript. Other popular parsers include markdown-it (plugin-based, extensible), remark (part of the unified ecosystem, works with ASTs), and micromark (small, spec-compliant).
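The three stages can be sketched in a few dozen lines. The toy below is not how marked or any real parser is implemented. It assumes blocks are separated by blank lines, recognizes only #-style headings, paragraphs, and **bold**, and escapes raw HTML instead of passing it through. The Node class and the function names are made up for illustration.

```python
import html
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str  # "document", "heading", "paragraph", "text", or "strong"
    text: str = ""
    level: int = 0
    children: list["Node"] = field(default_factory=list)


def tokenize(source: str) -> list[tuple[str, int, str]]:
    """Stage 1: split the input into (kind, level, text) block tokens."""
    tokens = []
    for block in re.split(r"\n\s*\n", source.strip()):
        # Only single-line blocks can be headings in this toy.
        match = re.fullmatch(r"(#{1,6})\s+(.+)", block)
        if match:
            tokens.append(("heading", len(match.group(1)), match.group(2).strip()))
        else:
            tokens.append(("paragraph", 0, " ".join(block.split())))
    return tokens


def parse_inline(text: str) -> list[Node]:
    """Split text into plain-text nodes and **strong** nodes."""
    nodes = []
    # re.split with one capture group alternates: text, captured, text, ...
    for index, part in enumerate(re.split(r"\*\*(.+?)\*\*", text)):
        if part:
            nodes.append(Node("strong" if index % 2 else "text", text=part))
    return nodes


def build_ast(tokens: list[tuple[str, int, str]]) -> Node:
    """Stage 2: organize the tokens into a tree under a document root."""
    root = Node("document")
    for kind, level, text in tokens:
        root.children.append(Node(kind, level=level, children=parse_inline(text)))
    return root


def serialize(node: Node) -> str:
    """Stage 3: walk the tree and emit HTML for each node."""
    if node.kind == "text":
        return html.escape(node.text)
    if node.kind == "strong":
        return f"<strong>{html.escape(node.text)}</strong>"
    inner = "".join(serialize(child) for child in node.children)
    if node.kind == "heading":
        return f"<h{node.level}>{inner}</h{node.level}>\n"
    if node.kind == "paragraph":
        return f"<p>{inner}</p>\n"
    return inner  # document root


if __name__ == "__main__":
    print(serialize(build_ast(tokenize("# Hello\n\nSome **bold** text."))))
```

Keeping the tree as a separate step is what lets tools such as remark transform a document between parsing and rendering.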
Where Markdown is used
Markdown shows up everywhere in a developer's workflow:
- README files — the first thing people see in a repository
- Documentation sites — Docusaurus, VitePress, Astro, MkDocs, and Jekyll all consume Markdown
- Static site generators — blog posts, content pages, changelogs
- Note-taking apps — Obsidian, Notion, Bear, Logseq
- Comments and discussions — GitHub issues, pull requests, Reddit, Stack Overflow, Discord
- CMS content — headless CMSs like Contentful and Sanity support Markdown fields
If you write code, you write Markdown. It's unavoidable.
Markdown security
Raw Markdown can contain arbitrary HTML. That's a feature: Gruber's original spec explicitly allows it. But if you render user-submitted Markdown without sanitization, you're opening yourself up to cross-site scripting (XSS):
Click here: <img src=x onerror="alert(document.cookie)">
A naive Markdown-to-HTML pipeline passes that straight through. The browser executes the onerror handler, and you have a vulnerability.
Always sanitize rendered Markdown when the source is untrusted. Use DOMPurify on the client or sanitize-html on the server. GitHub, GitLab, and other major platforms strip <script>, <style>, event handlers, and dangerous attributes before rendering.
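To make the allowlist idea concrete, here is a minimal sketch built on Python's standard html.parser module. It illustrates the approach only and is not a substitute for a maintained sanitizer such as DOMPurify or sanitize-html. The tag list, attribute list, and URL rules are assumptions chosen for the example, and relative URLs are rejected to keep the scheme check simple.

```python
import html
from html.parser import HTMLParser

# Hypothetical allowlists for illustration; real sanitizers ship vetted defaults.
ALLOWED_TAGS = {"p", "em", "strong", "a", "img", "code", "pre", "ul", "ol", "li", "blockquote"}
ALLOWED_ATTRS = {"a": {"href"}, "img": {"src", "alt"}}
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:")
VOID_TAGS = {"img"}  # elements that never get a closing tag
DROP_CONTENT = {"script", "style"}  # drop these tags and everything inside them


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._skip_depth += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag, keep its text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops onerror, onclick, style, ...
            value = value or ""
            if name in ("href", "src"):
                # Remove whitespace and control characters before checking the scheme.
                cleaned = "".join(ch for ch in value if ch > " ").lower()
                if not cleaned.startswith(SAFE_URL_PREFIXES):
                    continue  # drops javascript: and anything else unexpected
            kept.append(f' {name}="{html.escape(value)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(html.escape(data))


def sanitize(rendered_html: str) -> str:
    """Return rendered_html with only allowlisted tags and attributes kept."""
    parser = AllowlistSanitizer()
    parser.feed(rendered_html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize('Click here: <img src=x onerror="alert(document.cookie)">'))
```

The sanitizer runs on the HTML produced by the Markdown parser, not on the Markdown source, because the rendered HTML is what the browser interprets.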
Troubleshooting
Line breaks aren't rendering — Markdown treats a single newline as a space, not a <br>. Either add two trailing spaces at the end of a line, use an explicit <br> tag, or leave a blank line between paragraphs.
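The rule is easy to see in code. This sketch renders a single paragraph under the simplification used above: a newline becomes a space unless the line ends in two or more spaces, in which case it becomes a <br>. The function name render_line_breaks is hypothetical, and inline markup is left untouched.

```python
def render_line_breaks(paragraph: str) -> str:
    """Render one paragraph: hard breaks become <br>, soft breaks become a space."""
    lines = paragraph.split("\n")
    parts = []
    for index, line in enumerate(lines):
        is_last = index == len(lines) - 1
        # Two or more trailing spaces mark a hard break (except on the last line).
        hard_break = line.endswith("  ") and not is_last
        parts.append(line.strip())
        if not is_last:
            parts.append("<br>\n" if hard_break else " ")
    return "<p>" + "".join(parts) + "</p>"


if __name__ == "__main__":
    print(render_line_breaks("roses are red\nviolets are blue"))
    print(render_line_breaks("roses are red  \nviolets are blue"))
```

Because trailing spaces are invisible and many editors strip them on save, an explicit <br> tag is often the more robust choice.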
Nested lists aren't indenting correctly — Indent nested items by 2 or 4 spaces (be consistent). Also make sure there's a blank line before the first list item — some parsers require it for proper list detection.
HTML in Markdown isn't appearing — Most parsers pass HTML through, but platforms like GitHub sanitize it. Tags like <script>, <style>, <iframe>, and event handler attributes (onclick, onerror) are stripped. If your HTML disappears, the platform is filtering it for security.
Tables aren't rendering — Tables require a header separator row (| --- | --- |) and are a GFM extension, not part of base CommonMark. If your parser only supports CommonMark, tables will render as plain text. Check that your parser or platform supports GFM.
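If the parser does support GFM and a table still fails to render, the header and separator rows are the first thing to check. GFM only recognizes a table when the separator row has the same number of cells as the header. The helper below is a simplified check under that rule. The names split_row and has_valid_table_header are illustrative, and escaped pipes (\|) are not handled.

```python
import re

# A separator cell is one or more hyphens with an optional colon on either side.
DELIMITER_CELL = re.compile(r":?-+:?")


def split_row(line: str) -> list[str]:
    """Split a pipe-delimited row into trimmed cells, ignoring the outer pipes."""
    stripped = line.strip()
    if stripped.startswith("|"):
        stripped = stripped[1:]
    if stripped.endswith("|"):
        stripped = stripped[:-1]
    return [cell.strip() for cell in stripped.split("|")]


def has_valid_table_header(header: str, delimiter: str) -> bool:
    """Check that the first two rows of a table form a valid GFM table header."""
    if "|" not in header or "|" not in delimiter:
        return False
    header_cells = split_row(header)
    delimiter_cells = split_row(delimiter)
    return len(header_cells) == len(delimiter_cells) and all(
        DELIMITER_CELL.fullmatch(cell) for cell in delimiter_cells
    )


if __name__ == "__main__":
    print(has_valid_table_header("| Column A | Column B |", "| -------- | -------- |"))
    print(has_valid_table_header("| A | B |", "| --- |"))
```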