
docs: convert reStructuredText sources to MyST markdown #1579

Open: timsaucer wants to merge 10 commits into apache:main from timsaucer:doc/phase2-rst-to-md

@timsaucer (Member) commented Jun 5, 2026

Which issue does this PR close?

There is no open issue, but this continues the work done in #1578.

Rationale for this change

Phase 2 of the documentation-site refresh started in #1578. With the modern pydata-sphinx-theme and navigation in place, this PR moves the content format from .rst to MyST .md. The motivation:

  • Markdown is the lingua franca of agent-tuned tooling. LLMs trained on GitHub and modern docs parse Markdown reliably; reStructuredText is a minority dialect that frequently confuses both humans editing via PR review and agents reading the source. The Apache datafusion-comet sibling project completed the same migration recently and reported smoother contributor onboarding.

What changes are included in this PR?

  • Format conversion (mechanical, via rst-to-myst).
  • Manual fixes layered on top of the converter output for cross-references.
  • AGENTS.md is updated so the two .rst paths called out under "Aggregate and Window Function Documentation" point at the new .md equivalents.

Are there any user-facing changes?

No behavioral changes to the datafusion package, only to the source format of the published documentation. Readers of the rendered site will not notice the migration; the HTML output is slightly updated but still shows all of the relevant content, including the executed code examples.

Follow-ups (out of scope for this PR)

  • Phase 3: multi-version doc publishing (the comet pattern).
  • Phase 4: asf-site publishing workflow.

timsaucer force-pushed the doc/phase2-rst-to-md branch from a400ec1 to 67c2761 on June 7, 2026 13:20
timsaucer marked this pull request as draft on June 7, 2026 13:29
timsaucer and others added 2 commits June 7, 2026 15:31
Phase 2 of the documentation-site refresh. Run `rst2myst convert` over
every human-authored .rst file under docs/source/ and remove the
originals. The result:

- 33 .rst files become 33 .md files (user guide, contributor guide,
  index, links).
- Headings, paragraphs, hyperlinks, code blocks, admonitions, and
  toctree directives all map cleanly to MyST syntax.
- Cross-reference anchors round-trip through MyST as `(label)=`
  blocks. The converter kebab-cased the labels (e.g. `(io-csv)=`),
  but every `{ref}` target in the corpus still uses the underscore
  form from the original RST (`` {ref}`CSV <io_csv>` ``) and so do the
  Python docstrings that AutoAPI pulls in. Rewrite the anchors back
  to the underscore form so the existing references resolve.
- 86 `{eval-rst}` blocks remain — they all wrap `.. ipython::`
  directives, which have no first-class MyST equivalent. They render
  identically and don't block the build.
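A minimal sketch of the anchor rewrite described above, assuming the converter's kebab-casing is the only reason a target label contains a hyphen (a label that was already hyphenated in the original RST would be rewritten too). The helper names `underscore_targets` and `rewrite_tree` are hypothetical, not part of rst-to-myst:

```python
import re
from pathlib import Path

# A MyST target sits alone on its line, e.g. "(io-csv)=".
TARGET_RE = re.compile(r"^\((?P<label>[A-Za-z0-9_-]+)\)=[ \t]*$", re.MULTILINE)


def underscore_targets(text: str) -> str:
    """Rewrite kebab-case MyST target labels to underscores (hypothetical helper)."""
    return TARGET_RE.sub(
        lambda m: "(" + m.group("label").replace("-", "_") + ")=", text
    )


def rewrite_tree(root: str) -> int:
    """Apply underscore_targets to every .md file under root; return files changed."""
    changed = 0
    for path in Path(root).rglob("*.md"):
        old = path.read_text(encoding="utf-8")
        new = underscore_targets(old)
        if new != old:
            path.write_text(new, encoding="utf-8")
            changed += 1
    return changed
```

Only whole target lines are touched, so prose that happens to mention `io-csv` is left alone.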

conf.py changes:

- Enable `colon_fence` and `deflist` MyST extensions (rst-to-myst
  emits these on a few files, particularly execution-metrics.md).
- Keep `.rst` in `source_suffix` even though no human-authored RST
  remains: sphinx-autoapi generates RST under autoapi/ at build time
  and Sphinx needs the suffix registered to parse it.
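In conf.py terms, those two changes might look like the following excerpt. It is a sketch: the `extensions` entries (`myst_parser` for MyST, `autoapi.extension` for sphinx-autoapi) are assumptions, and the real file carries many more settings.

```python
# docs/source/conf.py (excerpt; sketch only)
extensions = [
    "myst_parser",        # parses the converted .md sources as MyST
    "autoapi.extension",  # sphinx-autoapi, which generates RST under autoapi/
]

# rst-to-myst emits ::: fences and definition lists on a few pages.
myst_enable_extensions = ["colon_fence", "deflist"]

# .rst stays registered: no hand-written RST remains, but the
# autoapi-generated pages are RST and Sphinx must be able to parse them.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}
```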

AGENTS.md: update the two .rst paths called out under "Aggregate and
Window Function Documentation" to point at the .md equivalents.

Verified by building locally — `build succeeded`, no warnings, all
internal cross-references resolve, the ipython examples on the
landing page and basics page still execute.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RST-to-MD conversion emitted MyST `%` comment syntax with a blank line
between header lines, which renders as visible text. Replace it with a
canonical `<!--- ... -->` HTML comment block matching upstream
apache/datafusion and this repo's existing markdown files.
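A sketch of that cleanup, assuming the license header is the leading run of `%` lines at the top of each file. The helper is hypothetical, and because it drops every blank line inside the header, it would also erase any intentional paragraph break there:

```python
def percent_header_to_html(text: str) -> str:
    """Fold a leading run of MyST `%` comment lines into one HTML comment block."""
    lines = text.splitlines()
    header: list[str] = []
    i = 0
    # Consume the header: `%` lines, possibly separated by blank lines.
    while i < len(lines) and (lines[i].startswith("%") or not lines[i].strip()):
        if lines[i].startswith("%"):
            body = lines[i][1:]
            # Drop the `%` marker and at most one following space.
            header.append(body[1:] if body.startswith(" ") else body)
        i += 1
    if not header:
        return text  # no leading `%` comments; leave the file alone
    block = ["<!---", *header, "-->", ""]
    trailing = "\n" if text.endswith("\n") else ""
    return "\n".join(block + lines[i:]) + trailing
```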

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
timsaucer force-pushed the doc/phase2-rst-to-md branch from 026b9e5 to 30efd76 on June 7, 2026 13:37
timsaucer and others added 7 commits June 12, 2026 21:05
The RST -> MyST conversion left two intra-page links as undefined
reference-style links, which CommonMark renders as literal bracketed
text (no Sphinx warning, so the --fail-on-warning build still passed).
Point both at the auto-generated heading anchors instead.
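Because CommonMark renders an unmatched reference as literal text without any error, a small pre-build check can catch these. A minimal sketch, assuming only full (`[text][label]`) and collapsed (`[text][]`) references matter; shortcut references (`[text]`) and brackets inside code spans are not handled, and label matching is approximated by collapsing whitespace and lowercasing. The function name is hypothetical:

```python
import re

# Full "[text][label]" or collapsed "[text][]" reference links.
REF_USE = re.compile(r"\[([^\]]+)\]\[([^\]]*)\]")
# Link reference definitions: "[label]: destination", indented at most 3 spaces.
REF_DEF = re.compile(r"^ {0,3}\[([^\]]+)\]:", re.MULTILINE)


def _norm(label: str) -> str:
    # Approximate CommonMark label matching: collapse whitespace, ignore case.
    return " ".join(label.split()).lower()


def undefined_reference_links(text: str) -> list[str]:
    """Return labels used as reference links but never defined (hypothetical check)."""
    defined = {_norm(m.group(1)) for m in REF_DEF.finditer(text)}
    missing = []
    for m in REF_USE.finditer(text):
        # A collapsed reference "[text][]" uses its own text as the label.
        label = _norm(m.group(2) or m.group(1))
        if label not in defined:
            missing.append(label)
    return missing
```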

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Removes the last RST-syntax islands from the converted MyST markdown so
the docs are markdown-native for both human and LLM authors.

Executable examples (A): replace IPython.sphinxext.ipython_directive with
myst-nb. The 83 `{eval-rst}` + `.. ipython:: python` blocks become native
`{code-cell} ipython3` blocks, and the 14 pages that carry them gain
jupytext/kernelspec front matter so myst-nb runs them. conf.py routes .md
through myst-nb with nb_execution_mode="force" and
nb_execution_raise_on_error=True, so a failing example now fails the build.
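A sketch of the corresponding conf.py settings. `nb_execution_mode` and `nb_execution_raise_on_error` carry the values named above; spelling the `.md` routing as a `source_suffix` entry for the `myst-nb` parser is an assumption, and `myst_nb` appears instead of `myst_parser` because myst-nb sets up the MyST parser itself.

```python
# docs/source/conf.py (excerpt; sketch only)
extensions = [
    "myst_nb",            # replaces IPython.sphinxext.ipython_directive
    "autoapi.extension",
]

source_suffix = {
    ".rst": "restructuredtext",  # still needed for autoapi output
    ".md": "myst-nb",            # pages with jupytext front matter execute as notebooks
}

# Re-execute every notebook page on each build instead of reusing cached outputs.
nb_execution_mode = "force"
# Let an exception in any code cell fail the build rather than render in the page.
nb_execution_raise_on_error = True
```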

myst-nb gives each page its own kernel instead of the IPython directive's
single namespace shared across all documents in build order. That isolation
surfaced expressions.md, which only ever worked by inheriting `col`/`lit`
from an earlier-built page — it now imports them itself. It also changes the
execution working directory to each page's own folder, so build.sh symlinks
the example data next to every page that reads it by relative name and
registers the python3 kernel; CI now calls build.sh so it matches local.
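build.sh does the linking in shell. The same idea, sketched in Python so the re-runnable behavior is explicit: each data file gets a symlink inside each page directory that reads it. The helper name and any file or directory names are hypothetical.

```python
from pathlib import Path


def link_example_data(data_files, page_dirs):
    """Symlink each data file into each page directory (hypothetical helper).

    Existing files or links are left untouched so the step can be re-run.
    Returns the list of links that were created.
    """
    created = []
    for page_dir in map(Path, page_dirs):
        for src in map(Path, data_files):
            dest = page_dir / src.name
            if dest.exists() or dest.is_symlink():
                continue
            dest.symlink_to(src.resolve())
            created.append(dest)
    return created
```

With the link in place, a cell that opens the file by its bare name resolves it relative to the page's own folder, which is now the kernel's working directory.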

Tables (B): the 3 `.. list-table::` directives become GFM markdown tables.
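A GFM table is a header row, a delimiter row, and one line per body row. A minimal sketch of rendering list-table-style data (a header plus rows of cells) in that form; the helper is hypothetical and escapes only the `|` character. GFM cells must fit on one line, so any list-table cell holding several paragraphs would need reflowing by hand.

```python
def to_gfm_table(header, rows):
    """Render a header and rows as a GitHub-flavored Markdown table (hypothetical helper)."""

    def cell(value):
        # A literal pipe would otherwise be read as a column separator.
        return str(value).replace("|", "\\|")

    def line(values):
        return "| " + " | ".join(cell(v) for v in values) + " |"

    width = len(header)
    out = [line(header), "| " + " | ".join(["---"] * width) + " |"]
    for row in rows:
        if len(row) != width:
            raise ValueError(f"expected {width} cells, got {len(row)}")
        out.append(line(row))
    return "\n".join(out)
```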

Cross-references (C): the two intra-page links in distributing-work.md that
the conversion left as undefined markdown references (and that built green
while rendering literal brackets) become `{ref}` roles backed by explicit
`(label)=` targets, so a future break fails the build instead of shipping
silently.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
myst-nb prefers a cell's `_repr_html_` over its text repr. A datafusion
DataFrame's HTML repr is a Jupyter-oriented widget — inline styles plus an
injected <script> — that renders at the wrong width in the docs theme.

Set nb_mime_priority_overrides so the html builder prefers text/plain. The
35 cells that end in a bare DataFrame now show the same readable ASCII
table the old IPython directive produced, with no per-cell `.show()` edits
and no dependence on the package-generated HTML staying theme-compatible.
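A sketch of that override, assuming myst-nb's list of `(builder, mime type, priority)` tuples in which a lower number takes precedence:

```python
# docs/source/conf.py (excerpt; sketch only)
# For the html builder, rank text/plain ahead of text/html, so a cell that ends
# in a bare DataFrame renders its plain-text table instead of its HTML widget.
nb_mime_priority_overrides = [
    ("html", "text/plain", 0),
]
```

Scoping the override to the `html` builder leaves other builders on myst-nb's default priorities.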

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
apache/datafusion#21411 is resolved — `.alias()` now works directly on a
`grouping()` expression. Removed the note describing the limitation and the
with_column_renamed workaround in the rollup and grouping_sets examples,
aliasing the grouping columns inline instead. Verified on the current
branch: the aliased aggregates execute and produce the named columns.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The header logo was the same SVG in both color modes; the light-mode
wordmark was hard to read on the dark theme. Point the theme's image_dark
at a new original_dark.svg whose wordmark uses light strokes.
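A sketch of the pydata-sphinx-theme logo setting this commit touches. Only the file name `original_dark.svg` comes from the commit; the `_static/images/` directory and the light-mode file name are assumptions.

```python
# docs/source/conf.py (excerpt; sketch only)
html_theme = "pydata_sphinx_theme"
html_theme_options = {
    "logo": {
        # Hypothetical path for the existing logo, used in light mode.
        "image_light": "_static/images/original.svg",
        # Wordmark with light strokes, shown when the dark theme is active.
        "image_dark": "_static/images/original_dark.svg",
    },
}
```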

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The theme refresh emptied secondary_sidebar_items, dropping the
on-this-page table of contents that the previous site showed. Bring it
back on the right, wrapped in a native <details> so readers can fold it
away on the longer guide pages. Adds a custom page-toc-collapsible
secondary-sidebar template and styles the <summary> toggle (no JS).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up to restoring the on-this-page TOC: "collapsible" should hide the
entire right-hand frame, not just fold the list. Replace the <details>
wrapper with a floating toggle button (toc-toggle.js) that hides the whole
secondary sidebar via a body class; the flex article container then
reclaims the width (its 60em cap is lifted while hidden). The preference is
remembered across pages in localStorage, and the button is suppressed below
the theme's breakpoint where the sidebar is already collapsed.
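A sketch of how these pieces could be wired up in conf.py. `secondary_sidebar_items` is a pydata-sphinx-theme option, and `templates_path`, `html_static_path` and `html_js_files` are standard Sphinx options; the directory names, and whether the `page-toc-collapsible` template from the previous commit is still the listed item, are assumptions.

```python
# docs/source/conf.py (excerpt; sketch only)
templates_path = ["_templates"]    # assumed home of page-toc-collapsible.html
html_static_path = ["_static"]     # assumed home of toc-toggle.js
html_js_files = ["toc-toggle.js"]  # loaded on every page; path is relative to _static

html_theme_options = {
    # Restore the right-hand "on this page" TOC; merge with any other theme keys.
    "secondary_sidebar_items": ["page-toc-collapsible"],
}
```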

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
timsaucer marked this pull request as ready for review on June 13, 2026 11:16
Adding the myst-nb docs stack pulled a newer typing-extensions only on
Python < 3.11, splitting it into two locked versions. Our own
`typing-extensions; python_full_version < '3.13'` dependency then spanned
that split, which uv recorded as a multi-version edge without a `version`
field — a form older uv builds (the one in CI's pinned setup-uv) reject
with "missing source field but has more than one matching package".

Add a [tool.uv] constraint-dependencies pin of typing-extensions>=4.15.0
so it resolves to a single version across all supported Pythons, removing
the fork and the under-specified edge. Relocked; uv lock --locked is clean
and no multi-version package has a marker-only edge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
