# Research Paper → LibreOffice Impress Explainer
## Purpose
Turn a research-paper PDF into a **detailed, image-rich `.odp` presentation** that
explains the work clearly enough for a learner to understand everything — the
problem, the core idea, the method, the math intuition, the results, and why it
matters. Output is a **native LibreOffice Impress file (`.odp`)** — never `.pptx`,
never PowerPoint.
## When to use
The user gives you a paper (a `.pdf`, or an arXiv link or other URL they want explained) and
wants to *learn it* via slides. Phrases like "make slides from this paper",
"explain this paper", "turn this into a presentation", "deck I can study from".
## What "good" looks like
- **Teaches, doesn't summarize.** Every claim from the paper is unpacked into
plain language with intuition first, formalism second. Assume the learner is
smart but new to the subfield.
- **Image-rich.** Most content slides carry a visual. Two image sources:
  1. **The paper's own figures**, extracted automatically (highest fidelity;
     use these for the real architecture diagrams, result plots, tables).
  2. **Generated explainer visuals** from `gpt-image-2`: schematics, analogies,
     step-by-step diagrams, and intuition pictures that the paper *doesn't* contain.
- **Detailed.** A typical paper becomes **18 to 32 slides**, not 8. Break the method
into multiple slides. One idea per slide.
- **Coherent look.** Generated images share a style so the deck feels designed.
---
## Pipeline (run in order)
All scripts live in `scripts/`. Work inside a scratch dir, e.g. `workdir/`.
### 1 — Extract the paper
```bash
python3 scripts/extract_paper.py PAPER.pdf --out workdir
```
Produces `workdir/paper_text.md` (page-delimited text with detected section
headers), `workdir/figures/*.png` (the paper's real figures), `figures.json`
(manifest with page + dimensions), and `meta.json` (title, page/word counts).
### 2 — Read and understand
Read `paper_text.md` end to end. Build a mental model: What problem? What was
broken before? What's the key insight? How does the method work mechanically?
What do the experiments show? What are the limits? **Do not start slides until
you can explain the paper to a beginner without looking.**
Inspect the extracted figures (`workdir/figures/`). Decide which are worth putting
on slides directly (architecture diagrams, key result plots, tables).
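
Before you open each image, it can help to shortlist candidates by size. Below is a minimal sketch using only the standard library. It reads width and height straight from each PNG's IHDR header, so it does not depend on the exact keys inside `figures.json`. The `shortlist_figures` name and the 400-pixel threshold are assumptions for illustration. Small crops are often logos or equation fragments, but size alone never decides whether a figure belongs on a slide.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_size(path: Path) -> tuple[int, int]:
    """Return (width, height) read from a PNG's IHDR chunk."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG")
    # Bytes 16..24 hold width and height as big-endian unsigned 32-bit ints.
    width, height = struct.unpack(">II", header[16:24])
    return width, height

def shortlist_figures(fig_dir: str, min_side: int = 400) -> list[tuple[str, int, int]]:
    """List (name, width, height) for figures whose shorter side is >= min_side,
    largest area first. min_side is a hypothetical threshold, not from the scripts."""
    rows = []
    for path in sorted(Path(fig_dir).glob("*.png")):
        w, h = png_size(path)
        if min(w, h) >= min_side:
            rows.append((path.name, w, h))
    return sorted(rows, key=lambda r: r[1] * r[2], reverse=True)

if __name__ == "__main__":
    for name, w, h in shortlist_figures("workdir/figures"):
        print(f"{name}: {w}x{h}")
```

The list still needs a human look. A large figure can be a decorative photo, and a small one can be the key result table.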
### 3 — Plan the deck + write image prompts
Draft two files:
- `workdir/prompts.json` — visuals to generate (see schema below). Write a prompt
for each concept that benefits from a picture the paper lacks: the core
analogy, a simplified mechanism diagram, before/after comparisons, a "how data
flows" schematic, an intuition pump for the math. Aim for **roughly one
generated image per one or two content slides**, on top of reused paper figures.
- `workdir/deck.json` — the full deck spec (schema below). Reference generated
images as `<id>.png` and reused paper figures by their path
(`figures/fig03.png`).
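
`deck.json` names generated images as `<id>.png` and reused figures as `figures/…`. Because of that, a typo only shows up later as a placeholder slide. Below is a minimal sketch that cross-checks the two plan files. It assumes the schemas described below: a top-level `slides` list whose entries may carry an `image` key, and a `prompts.json` list of objects with an `id`. It also assumes both files are plain JSON, without the `//` comments used in the schema listings. `check_plan` is a hypothetical helper, not one of the scripts in `scripts/`.

```python
import json
from pathlib import Path

def check_plan(workdir: str) -> list[str]:
    """Cross-check deck.json against prompts.json and the extracted figures."""
    root = Path(workdir)
    # Assumes plain JSON files (no // comments as in the schema listings).
    deck = json.loads((root / "deck.json").read_text(encoding="utf-8"))
    prompts = json.loads((root / "prompts.json").read_text(encoding="utf-8"))

    prompt_ids = [p["id"] for p in prompts]
    problems = [
        f"duplicate prompt id: {pid}"
        for pid in sorted({pid for pid in prompt_ids if prompt_ids.count(pid) > 1})
    ]

    used = set()
    for n, slide in enumerate(deck.get("slides", []), start=1):
        image = slide.get("image")
        if not image:
            continue
        if image.startswith("figures/"):
            # Reused paper figure: it must exist in the extraction output.
            if not (root / image).is_file():
                problems.append(f"slide {n}: missing paper figure {image}")
        else:
            # Generated visual: <id>.png must match a prompt id.
            stem = Path(image).stem
            used.add(stem)
            if stem not in prompt_ids:
                problems.append(f"slide {n}: no prompt with id '{stem}' for {image}")

    # Prompts that no slide references only waste a generation call.
    problems += [
        f"prompt '{pid}' is never used in deck.json"
        for pid in dict.fromkeys(prompt_ids)
        if pid not in used
    ]
    return problems
```

Once both files exist, run `print("\n".join(check_plan("workdir")))`. An empty output means every image reference has a source.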
### 4 — Generate images
```bash
export OPENAI_API_KEY=... # Codex/Hermes usually has this in env
python3 scripts/generate_images.py workdir/prompts.json --assets workdir/assets
```
Writes one PNG per prompt as `workdir/assets/<id>.png`. Also copy any reused
paper figures into the assets dir so everything resolves from one place:
```bash
mkdir -p workdir/assets/figures && cp workdir/figures/*.png workdir/assets/figures/ # keeps figures/<name>.png refs resolving under --assets
```
> If the Hermes runtime has **native gpt-image-2** generation (Codex does), you
> may generate images directly and just save them as `workdir/assets/<id>.png`.
> The script is the portable fallback.
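
A missing image does not stop the build; it becomes a labelled placeholder. That makes it worth listing gaps before step 5. Below is a minimal sketch, assuming every `image` value resolves as `<assets>/<image>` as described in the deck-spec notes. `missing_assets` and the script name `check_assets.py` are hypothetical and not part of `scripts/`.

```python
import json
import sys
from pathlib import Path

def missing_assets(deck_path: str, assets_dir: str) -> list[str]:
    """List image paths referenced in the deck that don't exist under assets_dir."""
    deck = json.loads(Path(deck_path).read_text(encoding="utf-8"))
    assets = Path(assets_dir)
    missing = []
    for slide in deck.get("slides", []):
        image = slide.get("image")
        # Resolve the way the notes describe for --assets: <assets>/<image>.
        if image and not (assets / image).is_file() and image not in missing:
            missing.append(image)
    return missing

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 check_assets.py workdir/deck.json workdir/assets
    gaps = missing_assets(sys.argv[1], sys.argv[2])
    for image in gaps:
        print(f"missing: {image}")
    sys.exit(1 if gaps else 0)
```

A non-zero exit code makes it easy to chain the check in front of `build_odp.py` with `&&`.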
### 5 — Build the .odp
```bash
python3 scripts/build_odp.py workdir/deck.json workdir/output.odp --assets workdir/assets
```
That's the deliverable. It opens directly in LibreOffice Impress on Ubuntu.
(Optional sanity check / PDF preview:
`libreoffice --headless --convert-to pdf --outdir workdir workdir/output.odp`;
without `--outdir`, the PDF is written to the current directory.)

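
An `.odp` file is an OpenDocument package: a ZIP archive with a `mimetype` entry, plus a `content.xml` that holds one `draw:page` element per slide. A quick standard-library check can therefore confirm the slide count without opening Impress. Below is a minimal sketch. It assumes `build_odp.py` emits exactly one slide per entry in `slides`. `count_slides` and the script name `check_odp.py` are hypothetical.

```python
import json
import sys
import xml.etree.ElementTree as ET
import zipfile

# Namespace URI and MIME type defined by the OpenDocument specification.
DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
ODP_MIMETYPE = "application/vnd.oasis.opendocument.presentation"

def count_slides(odp_path: str) -> int:
    """Return the number of slides (draw:page elements) in an .odp package."""
    with zipfile.ZipFile(odp_path) as zf:
        mimetype = zf.read("mimetype").decode("ascii").strip()
        if mimetype != ODP_MIMETYPE:
            raise ValueError(f"not an Impress document: {mimetype}")
        root = ET.fromstring(zf.read("content.xml"))
    return sum(1 for _ in root.iter(f"{{{DRAW_NS}}}page"))

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 check_odp.py workdir/output.odp workdir/deck.json
    with open(sys.argv[2], encoding="utf-8") as f:
        expected = len(json.load(f)["slides"])
    actual = count_slides(sys.argv[1])
    print(f"{actual} slides in .odp, {expected} in deck.json")
    sys.exit(0 if actual == expected else 1)
```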
---
## Deck spec schema (`deck.json`)
```jsonc
{
"theme": "midnight", // midnight | paper | forest
"slides": [ /* slide objects, in order */ ]
}
```
Slide objects by `type`:
```jsonc
// Opening slide
{"type":"title","title":"...","subtitle":"...","eyebrow":"RESEARCH WALKTHROUGH",
"meta":"Authors, year • one line", "notes":"speaker notes (optional)"}
// Section divider between major parts
{"type":"section","number":"02","title":"The Method","subtitle":"optional"}
// Workhorse slide: bullets, with an optional image on the right
{"type":"content","kicker":"motivation","title":"...",
"bullets":["point","another point",{"text":"sub-point","level":1}],
"image":"diagram_attention.png", // omit for full-width text
"caption":"Fig 2 — ...", "notes":"..."}
// Full-bleed image with a caption — use for big architecture diagrams / plots
{"type":"bigimage","kicker":"architecture","title":"...",
"image":"figures/fig03.png","caption":"...","notes":"..."}
// Side-by-side comparison (before/after, baseline/proposed, RNN/Transformer)
{"type":"comparison","title":"...",
"left":{"heading":"Baseline","bullets":["...","..."]},
"right":{"heading":"Proposed","bullets":["...","..."]}}
// Pull-quote / key takeaway
{"type":"quote","text":"...","attribution":"paper abstract (paraphrased)"}
```
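
Below is a minimal sketch of a pre-flight check for `deck.json`. The required keys for each `type` are inferred from the examples above and are an assumption; `build_odp.py` may accept more or fewer. `validate_deck` is a hypothetical helper.

```python
import json
import sys

THEMES = {"midnight", "paper", "forest"}
# Keys each slide type appears to need, inferred from the schema examples (assumption).
REQUIRED = {
    "title": {"title"},
    "section": {"title"},
    "content": {"title", "bullets"},
    "bigimage": {"title", "image"},
    "comparison": {"title", "left", "right"},
    "quote": {"text"},
}

def validate_deck(deck: dict) -> list[str]:
    """Return schema problems; an empty list means the deck looks well-formed."""
    problems = []
    # midnight is the documented default when no theme is given.
    if deck.get("theme", "midnight") not in THEMES:
        problems.append(f"unknown theme: {deck['theme']}")
    for n, slide in enumerate(deck.get("slides", []), start=1):
        kind = slide.get("type")
        if kind not in REQUIRED:
            problems.append(f"slide {n}: unknown type {kind!r}")
            continue
        missing = REQUIRED[kind] - set(slide)
        if missing:
            problems.append(f"slide {n} ({kind}): missing {sorted(missing)}")
        if kind == "comparison":
            for side in ("left", "right"):
                if side in slide and "heading" not in slide[side]:
                    problems.append(f"slide {n}: {side} has no heading")
    return problems

if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8") as f:
        issues = validate_deck(json.load(f))
    print("\n".join(issues) or "deck.json looks OK")
    sys.exit(1 if issues else 0)
```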
Notes:
- `bullets` items are strings, or `{"text": "...", "level": 1}` for one indent.
- `image` is resolved against `--assets`. A missing image renders a labelled
placeholder (the deck still builds), so a failed generation never blocks you.
- Put the deeper explanation a learner can read later into `notes` (speaker
notes) — keep on-slide bullets tight.
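
Bullet items come in two shapes, so any builder or linter has to normalize them first. Below is a minimal sketch that flattens both forms into `(text, level)` pairs and flags slides that may be too crowded. The six-bullet limit and both function names are assumptions, not rules enforced by `build_odp.py`.

```python
def normalize_bullets(bullets: list) -> list[tuple[str, int]]:
    """Turn 'text' or {'text': ..., 'level': n} items into (text, level) pairs."""
    out = []
    for item in bullets:
        if isinstance(item, str):
            out.append((item, 0))
        elif isinstance(item, dict) and "text" in item:
            out.append((item["text"], int(item.get("level", 0))))
        else:
            raise ValueError(f"unsupported bullet item: {item!r}")
    return out

def crowded_slides(deck: dict, max_bullets: int = 6) -> list[int]:
    """Return 1-based indices of slides whose top-level bullets exceed max_bullets.
    max_bullets is a hypothetical limit; comparison columns are not counted."""
    return [
        n
        for n, slide in enumerate(deck.get("slides", []), start=1)
        if len(normalize_bullets(slide.get("bullets", []))) > max_bullets
    ]
```

Flagged slides are candidates for splitting in two, with the overflow moved into `notes`.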
## Image prompts schema (`prompts.json`)
```jsonc
[
{"id":"attention_schematic",
"prompt":"Technical diagram: scaled dot-product attention. Show Q, K, V as labelled boxes, a matrix multiply, a softmax, and a weighted sum. Clear arrows and labels.",
"shape":"landscape"}, // landscape | portrait | square
{"id":"rnn_bottleneck","prompt":"...","shape":"portrait",
"transparent": false}
]
```
- `id` becomes the filename (`<id>.png`) → reference it in `deck.json`.
- gpt-image-2 renders **text inside images** well, so labelled diagrams,
flowcharts and infographics are fair game — lean into them.
- A shared style suffix is auto-appended for visual coherence; override per-run
with `--style-suffix`.
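
Plain JSON does not allow adjacent string literals or trailing commas. Generating `prompts.json` from a short script sidesteps both problems and lets long prompts be split across lines. Below is a minimal sketch. The `make_prompt` helper and its id rule (letters, digits and underscores only) are conservative assumptions, and `make_prompts.py` is a hypothetical script name. The shared style suffix is still left to `generate_images.py`.

```python
import json
import sys
from pathlib import Path

SHAPES = {"landscape", "portrait", "square"}

def make_prompt(id_: str, prompt: str, shape: str = "landscape") -> dict:
    """Build one prompts.json entry with a filename-safe id and a known shape."""
    if not id_.replace("_", "").isalnum():
        raise ValueError(f"id should be letters, digits and underscores: {id_!r}")
    if shape not in SHAPES:
        raise ValueError(f"shape must be one of {sorted(SHAPES)}")
    return {"id": id_, "prompt": prompt, "shape": shape}

prompts = [
    make_prompt(
        "attention_schematic",
        # Adjacent literals inside Python parentheses are joined at compile time.
        ("Technical diagram: scaled dot-product attention. Show Q, K, V as "
         "labelled boxes, a matrix multiply, a softmax, and a weighted sum. "
         "Clear arrows and labels."),
    ),
]

if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 make_prompts.py workdir/prompts.json
    Path(sys.argv[1]).write_text(json.dumps(prompts, indent=2), encoding="utf-8")
```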
---
## Pedagogy checklist (the part that makes it a *learning* deck)
- Open with the **problem and stakes** before any method.
- For every mechanism: **intuition / analogy first**, then the precise version.
- Turn each equation into a sentence ("this just measures how similar two
vectors are, normalized so big dimensions don't blow up the scale").
- Use `comparison` slides for "old way vs new way".
- Reuse the paper's real result figures; explain *what to look at* in the caption.
- End with: what's genuinely new, what it enables, and stated limitations.
- Prefer more slides over crowded ones. One idea per slide.
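
The sample sentence for turning an equation into words reads like a description of scaled dot-product attention, `softmax(QK^T / sqrt(d_k)) V`. A slide can back that sentence with a quick simulation. When vectors have independent unit-variance entries, a raw dot product has a standard deviation of about `sqrt(d)`, and dividing by `sqrt(d)` brings it back to about 1. Below is a minimal standard-library sketch; the dimensions, sample count and seed are arbitrary choices.

```python
import random
import statistics

def dot_product_spread(d: int, samples: int = 1000, seed: int = 0) -> tuple[float, float]:
    """Std-dev of q.k for random Gaussian vectors, raw and scaled by 1/sqrt(d)."""
    rng = random.Random(seed)
    raw = []
    for _ in range(samples):
        q = [rng.gauss(0.0, 1.0) for _ in range(d)]
        k = [rng.gauss(0.0, 1.0) for _ in range(d)]
        raw.append(sum(a * b for a, b in zip(q, k)))
    raw_std = statistics.pstdev(raw)
    # Scaling every score by the same constant scales the std-dev by that constant.
    return raw_std, raw_std / d ** 0.5

if __name__ == "__main__":
    for d in (4, 64, 512):
        raw_std, scaled_std = dot_product_spread(d)
        print(f"d={d:4d}  raw std ~ {raw_std:6.2f}  scaled std ~ {scaled_std:4.2f}")
```

The raw spread grows with `d`, while the scaled spread stays near 1. That is the "big dimensions don't blow up the scale" point, in numbers a learner can check.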
## Theme choice
`midnight` (dark, technical — default), `paper` (warm light, academic),
`forest` (dark green). Pick one that fits the subject; keep it consistent.