# Research Paper → LibreOffice Impress Explainer
## Purpose
Turn a research-paper PDF into a **detailed, image-rich `.odp` presentation** that
explains the work clearly enough for a learner to understand everything: the
problem, the core idea, the method, the math intuition, the results, and why it
matters. Output is a **native LibreOffice Impress file (`.odp`)**, never `.pptx`,
never PowerPoint.

## When to use
The user gives you a paper (a `.pdf`, or an arXiv/URL they want explained) and
wants to *learn it* via slides. Typical requests include "make slides from this
paper", "explain this paper", "turn this into a presentation", or "a deck I can
study from".

## What "good" looks like
- **Teaches, doesn't summarize.** Every claim from the paper is unpacked into
  plain language with intuition first, formalism second. Assume the learner is
  smart but new to the subfield.
- **Image-rich.** Most content slides carry a visual. Two image sources:
  1. **The paper's own figures**, extracted automatically (highest fidelity;
     use these for the real architecture diagrams, result plots, tables).
  2. **Generated explainer visuals** from `gpt-image-2`: schematics, analogies,
     step-by-step diagrams, intuition pictures that the paper *doesn't* contain.
- **Detailed.** A typical paper becomes **18–32 slides**, not 8. Break the method
  into multiple slides. One idea per slide.
- **Coherent look.** Generated images share a style so the deck feels designed.

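
The size and image-coverage targets above can be checked mechanically once a `deck.json` exists. The following is a minimal sketch, not part of the shipped `scripts/`: the function name `deck_stats` and the choice of which slide types count as "content" are assumptions made for illustration.

```python
import json
from pathlib import Path

# Assumption: these types carry teaching content; title/section/quote are structural.
CONTENT_TYPES = {"content", "bigimage", "comparison"}


def deck_stats(deck: dict) -> dict:
    """Count slides and measure how many content slides carry an image."""
    slides = deck.get("slides", [])
    content = [s for s in slides if s.get("type") in CONTENT_TYPES]
    with_image = [s for s in content if s.get("image")]
    coverage = len(with_image) / len(content) if content else 0.0
    return {
        "slides": len(slides),
        "content_slides": len(content),
        "image_coverage": coverage,
        # 18-32 slides is the target range stated in this document.
        "in_target_range": 18 <= len(slides) <= 32,
    }


if __name__ == "__main__":
    path = Path("workdir/deck.json")
    if path.exists():
        print(deck_stats(json.loads(path.read_text(encoding="utf-8"))))
```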
---
## Pipeline (run in order)
All scripts live in `scripts/`. Work inside a scratch dir, e.g. `workdir/`.

### 1 — Extract the paper
```bash
python3 scripts/extract_paper.py PAPER.pdf --out workdir
```
Produces `workdir/paper_text.md` (page-delimited text with detected section
headers), `workdir/figures/*.png` (the paper's real figures), `figures.json`
(manifest with page + dimensions), and `meta.json` (title, page/word counts).

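
The manifest can help shortlist figures before opening them one by one. This is a rough sketch under stated assumptions: it assumes `figures.json` sits in `workdir/`, is a JSON list of objects, and uses the keys `file`, `page`, `width`, and `height`. None of those names are confirmed by this document, and the size thresholds are arbitrary.

```python
import json
from pathlib import Path


def shortlist_figures(manifest: list[dict], min_width: int = 600,
                      min_height: int = 300) -> list[dict]:
    """Keep figures large enough to read on a slide, ordered by page.

    The keys "width", "height" and "page" are assumed names, not a documented schema.
    """
    keep = [
        fig for fig in manifest
        if fig.get("width", 0) >= min_width and fig.get("height", 0) >= min_height
    ]
    return sorted(keep, key=lambda fig: fig.get("page", 0))


if __name__ == "__main__":
    path = Path("workdir/figures.json")  # assumed location
    if path.exists():
        manifest = json.loads(path.read_text(encoding="utf-8"))
        for fig in shortlist_figures(manifest):
            print(fig.get("page"), fig.get("file"), fig.get("width"), fig.get("height"))
```

Small crops (icons, logos, equation fragments) tend to fall below the thresholds, but the final choice still needs a human look at the images.
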
### 2 — Read and understand
Read `paper_text.md` end to end. Build a mental model: What problem? What was
broken before? What's the key insight? How does the method work mechanically?
What do the experiments show? What are the limits? **Do not start slides until
you can explain the paper to a beginner without looking.**

Inspect the extracted figures (`workdir/figures/`). Decide which are worth putting
on slides directly (architecture diagrams, key result plots, tables).

### 3 — Plan the deck + write image prompts
Draft two files:

- `workdir/prompts.json`: visuals to generate (see schema below). Write a prompt
  for each concept that benefits from a picture the paper lacks: the core
  analogy, a simplified mechanism diagram, before/after comparisons, a "how data
  flows" schematic, an intuition pump for the math. Aim for **roughly one
  generated image per 1–2 content slides**, on top of reused paper figures.
- `workdir/deck.json`: the full deck spec (schema below). Reference generated
  images as `<id>.png` and reused paper figures by their path
  (`figures/fig03.png`).

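
Because the two files refer to each other by filename, a quick cross-check can catch typos before any image is generated. This is a minimal sketch: `unresolved_images` is a hypothetical helper, and it assumes every reused paper figure is referenced with a `figures/` prefix as in the example above.

```python
import json
from pathlib import Path


def unresolved_images(deck: dict, prompts: list[dict]) -> list[str]:
    """Return image references that no prompt will generate and that do not
    point at a reused paper figure under figures/."""
    generated = {f"{p['id']}.png" for p in prompts}
    missing = []
    for slide in deck.get("slides", []):
        image = slide.get("image")
        if not image or image in generated or image.startswith("figures/"):
            continue
        missing.append(image)
    return missing


if __name__ == "__main__":
    deck_path = Path("workdir/deck.json")
    prompts_path = Path("workdir/prompts.json")
    if deck_path.exists() and prompts_path.exists():
        deck = json.loads(deck_path.read_text(encoding="utf-8"))
        prompts = json.loads(prompts_path.read_text(encoding="utf-8"))
        for name in unresolved_images(deck, prompts):
            print("no prompt or figure for:", name)
```
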
### 4 — Generate images
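```bash
export OPENAI_API_KEY=...   # Codex/Hermes usually has this in env
python3 scripts/generate_images.py workdir/prompts.json --assets workdir/assets
```
Writes one PNG per prompt as `workdir/assets/<id>.png`. Also copy any reused
paper figures into the assets dir so everything resolves from one place:
```bash
cp workdir/figures/*.png workdir/assets/   # optional, keeps paths simple
```

A flat copy like this places each figure directly under `workdir/assets/` (for
example `workdir/assets/fig03.png`). Since `image` paths are resolved against
`--assets`, a reference written as `figures/fig03.png` only resolves if the
`figures/` subdirectory is kept under the assets dir, unless the builder also
falls back to the bare filename, which is not documented here. Make the copy
layout and the `deck.json` references agree.

> If the Hermes runtime has **native gpt-image-2** generation (Codex does), you
> may generate images directly and just save them as `workdir/assets/<id>.png`.
> The script is the portable fallback.
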
### 5 — Build the .odp
```bash
python3 scripts/build_odp.py workdir/deck.json workdir/output.odp --assets workdir/assets
```
That's the deliverable. It opens directly in LibreOffice Impress on Ubuntu.
For an optional sanity check or PDF preview, run
`libreoffice --headless --convert-to pdf workdir/output.odp`.

---

## Deck spec schema (`deck.json`)

```jsonc
{
  "theme": "midnight",          // midnight | paper | forest
  "slides": [ /* slide objects, in order */ ]
}
```

Slide objects by `type`:

```jsonc
// Opening slide
{"type":"title","title":"...","subtitle":"...","eyebrow":"RESEARCH WALKTHROUGH",
 "meta":"Authors, year • one line", "notes":"speaker notes (optional)"}

// Section divider between major parts
{"type":"section","number":"02","title":"The Method","subtitle":"optional"}

// Workhorse slide: bullets, with an optional image on the right
{"type":"content","kicker":"motivation","title":"...",
 "bullets":["point","another point",{"text":"sub-point","level":1}],
 "image":"diagram_attention.png",          // omit for full-width text
 "caption":"Fig 2: ...", "notes":"..."}

// Full-bleed image with a caption; use for big architecture diagrams / plots
{"type":"bigimage","kicker":"architecture","title":"...",
 "image":"figures/fig03.png","caption":"...","notes":"..."}

// Side-by-side comparison (before/after, baseline/proposed, RNN/Transformer)
{"type":"comparison","title":"...",
 "left":{"heading":"Baseline","bullets":["...","..."]},
 "right":{"heading":"Proposed","bullets":["...","..."]}}

// Pull-quote / key takeaway
{"type":"quote","text":"...","attribution":"paper abstract (paraphrased)"}
```

Notes:
- `bullets` items are strings, or `{"text": "...", "level": 1}` for one indent.
- `image` is resolved against `--assets`. A missing image renders a labelled
  placeholder (the deck still builds), so a failed generation never blocks you.
- Put the deeper explanation a learner can read later into `notes` (speaker
  notes); keep on-slide bullets tight.

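
A structural check before building can surface misspelled slide types or malformed bullets early. The sketch below is not the builder's own validation: the per-type required fields in `REQUIRED` are an assumption inferred from the examples above, and `build_odp.py` may accept less or demand more.

```python
THEMES = {"midnight", "paper", "forest"}

# Assumed minimum fields per slide type; inferred from the examples, not documented.
REQUIRED = {
    "title": ["title"],
    "section": ["title"],
    "content": ["title", "bullets"],
    "bigimage": ["title", "image"],
    "comparison": ["title", "left", "right"],
    "quote": ["text"],
}


def validate_deck(deck: dict) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    # A missing theme is treated as the default, "midnight".
    if deck.get("theme", "midnight") not in THEMES:
        problems.append(f"unknown theme: {deck.get('theme')!r}")
    for i, slide in enumerate(deck.get("slides", []), start=1):
        kind = slide.get("type")
        if kind not in REQUIRED:
            problems.append(f"slide {i}: unknown type {kind!r}")
            continue
        for field in REQUIRED[kind]:
            if field not in slide:
                problems.append(f"slide {i} ({kind}): missing {field!r}")
        for bullet in slide.get("bullets", []):
            ok = isinstance(bullet, str) or (isinstance(bullet, dict) and "text" in bullet)
            if not ok:
                problems.append(f"slide {i} ({kind}): malformed bullet {bullet!r}")
    return problems
```

Calling `validate_deck(json.load(open("workdir/deck.json")))` (with `json` imported) and printing the result is enough for a pre-build check.
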
## Image prompts schema (`prompts.json`)
```jsonc
[
  {"id":"attention_schematic",
   "prompt":"Technical diagram: scaled dot-product attention. Show Q, K, V as labelled boxes, a matrix multiply, a softmax, and a weighted sum. Clear arrows and labels.",
   "shape":"landscape"},            // landscape | portrait | square
  {"id":"rnn_bottleneck","prompt":"...","shape":"portrait",
   "transparent": false}
]
```
- `id` becomes the filename (`<id>.png`); reference it in `deck.json`.
- gpt-image-2 renders **text inside images** well, so labelled diagrams,
  flowcharts and infographics are fair game; lean into them.
- A shared style suffix is auto-appended for visual coherence; override per-run
  with `--style-suffix`.

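
Since each `id` turns into a filename, duplicate or awkward ids are worth catching before spending generation calls. This is a hedged sketch: `validate_prompts` and the filename-safe pattern are illustrative assumptions, and treating a missing `shape` as acceptable is a guess about the generator's behaviour.

```python
import re

SHAPES = {"landscape", "portrait", "square"}

# Assumed filename-safe id pattern; the generator script may be more permissive.
ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]+$")


def validate_prompts(prompts: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    seen = set()
    for i, entry in enumerate(prompts, start=1):
        pid = entry.get("id", "")
        if not ID_PATTERN.match(pid):
            problems.append(f"entry {i}: id {pid!r} is not filename-safe")
        if pid in seen:
            problems.append(f"entry {i}: duplicate id {pid!r}")
        seen.add(pid)
        if not entry.get("prompt", "").strip():
            problems.append(f"entry {i}: empty prompt")
        # Assumption: an omitted shape is allowed and falls back to a default.
        shape = entry.get("shape", "landscape")
        if shape not in SHAPES:
            problems.append(f"entry {i}: unknown shape {shape!r}")
    return problems
```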
---
## Pedagogy checklist (the part that makes it a *learning* deck)
- Open with the **problem and stakes** before any method.
- For every mechanism: **intuition / analogy first**, then the precise version.
- Turn each equation into a sentence ("this just measures how similar two
  vectors are, normalized so big dimensions don't blow up the scale").
- Use `comparison` slides for "old way vs new way".
- Reuse the paper's real result figures; explain *what to look at* in the caption.
- End with: what's genuinely new, what it enables, and stated limitations.
- Prefer more slides over crowded ones. One idea per slide.

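
A few of these checklist items can be approximated with a lint pass over `deck.json`; the rest need human judgment. The sketch below is illustrative only: `pedagogy_warnings` is a hypothetical helper and the `MAX_BULLETS` threshold is an arbitrary assumption, not a rule from this document.

```python
# Assumed crowding threshold; tune to taste.
MAX_BULLETS = 5


def pedagogy_warnings(deck: dict) -> list[str]:
    """Flag checklist items that can be detected from the deck spec alone."""
    slides = deck.get("slides", [])
    warnings = []
    if not any(s.get("type") == "comparison" for s in slides):
        warnings.append("no comparison slide for 'old way vs new way'")
    for i, s in enumerate(slides, start=1):
        if s.get("image") and not s.get("caption"):
            warnings.append(f"slide {i}: image without a caption saying what to look at")
        bullets = s.get("bullets", [])
        if len(bullets) > MAX_BULLETS:
            warnings.append(f"slide {i}: {len(bullets)} bullets; consider splitting")
    return warnings
```

The warnings are prompts for a second look, not hard failures: a deck can trip none of them and still explain the paper poorly.
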
## Theme choice
`midnight` (dark, technical; the default), `paper` (warm light, academic),
`forest` (dark green). Pick one that fits the subject; keep it consistent.