ibnezzoubayr/Skills

ibnezzoubayr 661ad32394 First Commit: research-paper-presenter skill for hermes!

2026-06-19 21:47:51 +01:00

Research Paper → LibreOffice Impress Explainer

Purpose

Turn a research-paper PDF into a detailed, image-rich .odp presentation that explains the work clearly enough for a learner to understand everything — the problem, the core idea, the method, the math intuition, the results, and why it matters. Output is a native LibreOffice Impress file (.odp) — never .pptx, never PowerPoint.

When to use

The user gives you a paper (a .pdf, or an arXiv link or other URL they want explained) and wants to learn it via slides. Typical requests: "make slides from this paper", "explain this paper", "turn this into a presentation", "a deck I can study from".

What "good" looks like

Teaches, doesn't summarize. Every claim from the paper is unpacked into plain language with intuition first, formalism second. Assume the learner is smart but new to the subfield.
Image-rich. Most content slides carry a visual. Two image sources:
1. The paper's own figures, extracted automatically (highest fidelity — use these for the real architecture diagrams, result plots, tables).
2. Generated explainer visuals from gpt-image-2 — schematics, analogies, step-by-step diagrams, intuition pictures that the paper doesn't contain.
Detailed. A typical paper becomes 18–32 slides, not 8. Break the method into multiple slides. One idea per slide.
Coherent look. Generated images share a style so the deck feels designed.

Pipeline (run in order)

All scripts live in scripts/. Work inside a scratch dir, e.g. workdir/.

1 — Extract the paper

python3 scripts/extract_paper.py PAPER.pdf --out workdir

Produces workdir/paper_text.md (page-delimited text with detected section headers), workdir/figures/*.png (the paper's real figures), workdir/figures.json (a manifest with each figure's page and dimensions), and workdir/meta.json (title, page count, word count).
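To shortlist figures before reading, the manifest can be sorted by pixel area so that tiny icons and logos drop out. This is only a sketch: the manifest layout used here (a list of objects with `file`, `page`, `width` and `height` keys) is an assumption, because the fields that extract_paper.py actually writes are not documented in this file. Adjust the key names to match your figures.json.

```python
import json
from pathlib import Path


def rank_figures(manifest, min_area=0):
    """Return manifest entries sorted by pixel area, largest first.

    ASSUMPTION: each entry is a dict with "file", "page", "width" and
    "height" keys. The real figures.json may use different names.
    """
    kept = [f for f in manifest if f["width"] * f["height"] >= min_area]
    return sorted(kept, key=lambda f: f["width"] * f["height"], reverse=True)


def load_manifest(workdir):
    """Read figures.json from the scratch dir (location is an assumption)."""
    path = Path(workdir) / "figures.json"
    return json.loads(path.read_text(encoding="utf-8"))


# Hypothetical manifest, for illustration only.
example = [
    {"file": "fig01.png", "page": 2, "width": 400, "height": 300},
    {"file": "fig02.png", "page": 5, "width": 1200, "height": 800},
    {"file": "fig03.png", "page": 7, "width": 64, "height": 64},
]
for fig in rank_figures(example, min_area=10_000):
    print(fig["page"], fig["file"], fig["width"] * fig["height"])
```

With a real run you would call `rank_figures(load_manifest("workdir"))` instead of using the inline example.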

2 — Read and understand

Read paper_text.md end to end. Build a mental model: What problem? What was broken before? What's the key insight? How does the method work mechanically? What do the experiments show? What are the limits? Do not start slides until you can explain the paper to a beginner without looking.

Inspect the extracted figures (workdir/figures/). Decide which are worth putting on slides directly (architecture diagrams, key result plots, tables).

3 — Plan the deck + write image prompts

Draft two files:

workdir/prompts.json — visuals to generate (see schema below). Write a prompt for each concept that benefits from a picture the paper lacks: the core analogy, a simplified mechanism diagram, before/after comparisons, a "how data flows" schematic, an intuition pump for the math. Aim for roughly one generated image per 1–2 content slides, on top of reused paper figures.
workdir/deck.json — the full deck spec (schema below). Reference generated images as <id>.png and reused paper figures by their path (figures/fig03.png).
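Because the two files are drafted by hand, it is easy for deck.json to reference an image that no prompt will produce. A small consistency check, sketched below, can catch that before any generation happens. The function name `unresolved_images` and the idea of passing the reused figure paths as a set are illustrative choices, not part of the skill's scripts.

```python
def unresolved_images(deck, prompts, paper_figures):
    """List (slide_index, image) pairs that nothing in the plan will produce.

    deck          -- parsed deck.json (dict with a "slides" list)
    prompts       -- parsed prompts.json (list of dicts with an "id")
    paper_figures -- set of reused figure paths, e.g. {"figures/fig03.png"}
    """
    generated = {p["id"] + ".png" for p in prompts}
    missing = []
    for index, slide in enumerate(deck.get("slides", [])):
        image = slide.get("image")
        if image and image not in generated and image not in paper_figures:
            missing.append((index, image))
    return missing


# Illustrative plan: the second content slide points at an id with a typo.
deck = {
    "theme": "midnight",
    "slides": [
        {"type": "title", "title": "..."},
        {"type": "content", "title": "...", "image": "attention_schematic.png"},
        {"type": "content", "title": "...", "image": "atention_schematic.png"},
        {"type": "bigimage", "title": "...", "image": "figures/fig03.png"},
    ],
}
prompts = [{"id": "attention_schematic", "prompt": "...", "shape": "landscape"}]
problems = unresolved_images(deck, prompts, {"figures/fig03.png"})
print(problems)
```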

4 — Generate images

export OPENAI_API_KEY=...      # Codex/Hermes usually has this in env
python3 scripts/generate_images.py workdir/prompts.json --assets workdir/assets

Writes one PNG per prompt as workdir/assets/<id>.png. Also copy any reused paper figures into the assets dir so everything resolves from one place:

cp workdir/figures/*.png workdir/assets/    # optional, keeps paths simple

If the Hermes runtime has native gpt-image-2 generation (Codex does), you may generate images directly and just save them as workdir/assets/<id>.png. The script is the portable fallback.
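Whichever route produces the images, it can help to confirm that every prompt id ended up as a file before building. The following is a minimal sketch, assuming only the `workdir/assets/<id>.png` naming described above; `missing_assets` is a hypothetical helper, not one of the skill's scripts.

```python
from pathlib import Path


def missing_assets(prompts, assets_dir):
    """Return the prompt ids that have no <id>.png in the assets directory."""
    assets = Path(assets_dir)
    return [p["id"] for p in prompts if not (assets / f"{p['id']}.png").is_file()]


# Illustrative prompt list; in practice load workdir/prompts.json with json.load.
prompts = [
    {"id": "attention_schematic", "prompt": "...", "shape": "landscape"},
    {"id": "rnn_bottleneck", "prompt": "...", "shape": "portrait"},
]
for prompt_id in missing_assets(prompts, "workdir/assets"):
    print("not generated yet:", prompt_id)
```

Any id this reports can be regenerated or dropped from the deck before the build step.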

5 — Build the .odp

python3 scripts/build_odp.py workdir/deck.json workdir/output.odp --assets workdir/assets

That's the deliverable. It opens directly in LibreOffice Impress on Ubuntu. (Optional sanity check / PDF preview: libreoffice --headless --convert-to pdf workdir/output.odp.)
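For reference, the three script invocations from steps 1, 4 and 5 can be collected in one place. The sketch below only assembles the argument lists exactly as they appear in this document; the helper name `pipeline_commands` is hypothetical. Steps 2 and 3 are manual and sit between the first and second command, so the commands are returned rather than executed in a single pass.

```python
import shlex


def pipeline_commands(pdf, workdir="workdir"):
    """Return the script invocations for steps 1, 4 and 5 as argv lists."""
    assets = f"{workdir}/assets"
    return [
        ["python3", "scripts/extract_paper.py", pdf, "--out", workdir],
        ["python3", "scripts/generate_images.py",
         f"{workdir}/prompts.json", "--assets", assets],
        ["python3", "scripts/build_odp.py",
         f"{workdir}/deck.json", f"{workdir}/output.odp", "--assets", assets],
    ]


# Print the commands in copy-pasteable form.
for argv in pipeline_commands("PAPER.pdf"):
    print(shlex.join(argv))
```

When a step is ready to run, pass its list to `subprocess.run(argv, check=True)` so that a failing script stops the pipeline instead of being silently ignored.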

Deck spec schema (`deck.json`)

{
  "theme": "midnight",          // midnight | paper | forest
  "slides": [ /* slide objects, in order */ ]
}

Slide objects by type:

// Opening slide
{"type":"title","title":"...","subtitle":"...","eyebrow":"RESEARCH WALKTHROUGH",
 "meta":"Authors, year • one line", "notes":"speaker notes (optional)"}

// Section divider between major parts
{"type":"section","number":"02","title":"The Method","subtitle":"optional"}

// Workhorse slide: bullets, with an optional image on the right
{"type":"content","kicker":"motivation","title":"...",
 "bullets":["point","another point",{"text":"sub-point","level":1}],
 "image":"diagram_attention.png",     // omit for full-width text
 "caption":"Fig 2 — ...", "notes":"..."}

// Full-bleed image with a caption — use for big architecture diagrams / plots
{"type":"bigimage","kicker":"architecture","title":"...",
 "image":"figures/fig03.png","caption":"...","notes":"..."}

// Side-by-side comparison (before/after, baseline/proposed, RNN/Transformer)
{"type":"comparison","title":"...",
 "left":{"heading":"Baseline","bullets":["...","..."]},
 "right":{"heading":"Proposed","bullets":["...","..."]}}

// Pull-quote / key takeaway
{"type":"quote","text":"...","attribution":"paper abstract (paraphrased)"}
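The `//` annotations in the schema listings are explanatory; standard JSON has no comment syntax, so one way to avoid hand-editing mistakes is to assemble the deck as Python data and serialize it. The sketch below does that for a few of the slide types. The helper `make_deck` and the placeholder slide text are illustrative only.

```python
import json

THEMES = ("midnight", "paper", "forest")  # the three themes listed in the schema


def make_deck(slides, theme="midnight"):
    """Wrap a list of slide dicts in the top-level deck.json structure."""
    if theme not in THEMES:
        raise ValueError(f"unknown theme: {theme!r}")
    return {"theme": theme, "slides": list(slides)}


deck = make_deck([
    {"type": "title", "title": "Paper title", "subtitle": "One-line pitch",
     "eyebrow": "RESEARCH WALKTHROUGH", "meta": "Authors, year"},
    {"type": "section", "number": "01", "title": "The Problem"},
    {"type": "content", "kicker": "motivation", "title": "What was broken",
     "bullets": ["point", {"text": "sub-point", "level": 1}],
     "image": "rnn_bottleneck.png", "notes": "Longer explanation goes here."},
    {"type": "comparison", "title": "Old way vs new way",
     "left": {"heading": "Baseline", "bullets": ["..."]},
     "right": {"heading": "Proposed", "bullets": ["..."]}},
])

# ensure_ascii=False keeps characters such as bullets and arrows readable.
deck_json = json.dumps(deck, indent=2, ensure_ascii=False)
print(deck_json)
```

Writing `deck_json` to workdir/deck.json then gives the build step a file that any strict JSON parser accepts.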

Notes:

bullets items are strings, or {"text": "...", "level": 1} for one indent.
image is resolved against --assets. A missing image renders a labelled placeholder (the deck still builds), so a failed generation never blocks you.
Put the deeper explanation a learner can read later into notes (speaker notes) — keep on-slide bullets tight.
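The mixed bullet format (plain strings alongside `{"text", "level"}` objects) is easiest to reason about once flattened into one shape. Below is a sketch of such a normalization. It assumes that a plain string means level 0, which matches the description above but is not confirmed against build_odp.py; `normalize_bullets` is a hypothetical helper.

```python
def normalize_bullets(bullets):
    """Convert mixed bullet items into a list of (text, level) tuples.

    ASSUMPTION: a bare string is a top-level bullet (level 0), and a dict
    without "level" is treated the same way.
    """
    normalized = []
    for item in bullets:
        if isinstance(item, str):
            normalized.append((item, 0))
        elif isinstance(item, dict) and "text" in item:
            normalized.append((item["text"], int(item.get("level", 0))))
        else:
            raise TypeError(f"unsupported bullet item: {item!r}")
    return normalized


bullets = ["point", "another point", {"text": "sub-point", "level": 1}]
for text, level in normalize_bullets(bullets):
    print("  " * level + "- " + text)
```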

Image prompts schema (`prompts.json`)

[
  {"id":"attention_schematic",
   "prompt":"Technical diagram: scaled dot-product attention. Show Q, K, V as labelled boxes, a matrix multiply, a softmax, and a weighted sum. Clear arrows and labels.",
   "shape":"landscape"},                 // landscape | portrait | square
  {"id":"rnn_bottleneck","prompt":"...","shape":"portrait",
   "transparent": false}
]

id becomes the filename (<id>.png) → reference it in deck.json.
gpt-image-2 renders text inside images well, so labelled diagrams, flowcharts and infographics are fair game — lean into them.
A shared style suffix is auto-appended for visual coherence; override per-run with --style-suffix.
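Since each id turns into a filename and each shape must be one of three values, a little validation while assembling the prompt list can save a failed generation run. The sketch below builds the list in Python and serializes it. The helper names and the filename-safe id pattern are assumptions made for illustration; generate_images.py may accept a wider range of ids.

```python
import json
import re

SHAPES = ("landscape", "portrait", "square")  # values listed in the schema


def make_prompt(prompt_id, prompt, shape, transparent=None):
    """Build one prompts.json entry, rejecting values the schema does not list."""
    # ASSUMPTION: ids are limited to filename-safe characters because the id
    # becomes <id>.png; the real script may be more permissive.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", prompt_id):
        raise ValueError(f"id is not filename-safe: {prompt_id!r}")
    if shape not in SHAPES:
        raise ValueError(f"unknown shape: {shape!r}")
    entry = {"id": prompt_id, "prompt": prompt, "shape": shape}
    if transparent is not None:
        entry["transparent"] = bool(transparent)
    return entry


def check_unique_ids(prompts):
    """Raise if two prompts would write to the same <id>.png."""
    seen = set()
    for entry in prompts:
        if entry["id"] in seen:
            raise ValueError(f"duplicate id: {entry['id']}")
        seen.add(entry["id"])
    return prompts


prompts = check_unique_ids([
    make_prompt(
        "attention_schematic",
        # Adjacent Python literals are joined into one string before dumping.
        "Technical diagram: scaled dot-product attention. Show Q, K, V as "
        "labelled boxes, a matrix multiply, a softmax, and a weighted sum. "
        "Clear arrows and labels.",
        "landscape",
    ),
    make_prompt("rnn_bottleneck", "...", "portrait", transparent=False),
])
prompts_json = json.dumps(prompts, indent=2)
print(prompts_json)
```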

Pedagogy checklist (the part that makes it a learning deck)

Open with the problem and stakes before any method.
For every mechanism: intuition / analogy first, then the precise version.
Turn each equation into a sentence ("this just measures how similar two vectors are, normalized so big dimensions don't blow up the scale").
Use comparison slides for "old way vs new way".
Reuse the paper's real result figures; explain what to look at in the caption.
End with: what's genuinely new, what it enables, and stated limitations.
Prefer more slides over crowded ones. One idea per slide.
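Parts of this checklist can be checked mechanically against deck.json. The sketch below flags a deck outside the 18 to 32 slide range mentioned earlier, a deck that does not open with a title slide, crowded slides, and a deck with no comparison slide. The limit of five bullets per slide is an assumed threshold rather than a rule from this document, and `lint_deck` is a hypothetical helper.

```python
def lint_deck(deck, min_slides=18, max_slides=32, max_bullets=5):
    """Return human-readable warnings for a parsed deck.json.

    ASSUMPTION: max_bullets=5 is an arbitrary crowding threshold.
    """
    warnings = []
    slides = deck.get("slides", [])
    if not min_slides <= len(slides) <= max_slides:
        warnings.append(
            f"{len(slides)} slides; a typical paper needs {min_slides}-{max_slides}"
        )
    if slides and slides[0].get("type") != "title":
        warnings.append("deck does not open with a title slide")
    for index, slide in enumerate(slides):
        count = len(slide.get("bullets", []))
        if count > max_bullets:
            warnings.append(f"slide {index} has {count} bullets; consider splitting")
    if not any(s.get("type") == "comparison" for s in slides):
        warnings.append("no comparison slide for old way vs new way")
    return warnings


# Illustrative deck that is both too short and too crowded.
draft = {
    "theme": "paper",
    "slides": [
        {"type": "title", "title": "..."},
        {"type": "content", "title": "...", "bullets": ["a"] * 7},
    ],
}
for warning in lint_deck(draft):
    print("warning:", warning)
```

The checks are advisory: pedagogy items such as leading with intuition still need a human read of the slides and notes.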

Theme choice

midnight (dark, technical — default), paper (warm light, academic), forest (dark green). Pick one that fits the subject; keep it consistent.
