Core Concepts

Understanding bead's fundamental principles and design

The bead Philosophy

Research workflows are complex and hard to reproduce. Data files get moved, code gets updated, team members leave, and suddenly results can’t be recreated. bead solves this with a simple approach: package everything needed to recreate a result into one self-contained unit.

This creates reproducible computational workflows without forcing you to change how you work.

The Fundamental Pattern

output = code(*inputs)

Every bead follows this pattern:

  • Inputs: The data you need (bead tracks exactly which version)
  • Code: Your scripts, notebooks, whatever (bead saves all of it)
  • Output: The results you created (bead packages it up nicely)
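
As a loose illustration of the pattern (this is not bead's own code, and bead never calls your code for you), the relationship can be written as a plain Python function. The function body and the sample data below are hypothetical stand-ins.

```python
# Illustrative only: mirrors output = code(*inputs) with made-up data.

def code(*inputs):
    """Stand-in for your analysis: pool all input values and summarize them."""
    values = [value for dataset in inputs for value in dataset]
    return {"count": len(values), "mean": sum(values) / len(values)}

# Two hypothetical inputs, each a fixed version of some dataset.
survey_2023 = [2.0, 4.0]
survey_2024 = [6.0, 8.0]

output = code(survey_2023, survey_2024)
```

The point of the sketch: if the inputs and the code are pinned, the output follows from them, which is why a bead keeps all three together.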

Key Concepts

1. Immutable Computational Snapshots

When you save a bead, it becomes an immutable archive containing:

  • All code needed to run the computation
  • References to exact input data versions
  • Generated output files
  • Metadata about creation time and dependencies

Think of it as a computational snapshot. You can always return to it and recreate the exact same results.

Who This Is Actually For

You’ll love bead if you:

  • Work with data and write code to analyze it
  • Have ever asked “what data did I use for this?”
  • Need to share analysis with teammates
  • Want to actually reproduce your own work months later
  • Use Python, R, Stata, Julia, shell scripts, or really anything
  • Care more about getting stuff done than learning new frameworks

You might want something else if you:

  • Just need to track code changes (use Git)
  • Want a workflow orchestrator (try Airflow or similar)
  • Need to manage software installations (use conda/Docker)
  • Are building web apps or mobile apps (this is for data analysis)

2. Workspace vs Archive

Workspace (Active Development)

  • Directory where you actively work on analysis
  • You can modify files, run code, test ideas
  • Temporary state during development

Archive (Saved bead)

  • Immutable .zip file stored in a bead box
  • Timestamped and content-verified
  • The permanent, shareable record of your computation

# Workspace: active development
$ cd my-analysis/
$ python analyze.py  # Modify and test

# Archive: frozen snapshot
$ bead save results
# Creates: my-analysis_20250730T120000+0200.zip
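
To see why timestamped names keep earlier archives safe, here is a minimal sketch that builds a file name in the same shape as the example above. The `archive_name` helper is hypothetical, and the exact naming scheme bead uses internally is an assumption based only on that example.

```python
from datetime import datetime, timedelta, timezone

def archive_name(workspace, saved_at):
    """Build a name shaped like <workspace>_<timestamp><utc offset>.zip.

    Assumption: this only imitates the file name shown in the example;
    it is not taken from bead's source.
    """
    if saved_at.tzinfo is None:
        raise ValueError("saved_at must be timezone-aware")
    return f"{workspace}_{saved_at.strftime('%Y%m%dT%H%M%S%z')}.zip"

utc_plus_2 = timezone(timedelta(hours=2))
first = archive_name("my-analysis", datetime(2025, 7, 30, 12, 0, 0, tzinfo=utc_plus_2))
second = archive_name("my-analysis", datetime(2025, 7, 30, 13, 0, 0, tzinfo=utc_plus_2))
```

Because the save time is part of the name, a later save gets a new file name instead of overwriting the earlier archive.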

3. Content-Based Verification

Every file in a bead has a cryptographic hash. When you load dependencies, bead verifies you have the exact same files that produced the original results.

$ bead input add processed-data
# bead verifies:
# - Correct version exists
# - Content matches hash
# - No corruption occurred
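
The idea behind content-based verification can be sketched with Python's standard `hashlib`. This is not bead's implementation: the choice of SHA-256, the `find_mismatches` helper, and the dictionary used as a manifest are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path

def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_mismatches(root, expected):
    """Return relative paths that are missing or whose content changed.

    `expected` maps relative path -> hex digest (a hypothetical manifest layout).
    """
    root = Path(root)
    mismatches = []
    for relative_path, recorded in expected.items():
        target = root / relative_path
        if not target.is_file() or file_digest(target) != recorded:
            mismatches.append(relative_path)
    return sorted(mismatches)

# Demo with a throwaway folder standing in for a loaded input.
with tempfile.TemporaryDirectory() as folder:
    data_file = Path(folder) / "clean.csv"
    data_file.write_text("id,score\n1,0.5\n")
    manifest = {"clean.csv": file_digest(data_file)}  # recorded at save time

    unchanged = find_mismatches(folder, manifest)   # content still matches
    data_file.write_text("id,score\n1,0.9\n")       # simulate a silent edit
    after_edit = find_mismatches(folder, manifest)  # the edit is detected
```

A changed byte anywhere in the file changes the digest, so a silently edited or corrupted input is caught before it feeds into new results.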

4. Directed Acyclic Graphs (DAGs)

beads form dependency graphs:

raw-data
    ↓
cleaning-step
    ↓
analysis → paper
    ↓
figures

Rules:

  • Dependencies flow in one direction
  • No circular dependencies allowed
  • Each node is independently reproducible
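
These rules can be checked mechanically. The sketch below uses Python's standard `graphlib` (Python 3.9+) on a graph that follows one reading of the diagram above, in which both paper and figures depend on analysis. The `inputs_of` mapping and `build_order` helper are hypothetical; bead's own bookkeeping may look quite different.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical graph: each bead maps to the set of beads it takes as inputs.
inputs_of = {
    "raw-data": set(),
    "cleaning-step": {"raw-data"},
    "analysis": {"cleaning-step"},
    "paper": {"analysis"},
    "figures": {"analysis"},
}

def build_order(graph):
    """Return an order in which every bead comes after all of its inputs.

    Raises ValueError if the graph contains a circular dependency.
    """
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as error:
        # The second element of args holds the nodes that form the cycle.
        raise ValueError(f"circular dependency: {error.args[1]}") from error

order = build_order(inputs_of)
```

Any order returned this way lists each bead after its inputs, and a graph such as `{"a": {"b"}, "b": {"a"}}` is rejected with a `ValueError`.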

Design Principles

1. Immutability

Once saved, beads never change. New work creates new versions:

$ bead save results  # Creates v1: analysis_20250730T120000.zip
# Make changes...
$ bead save results  # Creates v2: analysis_20250730T130000.zip
# Both versions preserved forever

2. Explicit Dependencies

No hidden data sources or magic file paths:

# ❌ Bad: Hidden dependency
data = pd.read_csv("/shared/data/important.csv")

# ✅ Good: Explicit bead input
data = pd.read_csv("input/validated-data/important.csv")
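
One way to keep yourself honest about this is to route every read through a small helper that only hands out paths inside the workspace's input folder. The `input_file` helper below is a hypothetical convenience, not part of bead; it assumes inputs live under `input/<input-name>/` as in the example above.

```python
from pathlib import Path

INPUT_ROOT = Path("input")  # workspace-relative folder, as in the example above

def input_file(input_name, filename, root=INPUT_ROOT):
    """Return the path of `filename` inside a named input.

    Raises ValueError for absolute paths or '..' tricks that would point
    outside the input folder (a hidden dependency).
    """
    base = (root / input_name).resolve()
    candidate = (base / filename).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"{filename!r} is outside input {input_name!r}")
    return root / input_name / filename

path = input_file("validated-data", "important.csv")
```

With this in place, `pd.read_csv(input_file("validated-data", "important.csv"))` works, while passing `/shared/data/important.csv` fails loudly instead of silently reading from outside the bead.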

3. Tool Agnosticism

bead doesn’t care about your tools:

  • Use any programming language
  • Use any data format
  • Use any execution method

bead only manages files and dependencies.

4. Human-Readable Archives

Even without bead installed, archives are usable:

$ unzip analysis_20250730T120000.zip
$ ls
code/       # Your source code files
data/       # Output data from your analysis
meta/       # bead metadata (bead, input.map, manifest)
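
Because an archive is an ordinary zip file, any zip-aware tool can open it. As a rough sketch, Python's standard `zipfile` module is enough to look inside. The demo builds a tiny stand-in archive first; the file names inside it are hypothetical and only imitate the layout listed above.

```python
import tempfile
import zipfile
from pathlib import Path

def top_level_entries(archive_path):
    """Return the sorted top-level names (folders or files) in a zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return sorted({name.split("/", 1)[0] for name in archive.namelist()})

def read_member(archive_path, member):
    """Return one archived file as text without extracting the archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return archive.read(member).decode("utf-8")

# Stand-in archive with the code/data/meta layout; contents are made up.
with tempfile.TemporaryDirectory() as folder:
    archive_path = Path(folder) / "analysis_20250730T120000.zip"
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr("code/analyze.py", "print('hello')\n")
        archive.writestr("data/results.csv", "id,score\n1,0.5\n")
        archive.writestr("meta/manifest", "{}")

    entries = top_level_entries(archive_path)
    script = read_member(archive_path, "code/analyze.py")
```

Nothing here depends on bead being installed, which is the point of keeping archives in a common format.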

The bead Lifecycle

1. Creation

$ bead new my-analysis
# Empty workspace with standard structure

2. Development

$ cd my-analysis
# Add code, load inputs, generate outputs
$ bead input add upstream-data
$ python process.py

3. Preservation

$ bead save results
# Immutable snapshot created

4. Sharing

# Copy .zip file to collaborator
# They can reproduce exactly
$ bead develop my-analysis_20250730T120000.zip

5. Building Upon

$ bead new follow-up
$ cd follow-up
$ bead input add my-analysis
# Previous outputs become new inputs

What bead Is NOT

Understanding what bead doesn’t do is as important as what it does:

Not a Version Control System

  • bead tracks computational snapshots, not code evolution
  • Use Git for code versioning within beads
  • bead complements, doesn’t replace, traditional VCS

Not a Workflow Engine

  • bead doesn’t execute your code
  • No job scheduling or parallelization
  • You control execution, bead manages artifacts

Not a Data Store

  • bead manages references, not data hosting
  • No cloud storage or synchronization
  • You manage where bead boxes live

Not a Package Manager

  • bead doesn’t install software dependencies
  • Use conda, pip, or system packages
  • Document the environment in your bead
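
Since bead leaves the environment to you, one lightweight option is to record it from within your own script. The sketch below uses only the Python standard library. The `describe_environment` helper and the package list are hypothetical, and where you store the resulting report inside the bead is your choice.

```python
import json
import platform
from importlib import metadata

def describe_environment(packages):
    """Collect interpreter, OS, and selected package versions as plain data."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }

# Hypothetical package list; name the libraries your analysis actually imports.
environment = describe_environment(["pandas", "numpy"])
report = json.dumps(environment, indent=2, sort_keys=True)
```

Writing `report` to a file next to your code before saving means the archive carries a record of what it was run with, even though bead itself does not install anything.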

Common Patterns

Source beads

No inputs, only outputs:

$ bead new survey-data
$ cd survey-data
$ curl -o output/responses.csv https://survey.com/data
$ bead save raw-data

Processing beads

Transform inputs to outputs:

$ bead new clean-survey
$ cd clean-survey
$ bead input add survey-data
$ python clean.py input/survey-data/responses.csv output/clean.csv
$ bead save processed

Analysis beads

Final computations, often never saved as an archive:

$ bead new paper-figures
$ cd paper-figures
$ bead input add clean-survey
$ R --file=analyze.R
# May never 'bead save' - just share outputs directly

Best Practices

1. One Concept, One bead

  • Don’t pack unrelated computations together
  • Split complex pipelines into logical steps
  • Each bead should have a clear purpose

2. Document Everything

  • README in every output folder
  • Explain what the bead does
  • List any manual steps required

3. Save Frequently

  • After completing meaningful work
  • Before making major changes
  • When sharing with others

4. Use Descriptive Names

# ❌ Bad
$ bead new analysis
$ bead new data

# ✅ Good
$ bead new customer-churn-model
$ bead new survey-2024-responses

Ready to dive deeper? Continue to Dependency Management to learn about building complex computational graphs.