Getting Started with bead

Stop asking “what data did we use?” five times a day. This guide will get you up and running with reproducible research in just a few minutes.

Installation

The easiest way to install bead is using pipx:

# Install pipx if you don't have it
python3 -m pip install --user pipx
python3 -m pipx ensurepath

# Install bead
pipx install git+https://github.com/e3krisztian/bead

Alternative Methods

If you prefer pip:

pip install --user git+https://github.com/e3krisztian/bead

For more installation options, see the full installation guide.

Verify Installation

$ bead --version
bead version 1.0.0

Your First bead

Let’s create a simple data analysis workflow to understand how bead works.

1. Create a New bead

$ bead new my-first-analysis
Created "my-first-analysis"

$ cd my-first-analysis
$ ls -la
drwxr-xr-x  .bead-meta/
drwxr-xr-x  input/
drwxr-xr-x  output/
drwxr-xr-x  temp/

2. Understanding the Directory Structure

bead creates these folders, each with a specific purpose:

  • input/ - Dependencies from other beads (read-only, managed by bead)
  • output/ - Files you want to share with downstream beads
  • temp/ - Temporary files (deleted when you save the bead)
  • .bead-meta/ - Internal metadata (don’t modify directly)
  • Everything else - Your code, documentation, and other files
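
A quick way to confirm that a directory has this layout is to check for the four folders. This is a minimal sketch: the helper name `missing_workspace_dirs` is hypothetical and not part of bead, and the folder names are simply the ones listed above.

```python
from pathlib import Path

# Folder names taken from the list above.
EXPECTED_DIRS = ("input", "output", "temp", ".bead-meta")


def missing_workspace_dirs(workspace="."):
    """Return the expected bead folders that are absent from `workspace`."""
    root = Path(workspace)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

An empty list means all four folders are present; anything else names the folders that are missing.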

3. Add Some Code

Create a simple analysis script in src/analyze.py:

#!/usr/bin/env python3
import pandas as pd

# Create sample data
data = pd.DataFrame({
    'x': range(10),
    'y': [i**2 for i in range(10)]
})

# Save results
data.to_csv('output/results.csv', index=False)
print("Analysis complete! Results saved to output/results.csv")

$ mkdir src
$ # Create src/analyze.py with the code above
$ chmod +x src/analyze.py

4. Run Your Analysis

$ python src/analyze.py
Analysis complete! Results saved to output/results.csv

$ ls output/
results.csv
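
The script above writes to a relative path, so it only works when run from the workspace root. One possible variant resolves output/ from the script's own location instead. This sketch assumes the script lives in src/ directly under the workspace, uses only the standard library rather than pandas, and introduces the hypothetical helpers `workspace_root` and `write_results`.

```python
#!/usr/bin/env python3
import csv
from pathlib import Path


def workspace_root(script_path):
    """Assume the script lives in <workspace>/src/, so the workspace is two levels up."""
    return Path(script_path).resolve().parent.parent


def write_results(output_dir):
    """Write the same x, x**2 table as the example above and return the file path."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / "results.csv"
    with target.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["x", "y"])
        writer.writerows((i, i**2) for i in range(10))
    return target
```

Inside src/analyze.py the call would be `write_results(workspace_root(__file__) / "output")`, which gives the same result from any working directory.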

5. Save Your bead

Create an immutable snapshot of your work:

# First, create a bead box (storage location)
$ bead box add my-beads ~/bead-storage
$ bead box list
Boxes:
-------------
my-beads: /Users/you/bead-storage

# Save your bead to the 'my-beads' box
$ bead save my-beads
Successfully stored bead at /Users/you/bead-storage/my-first-analysis_20250730T120000000000+0200.zip
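
Each save produces a new timestamped archive, so a box can hold several snapshots of the same bead. The sketch below finds the newest one for a given name. The `<name>_<timestamp>.zip` pattern is inferred from the single example filename above and is an assumption, not a documented format; `parse_archive_name` and `latest_archive` are hypothetical helpers, not bead APIs.

```python
import re
from datetime import datetime
from pathlib import Path

# Pattern inferred from the example filename above (assumption, not a documented format).
ARCHIVE_RE = re.compile(r"^(?P<name>.+)_(?P<stamp>\d{8}T\d{12}[+-]\d{4})\.zip$")


def parse_archive_name(filename):
    """Split '<name>_<timestamp>.zip' into (name, timezone-aware datetime), or return None."""
    match = ARCHIVE_RE.match(filename)
    if match is None:
        return None
    stamp = datetime.strptime(match["stamp"], "%Y%m%dT%H%M%S%f%z")
    return match["name"], stamp


def latest_archive(box_dir, bead_name):
    """Return the newest archive path for `bead_name` in `box_dir`, or None if there is none."""
    candidates = []
    for path in Path(box_dir).glob("*.zip"):
        parsed = parse_archive_name(path.name)
        if parsed and parsed[0] == bead_name:
            candidates.append((parsed[1], path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]
```

Comparing parsed, timezone-aware timestamps rather than raw filenames keeps the ordering correct even if snapshots were saved under different UTC offsets.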

Working with Dependencies

The real power of bead comes from linking computations together.

1. Create a Data Source bead

$ cd ..
$ bead new raw-data
$ cd raw-data

# Download some data
$ curl -o output/data.csv https://example.com/sample-data.csv

# Save it
$ bead save my-beads
$ cd ..

2. Use Data in Another bead

$ bead new data-processing
$ cd data-processing

# Add the raw data as a dependency
$ bead input add raw-data

# Check what was loaded
$ ls input/raw-data/
data.csv

# Create the processing script, process.py:

import pandas as pd

# Load input data
df = pd.read_csv('input/raw-data/data.csv')

# Process it
df_cleaned = df.dropna()
df_cleaned['processed'] = True

# Save output
df_cleaned.to_csv('output/processed_data.csv', index=False)

# Run it and save the result
$ python process.py
$ bead save my-beads
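
A processing script fails with a fairly opaque error if the dependency is declared but its files are not present in input/. A small guard can make that failure easier to diagnose. This is a sketch only: `require_input` is a hypothetical helper, and the hint in the message assumes the `bead input load <name>` command listed in the command reference is the right remedy.

```python
from pathlib import Path


def require_input(name, filename, workspace="."):
    """Return the path to input/<name>/<filename>, or raise with a hint if it is missing."""
    path = Path(workspace) / "input" / name / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; the input may not be loaded (try `bead input load {name}`)"
        )
    return path
```

In process.py this would replace the hard-coded path, for example `pd.read_csv(require_input("raw-data", "data.csv"))`.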


Best Practices

1. Clear Folder Usage

# ✅ Good: Source data in output/
echo "data" > output/dataset.csv

# ❌ Bad: Source data in temp/ (will be lost!)
echo "data" > temp/dataset.csv

# ✅ Good: Intermediate files in temp/
python preprocess.py > temp/intermediate.pkl
python analyze.py temp/intermediate.pkl > output/final.csv
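
The same split can be kept inside a single Python script: intermediate state goes to temp/, and only the final result goes to output/. This is a minimal sketch with made-up data; `run_pipeline` is a hypothetical function name and the squared values stand in for a real preprocessing step.

```python
import pickle
from pathlib import Path


def run_pipeline(workspace="."):
    """Write an intermediate pickle to temp/ and the final CSV to output/."""
    root = Path(workspace)
    temp_dir, output_dir = root / "temp", root / "output"
    temp_dir.mkdir(exist_ok=True)
    output_dir.mkdir(exist_ok=True)

    # Intermediate state: cheap to regenerate, so it goes in temp/.
    intermediate = temp_dir / "intermediate.pkl"
    intermediate.write_bytes(pickle.dumps([i * i for i in range(5)]))

    # Final result: what downstream beads need, so it goes in output/.
    values = pickle.loads(intermediate.read_bytes())
    final = output_dir / "final.csv"
    final.write_text("value\n" + "\n".join(str(v) for v in values) + "\n")
    return final
```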

2. Documentation

Always include a README in your output folder. Example output/README.md:

# Processed Customer Data

This dataset contains cleaned customer records from the 2024 survey.

## Files
- `processed_data.csv`: Clean customer data with outliers removed

## Processing Steps
1. Removed rows with missing customer IDs
2. Standardized date formats
3. Removed statistical outliers (>3 std dev)

Generated: 2025-07-30
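
Part of such a README can be generated so that the file list and date never go stale. The sketch below writes a skeleton with a title, a description, the files currently in the folder, and today's date. `write_output_readme` is a hypothetical helper; the processing steps still need to be written by hand.

```python
from datetime import date
from pathlib import Path


def write_output_readme(output_dir, title, description):
    """Write README.md in `output_dir` with a title, a description, and the files present."""
    output_dir = Path(output_dir)
    files = sorted(
        p.name for p in output_dir.iterdir() if p.is_file() and p.name != "README.md"
    )
    lines = [f"# {title}", "", description, "", "## Files"]
    lines += [f"- `{name}`" for name in files]
    lines += ["", f"Generated: {date.today().isoformat()}", ""]
    readme = output_dir / "README.md"
    readme.write_text("\n".join(lines))
    return readme
```

Calling it as the last step of the analysis script, after all outputs are written, keeps the list in sync with what is actually saved.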

3. Reproducible Environments

Include your software dependencies:

# For Python projects
$ pip freeze > requirements.txt

# For R projects
$ Rscript -e "sessionInfo()" > session-info.txt

# For conda environments
$ conda env export > environment.yml
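
For Python projects, the package list can also be captured from inside the analysis script, so the record always reflects the interpreter that produced the output. This sketch uses the standard library's `importlib.metadata`; `installed_requirements` is a hypothetical helper, and its output is similar in spirit to `pip freeze` but not guaranteed to match it (editable and VCS installs, for instance, are reported by version only).

```python
from importlib import metadata


def installed_requirements():
    """Return sorted 'name==version' lines for every distribution visible to this interpreter."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a Name field
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)
```

Writing `"\n".join(installed_requirements())` to a file next to the results records the environment alongside the data it produced.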

Common Commands Reference

# Create and manage workspaces
bead new <name>              # Create new bead
bead develop <bead-ref>      # Open existing bead
bead develop -x <bead-ref>   # Open with output data
bead save <box>              # Save to bead box
bead zap                     # Delete workspace

# Manage dependencies
bead input add <name>        # Add and load dependency
bead input load <name>       # Load existing dependency
bead input update            # Update all dependencies
bead input unload <name>     # Free disk space

# Manage storage
bead box add <name> <path>   # Add storage location
bead box list                # List all boxes
bead box forget <name>       # Remove box reference
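
These commands can also be driven from a script, for example to run an analysis and save the result in one step. The wrapper below is a sketch rather than an official interface: `bead_argv` and `run_bead` are hypothetical helpers, and they only pass through the subcommands listed above without checking them.

```python
import shutil
import subprocess


def bead_argv(*args):
    """Build the argument list for a bead command, e.g. bead_argv('save', 'my-beads')."""
    return ["bead", *args]


def run_bead(*args, cwd=None):
    """Run a bead command in `cwd`, raising CalledProcessError on a non-zero exit status."""
    if shutil.which("bead") is None:
        raise RuntimeError("bead executable not found on PATH")
    return subprocess.run(bead_argv(*args), cwd=cwd, check=True)
```

For instance, `run_bead("save", "my-beads", cwd="data-processing")` would correspond to running `bead save my-beads` inside that workspace.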

What’s Next?

Getting Help

Ready to make your research reproducible? Start creating beads!