19 Package Maintenance and Automation
Acknowledgements
Gemini Code Assist was active during the preparation of these notes, and some autosuggestions were incorporated into the final text.
Principles of Package Maintenance & Sharing
- Sharing code isn’t just about giving away software
- Advancing science
- Scientific reproducibility
- Building a resilient community
- Good maintenance spreads knowledge and lowers the entry barrier
The Bus Factor
- How many team members need to be hit by a bus (or win the lottery and quit) before the project completely stalls?
- A bus factor of 1 is dangerous
Components of a Good Repository
README.md
- Audience: new users
- Project title and description
- Installation steps
- Basic usage examples
CONTRIBUTING.md
- Audience: advanced users who are interested in contributing
- Explain how to set up the dev environment locally
- Code style guidelines
- PR process
LICENSE
- Without a license, default copyright applies: others have no legal permission to use, modify, or distribute the code
- MIT: do whatever, don’t sue me
- GPL: if you modify and distribute, you must share your source
CHANGELOG.md
- Audience: existing users
- Don’t rely purely on commit history
- Write human-readable logs organized by Added, Changed, Deprecated, Removed, Fixed, and Security
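The category list above can be turned into a small helper that renders one release entry. This is only a sketch: the function name `render_release`, the Markdown heading layout, and the example entries are assumptions made for illustration, not part of any particular tool.

```python
from datetime import date

# Section order taken from the category list above.
CATEGORIES = ["Added", "Changed", "Deprecated", "Removed", "Fixed", "Security"]


def render_release(version: str, released: date, changes: dict[str, list[str]]) -> str:
    """Render one release as a human-readable changelog section (hypothetical helper)."""
    unknown = set(changes) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown changelog categories: {sorted(unknown)}")

    lines = [f"## [{version}] - {released.isoformat()}"]
    for category in CATEGORIES:
        entries = changes.get(category, [])
        if not entries:
            continue  # leave out categories with nothing to report
        lines.append("")
        lines.append(f"### {category}")
        lines.extend(f"- {entry}" for entry in entries)
    return "\n".join(lines)


# Example entries are made up for illustration.
entry = render_release(
    "2.5.0",
    date(2024, 1, 15),
    {"Fixed": ["Crash on empty input files"], "Added": ["JSON output option"]},
)
print(entry)
```

Whatever order the changes are supplied in, the output always lists categories in the same order, which makes successive releases easier to scan.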
Versioning
MAJOR.MINOR.PATCH (e.g. v2.4.1)
PATCH (2.4.1 -> 2.4.2)
- Bug fixes
- If users update, nothing breaks
MINOR (2.4.1 -> 2.5.0)
- New features
- Fully backward compatible
MAJOR (2.4.1 -> 3.0.0)
- Breaking changes
- Users will need to update their own code to use this new version
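The bump rules above are mechanical enough to sketch in a few lines. The function name `bump` is hypothetical, and the sketch assumes a plain three-part version with an optional leading `v`; pre-release and build suffixes are not handled.

```python
def bump(version: str, part: str) -> str:
    """Return the next version after a 'major', 'minor', or 'patch' change."""
    prefix = "v" if version.startswith("v") else ""
    major, minor, patch = (int(n) for n in version.removeprefix("v").split("."))

    if part == "major":
        # Breaking change: reset the lower components.
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        # New backward-compatible feature: reset the patch number.
        minor, patch = minor + 1, 0
    elif part == "patch":
        # Bug fix only.
        patch += 1
    else:
        raise ValueError(f"part must be 'major', 'minor', or 'patch', got {part!r}")

    return f"{prefix}{major}.{minor}.{patch}"


print(bump("2.4.1", "patch"))  # 2.4.2
print(bump("2.4.1", "minor"))  # 2.5.0
print(bump("2.4.1", "major"))  # 3.0.0
```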
GitHub Tools for Maintenance
- Issues
- Pull Requests (PR)
- Releases / Packages
Issue Tracking & Management
- Writing good bug reports
- Reproduction steps
- Expected behavior
- Actual behavior
- “It’s broken” is not a good bug report
- Labels & Milestones
- Issue labels (e.g. good first issue for attracting beginners)
- Milestones group issues together for a specific target release (e.g. “Version 2.0 Launch”)
- Issue templates can be a useful tool for getting more helpful bug reports from users
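As a rough illustration of what a template-driven bug report buys you, the sketch below checks whether an issue body contains the three sections listed above. It assumes the report is written in Markdown with one heading per section; the function name `missing_sections` and the exact heading text are hypothetical.

```python
import re

# Section names taken from the bug-report checklist above.
REQUIRED_SECTIONS = ["Reproduction steps", "Expected behavior", "Actual behavior"]


def missing_sections(issue_body: str) -> list[str]:
    """Return the required sections that are absent or empty in a Markdown issue body."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in issue_body.splitlines():
        heading = re.match(r"^#{1,6}\s+(.*\S)\s*$", line)
        if heading:
            current = heading.group(1).lower()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)

    missing = []
    for name in REQUIRED_SECTIONS:
        content = sections.get(name.lower())
        if content is None or not any(text.strip() for text in content):
            missing.append(name)
    return missing


good_report = """\
### Reproduction steps
1. Run the tool on an empty file

### Expected behavior
A clear error message

### Actual behavior
The program crashes with a traceback
"""

print(missing_sections(good_report))    # []
print(missing_sections("It's broken"))  # all three sections are missing
```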

Pull Requests (PR)
PRs are useful for more than contributing to open source projects
- Branching
- Don’t push directly to main
- Example branches: fix/login-bug or feature/dark-mode
- Code Reviews
- Code review is a conversation, not an attack
- Automation
- Example: writing Closes #42 in a PR description automatically closes Issue #42 when the PR is merged
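To make the closing-keyword behavior concrete, here is a minimal sketch that finds the issue numbers a PR description would link. It approximates GitHub's matching rather than reproducing it: the keyword list follows GitHub's documented closing keywords, but the regular expression, the function name `linked_issues`, and the restriction to same-repository references like `#42` are simplifying assumptions.

```python
import re

# Closing keywords documented by GitHub (matched case-insensitively).
CLOSING_KEYWORDS = (
    "close", "closes", "closed",
    "fix", "fixes", "fixed",
    "resolve", "resolves", "resolved",
)

# Simplified pattern: keyword, whitespace, then "#<number>".
PATTERN = re.compile(
    r"\b(?:" + "|".join(CLOSING_KEYWORDS) + r")\s+#(\d+)\b",
    re.IGNORECASE,
)


def linked_issues(pr_description: str) -> list[int]:
    """Return the sorted issue numbers referenced with a closing keyword."""
    return sorted({int(number) for number in PATTERN.findall(pr_description)})


print(linked_issues("Closes #42"))                    # [42]
print(linked_issues("Fixes #7 and resolves #42."))    # [7, 42]
print(linked_issues("Related to #42, see also #7"))   # []
```

A plain mention such as "see #42" creates a cross-reference but does not close the issue, which is why the last example returns an empty list.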
Releases & Packages
- Release
- A GitHub Release is a wrapper around a Git Tag
- It allows you to attach compiled binaries or release notes
- Package
- Source code (GitHub repo) is different from an installable package (e.g. on PyPI or npm)
- GitHub Packages can act as a private registry (e.g. to use with npm)
Automating Maintenance with GitHub Actions
- What is CI/CD?
- Common maintenance workflows
Continuous Integration
- Automatically running tests and linters every time code is pushed
- Tests are included for each feature
- Boundary conditions are covered
- Each time a bug is fixed, add a new test to make sure it doesn’t come up again
- “Does this code break my package?”
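The testing habits above might look like the following in practice. Everything here is illustrative: `gc_content` is a made-up function, and the "empty sequence" bug it guards against is hypothetical. The tests use plain `assert` statements in functions named `test_*`, a convention that test runners such as pytest discover automatically.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (0.0 if empty)."""
    if not sequence:
        # Guard added after a (hypothetical) bug report: empty input
        # previously caused a ZeroDivisionError.
        return 0.0
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def test_typical_sequence():
    # A test for the feature itself.
    assert gc_content("ATGC") == 0.5


def test_boundary_conditions():
    # Extremes: all G/C, no G/C, and lowercase input.
    assert gc_content("GGCC") == 1.0
    assert gc_content("ATAT") == 0.0
    assert gc_content("atgc") == 0.5


def test_regression_empty_sequence():
    # Added when the bug was fixed so it cannot silently return.
    assert gc_content("") == 0.0


if __name__ == "__main__":
    for test in (test_typical_sequence, test_boundary_conditions, test_regression_empty_sequence):
        test()
    print("all tests passed")
```

A CI job would run these on every push, so a change that reintroduces the empty-input crash fails before it is merged.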
Continuous Deployment
- Automatically publishing or deploying the code once CI passes
- Minor releases are frequent
GitHub Actions
Automation of tasks on GitHub
Workflows are defined in YAML files inside .github/workflows/
- Events/Triggers
- on: push
- on: pull_request
- on: schedule for cron jobs
- Runners
- Virtual machines hosted by GitHub that execute your workflow jobs
- Many different architectures and operating systems are available
- Jobs & Steps
- A workflow has jobs which run in parallel unless there are dependencies
- Jobs have steps which run sequentially
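The scheduling rule above (jobs run in parallel unless one depends on another) can be modeled with a topological sort. This is a conceptual sketch in Python, not how GitHub Actions is implemented: the job names are hypothetical, and the dictionary maps each job to the jobs it must wait for, mirroring the `needs` key in a workflow file.

```python
from graphlib import TopologicalSorter


def execution_waves(needs: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into waves; jobs in the same wave can run in parallel."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every job whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical workflow: job name -> jobs it needs to finish first.
jobs = {
    "lint": set(),
    "test": set(),
    "build": {"lint", "test"},
    "publish": {"build"},
}

print(execution_waves(jobs))  # [['lint', 'test'], ['build'], ['publish']]
```

Here `lint` and `test` have no dependencies and start together, while `build` and `publish` wait their turn. Within each job, the steps would still run one after another.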
Common Maintenance Workflows
Matrix Builds
- Running the exact same test suite on Ubuntu, Windows, and macOS simultaneously using a matrix strategy to catch OS-specific bugs
Compilation of code when changes are pushed (e.g. for CD)
Dependabot
- GitHub’s built-in dependency scanner, which automatically opens PRs when your dependencies have known security vulnerabilities or are out of date
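To illustrate what the matrix strategy mentioned above does, the sketch below expands a matrix into the individual job configurations it produces. It is a simplified model: the function name `expand_matrix` and the Python versions are assumptions, and extra matrix features such as include/exclude rules are ignored.

```python
from itertools import product


def expand_matrix(matrix: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return one configuration per combination of matrix values."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[key] for key in keys))]


# Operating systems from the text; Python versions are illustrative.
matrix = {
    "os": ["ubuntu-latest", "windows-latest", "macos-latest"],
    "python-version": ["3.10", "3.12"],
}

configurations = expand_matrix(matrix)
print(len(configurations))  # 6
for config in configurations:
    print(config)
```

Three operating systems times two Python versions gives six jobs, each running the same test suite, so a failure that appears only on one platform stands out immediately.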
Docker
Dependencies can be a pain to manage, especially for less technical collaborators
- Global system dependencies
- Mismatched Python/Node versions
- OS differences
Per-project environments (like Python’s venv or npm’s local node_modules) help, but they don’t capture OS-level dependencies (like C++ compilers or database drivers)
Containers
- Before shipping containers, loading a cargo ship took days of packing weirdly shaped items
- Standard containers mean standard cranes and standard ships
- Docker is standard packaging for code
Containers vs VMs
- VMs virtualize the hardware and run a complete guest OS (heavy)
- Containers share the host OS kernel and only isolate the app and its libraries (lightweight and fast)
Docker Basics
- Images vs. Containers
- An Image is the recipe/blueprint
- A Container is the running instance of that recipe
Sample Dockerfile
# Start from a small Linux image pre-loaded with Python 3.10
FROM python:3.10-slim
# Install samtools, an OS-level dependency
RUN apt-get update && \
    apt-get install -y samtools
# Create a folder called /app inside the image and move into it
WORKDIR /app
# Copy your Python dependency definitions into the image
COPY requirements.txt .
# Install Python packages (e.g. biopython, pandas) during the build process
RUN pip install -r requirements.txt
# Copy your actual analysis scripts into the image
COPY . .
# The default command that executes when the container starts
CMD ["python", "analyze_sequences.py"]

.dockerignore
- Exclude node_modules, .git, data, and environment variable files (.env)
- Helps keep images small
- Helps avoid security issues
Benefits for maintainers
Example: Reviewers can pull a PR, type docker compose up, and test a complex app with a database instantly, without installing a database or other dependencies on their local machine