18 Streamlit

Author: Dr Randy Johnson

Affiliation: Hood College

Published: April 8, 2026

Why Streamlit?

We build powerful computational pipelines (sequence analysis, single-cell clustering, variant calling), but sharing them is difficult
Options:
- GitHub repo containing scripts
- R/Python package
- Docker
- Shiny/Streamlit web app

Some of these options are poorly suited to
- Wet-lab biologists
- Clinicians
- Collaborators who are uncomfortable with command-line or Python/R environment setups

Full web development (Flask/Django/React) is typically impractical
- Requires learning HTML, CSS, JavaScript
- Massive time investment for development
- Non-trivial maintenance requirements

Intro to Streamlit

An open-source Python library designed specifically for machine learning and data science teams
Turns Python data scripts into shareable, interactive web applications
- Pure Python: no front-end languages required
- Rapid Prototyping: build a UI as fast as you can write a script
- Interactivity: create sliders, file uploaders, and buttons to explore biological data dynamically

Streamlit vs. Shiny

Shiny

Natively supports both R and Python
Uses a reactive programming model
Explicitly separates the UI layout from the server

Pros:
- Control over complex app states
- Highly customizable UI
- Deep integration with the R ecosystem (a massive advantage for Bioconductor/Seurat users)
Cons:
- Steeper learning curve
- Writing reactive graphs can become complicated

Streamlit

Top-to-bottom execution
UI and backend are blended into a single, linear script

Pros:
- Shallow learning curve
- Ideal for quickly wrapping Python-based machine learning models, Biopython scripts, or Scanpy pipelines
Cons:
- Less fine-grained control than Shiny over complex UI layouts and reactive state

How Streamlit Works

Streamlit runs your script from the first line to the last line every time a user interacts with the app
When a user adjusts a widget (e.g. changes a p-value slider), the script re-runs with the new variable value
To prevent reloading big datasets on every click, Streamlit uses caching mechanisms (more on that later)
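
The rerun model can be pictured with a toy simulation in plain Python. This is only an illustration of the idea, not Streamlit's implementation: `run_script` and `widget_state` are hypothetical names standing in for the app script and the current widget values, and the p-values are made up.

```python
# Toy model of Streamlit's execution: the whole "script" is one function,
# and every user interaction calls it again from the top with new widget values.
# run_script and widget_state are illustrative names, not Streamlit APIs.


def run_script(widget_state: dict) -> list[str]:
    """Stand-in for app.py: runs top to bottom and returns what would be shown."""
    output = []
    # A widget call returns its current value (or its default on the first run)
    max_p = widget_state.get("max_p", 0.05)
    p_values = {"BRCA1": 0.01, "TP53": 0.04, "EGFR": 0.08}  # made-up numbers
    hits = [gene for gene, p in p_values.items() if p <= max_p]
    output.append(f"Max p-value: {max_p}")
    output.append(f"Significant genes: {', '.join(hits)}")
    return output


# First page load: no interaction yet, so the default is used
first_run = run_script({})
# The user drags the slider to 0.10: the entire script runs again
second_run = run_script({"max_p": 0.10})

for line in first_run + second_run:
    print(line)
```

Nothing is remembered between the two calls except the widget value, which is why expensive steps need caching.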

Streamlit Basics & Live Demo

Install the standard Python package
- pip install streamlit
You don’t run it with python app.py; you use the Streamlit CLI
- streamlit run app.py

Text and Documentation elements

Streamlit makes it easy to document your pipelines right in the app
Supports standard Markdown

Text elements

st.title: Add a title to the app
st.header: Add a section header
st.subheader: Add a sub-section header
st.markdown: Add a block of static Markdown text
st.write: Add almost any object to the app
- Handles most data types automatically (e.g. text, Pandas DataFrames, figures)
- Accepts multiple arguments (e.g. st.write("The calculated p-value is **", p_value, "** which is significant."))

import streamlit as st
import pandas as pd # we'll use this later
import time         # we'll use this later

# Live Demo: Text Elements
st.title("🧬 Variant Filtering Pipeline")
st.header("Step 1: Quality Control")
st.write("This pipeline accepts VCF files and filters based on user-defined criteria.")
st.markdown(
    """
    We can include a large, static block of markdown formatted text here. See these [Streamlit](https://docs.streamlit.io/) and [Markdown](https://www.markdownguide.org/) documentation pages for more information.

    **Note:** Ensure your VCF is compressed (`.vcf.gz`).
    """)

Displaying Biological Data

Streamlit integrates natively with Pandas
DataFrames are rendered as interactive, sortable tables

# Live Demo: Displaying Data
st.subheader("Sample Metadata")

# Create a mock dataframe for demonstration
data = {
    'Sample_ID': ['SRR123', 'SRR124', 'SRR125'],
    'Condition': ['Wildtype', 'Knockout', 'Knockout'],
    'Read_Count': [1500000, 1200000, 1800000]
}
df = pd.DataFrame(data)

st.write("### Display with `st.dataframe`:")
st.dataframe(df)

st.write("### Display with `st.write`:")
st.write(df)

Interactive Widgets

Widgets are assigned directly to variables
No need to write callbacks or event listeners

st.subheader("Filter Parameters")

# Live Demo: Widgets
# Text Input for sequences
motif = st.text_input("Enter binding motif:", "ATGC")

# Slider for thresholds
p_value = st.slider("Select Max P-Value:", min_value=0.01, max_value=0.10, value=0.05, step=0.01)

# File Uploader
uploaded_file = st.file_uploader("Upload FASTA file", type=["fasta", "fa"])

st.write(f"Filtering for motif **{motif}** with p-value < **{p_value}**.")
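
The widget demo above collects a motif and a FASTA file but does not yet use them together. A minimal sketch of one way to do so follows, assuming a small plain-text FASTA upload encoded as UTF-8. `count_motif` is a hypothetical helper written for this example, not part of Streamlit or Biopython; `st.file_uploader` returns `None` until a file is chosen, so the app checks for that first.

```python
def count_motif(fasta_text: str, motif: str) -> dict[str, int]:
    """Count motif occurrences (overlaps included) in each FASTA record."""
    sequences: dict[str, str] = {}
    header = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # Use the first word after ">" as the record name
            parts = line[1:].split()
            header = parts[0] if parts else "unnamed"
            sequences[header] = ""
        elif line and header is not None:
            sequences[header] += line.upper()
    motif = motif.upper()
    return {
        name: sum(seq.startswith(motif, i) for i in range(len(seq))) if motif else 0
        for name, seq in sequences.items()
    }


def main() -> None:
    # Imported here so count_motif can be tested without Streamlit installed
    import streamlit as st

    motif = st.text_input("Enter binding motif:", "ATGC")
    uploaded_file = st.file_uploader("Upload FASTA file", type=["fasta", "fa"])
    if uploaded_file is None:
        st.info("Upload a FASTA file to count motif occurrences.")
        return
    # The uploaded file behaves like an in-memory bytes buffer
    text = uploaded_file.getvalue().decode("utf-8")
    st.write(count_motif(text, motif))


if __name__ == "__main__":
    try:
        main()
    except ModuleNotFoundError:
        print("Streamlit is not installed; run `pip install streamlit` first.")
```

Keeping the counting logic in a plain function means it can be checked on its own, separately from the UI.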

Layouts & Optimization

Structuring the UI with Sidebars

A cluttered app is hard to use
Keep parameters separate from results
st.sidebar moves widgets into a collapsible left-hand panel

# Live Demo: Sidebars
st.sidebar.header("Pipeline Settings")

# Moving widgets to the sidebar
organism = st.sidebar.selectbox(
    "Select Organism", 
    ["Homo sapiens", "Mus musculus", "Drosophila melanogaster"]
)
min_depth = st.sidebar.number_input("Minimum Read Depth", value=10)

st.title("Main Results Canvas")
st.write("Analyzing data for: ", organism)

Columns and Containers

Use st.columns() to place charts, tables, or metrics side-by-side
- Excellent for comparing control vs. experimental groups or displaying multiple KPIs
Use st.metric() to display statistics (especially KPIs); required parameters are
- label: name of the metric
- value: value of the metric/statistic

Optional st.metric() parameters include
- delta: how has the metric changed?
- delta_color: one of three values controlling the color of the delta arrow and text
  - "normal" (default): Positive deltas are green, negative deltas are red
  - "inverse": Positive deltas are red, negative deltas are green
  - "off": The delta is displayed in gray, implying the change is neutral

# Live Demo: Columns
st.subheader("Experiment Summary")

# Create three equal-width columns
col1, col2, col3 = st.columns(3)

with col1:
    st.metric(label="Total Samples", value="42")
with col2:
    st.metric(label="Variants Found", value="1,432", delta="120")
with col3:
    st.metric(label="Failed QC", value="2", delta="-1", delta_color="inverse")
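
In practice the delta is usually computed from two runs rather than typed in by hand. The sketch below shows one way to derive the `st.metric()` arguments described above; `metric_kwargs` and its `lower_is_better` flag are hypothetical names for this example, and the "previous run" numbers are made up.

```python
def metric_kwargs(label: str, current: int, previous: int,
                  lower_is_better: bool = False) -> dict:
    """Build keyword arguments for st.metric from a current and a previous value."""
    return {
        "label": label,
        "value": f"{current:,}",       # thousands separator, e.g. "1,432"
        "delta": current - previous,   # the sign decides the arrow direction
        # "inverse" shows a drop in green, which suits counts of failures
        "delta_color": "inverse" if lower_is_better else "normal",
    }


def main() -> None:
    # Imported here so metric_kwargs can be tested without Streamlit installed
    import streamlit as st

    col1, col2 = st.columns(2)
    col1.metric(**metric_kwargs("Variants Found", 1432, 1312))
    col2.metric(**metric_kwargs("Failed QC", 2, 3, lower_is_better=True))


if __name__ == "__main__":
    try:
        main()
    except ModuleNotFoundError:
        print("Streamlit is not installed; run `pip install streamlit` first.")
```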

Performance & Optimization

Streamlit reruns the whole script when a widget changes
If loading a 2GB .csv takes 10 seconds, your app will lag every time a slider is moved
Decorators help with this
- @st.cache_data tells Streamlit to run a function once, store the output in memory, and skip the computation on subsequent reruns unless the inputs change

st.subheader("Data Loading & Caching")

# Live Demo: Caching
#@st.cache_data # try with and without this decorator
def load_massive_dataset():
    # Simulating a slow data load (e.g., parsing a large TSV)
    time.sleep(3) 
    return pd.DataFrame({"Gene": ["BRCA1", "TP53", "EGFR"], "Expression": [12.5, 8.2, 15.1]})

# With @st.cache_data enabled, this takes about 3 seconds on the first run and is near-instant on reruns
df = load_massive_dataset()
st.dataframe(df)
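
The phrase "unless the inputs change" can be illustrated with a plain-Python analogy. The sketch below uses `functools.lru_cache` from the standard library, which stores results keyed by the function's arguments, in a similar spirit to `@st.cache_data`. It is not how Streamlit implements caching, and `load_expression` and the file names are hypothetical.

```python
from functools import lru_cache

call_log = []  # records every time the function body actually runs


@lru_cache(maxsize=None)
def load_expression(path: str) -> tuple:
    call_log.append(path)  # stand-in for slow file parsing
    return (("BRCA1", 12.5), ("TP53", 8.2), ("EGFR", 15.1))


load_expression("run_a.tsv")  # first call: the body runs
load_expression("run_a.tsv")  # same input: result comes from the cache
load_expression("run_b.tsv")  # new input: the body runs again

print(call_log)
```

In a Streamlit app the same pattern would be written by putting `@st.cache_data` on a function that takes the file path (or other parameters) as arguments, so that each distinct input is loaded once.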