18 Streamlit
Why Streamlit?
We build powerful computational pipelines (sequence analysis, single-cell clustering, variant calling), but sharing them is difficult
Options:
- GitHub repo containing scripts
- R/Python package
- Docker
- Shiny/Streamlit web app
- Some of these options are less well suited for
  - Wet-lab biologists
  - Clinicians
  - Collaborators who are uncomfortable with command-line or Python/R environment setups
- Full web development (Flask/Django/React) is typically impractical
  - Requires learning HTML, CSS, JavaScript
  - Massive time investment for development
  - Non-trivial maintenance requirements
Intro to Streamlit
An open-source Python library designed specifically for machine learning and data science teams
Turns Python data scripts into shareable, interactive web applications
Pure Python: no front-end languages required
Rapid Prototyping: build a UI as fast as you can write a script
Interactivity: create sliders, file uploaders, and buttons to explore biological data dynamically
Streamlit vs. Shiny
Shiny
- Natively supports both R and Python
- Uses a reactive programming model
- Explicitly separates the UI layout from the server
- Pros:
  - Control over complex app states
  - Highly customizable UI
  - Deep integration with the R ecosystem (a massive advantage for Bioconductor/Seurat users)
- Cons:
  - Steeper learning curve
  - Writing reactive graphs can become complicated
Streamlit
- Top-to-bottom execution
- UI and backend are blended into a single, linear script
- Pros:
  - Shallow learning curve
  - Ideal for quickly wrapping Python-based machine learning models, Biopython scripts, or Scanpy pipelines
- Cons:
  - Less fine-grained control over complex UI layouts and complex reactive states compared to Shiny
How Streamlit Works
Streamlit runs your script from the first line to the last line every time a user interacts with the app
When a user adjusts a widget (e.g. changes a p-value slider), the script re-runs with the new variable value
To prevent reloading big datasets on every click, Streamlit uses caching mechanisms (more on that later)
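The rerun model can be imitated in plain Python. The sketch below is a toy stand-in, not Streamlit's implementation: `run_script`, the `widget_state` dictionary, and the variant p-values are all hypothetical, and no Streamlit call is involved. It only shows the idea that the whole script is a function of the current widget values and is executed again whenever one of them changes.

```python
def run_script(widget_state: dict) -> list[str]:
    """Stand-in for app.py: runs top to bottom and returns what it would display."""
    rendered = []

    # A widget call hands back the widget's current value on every run.
    max_p = widget_state["max_p_value"]
    rendered.append(f"Max p-value: {max_p}")

    # Everything below is recomputed from scratch on each run.
    variants = {"rs101": 0.003, "rs202": 0.04, "rs303": 0.2}  # hypothetical p-values
    kept = [name for name, p in variants.items() if p <= max_p]
    rendered.append(f"Variants kept: {', '.join(kept)}")
    return rendered


state = {"max_p_value": 0.05}
first_run = run_script(state)    # initial page load

state["max_p_value"] = 0.01      # the user drags the slider...
second_run = run_script(state)   # ...so the whole script runs again

print(first_run)
print(second_run)
```

Nothing is carried over between the two runs except the widget value, which is why expensive steps need caching.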
Streamlit Basics & Live Demo
Install the standard Python package
pip install streamlit
You don’t run it with python app.py; you use the Streamlit CLI:
streamlit run app.py
Text and Documentation elements
Streamlit makes it easy to document your pipelines right in the app
Supports standard Markdown
Text elements
- st.title: Add a title to the app
- st.header: Add a section header
- st.subheader: Add a sub-section header
- st.markdown: Add a block of static Markdown text
- st.write: Add something to the app
  - Handles almost any data type automatically (e.g. Pandas DataFrames, figures, etc.)
  - Accepts multiple arguments (e.g. st.write("The calculated p-value is **", p_value, "** which is significant."))
import streamlit as st
import pandas as pd # we'll use this later
import time # we'll use this later
# Live Demo: Text Elements
st.title("🧬 Variant Filtering Pipeline")
st.header("Step 1: Quality Control")
st.write("This pipeline accepts VCF files and filters based on user-defined criteria.")
st.markdown(
"""
We can include a large, static block of markdown formatted text here. See these [Streamlit](https://docs.streamlit.io/) and [Markdown](https://www.markdownguide.org/) documentation pages for more information.
**Note:** Ensure your VCF is compressed (`.vcf.gz`).
""")
Displaying Biological Data
Streamlit integrates natively with Pandas
DataFrames are rendered as interactive, sortable tables
# Live Demo: Displaying Data
st.subheader("Sample Metadata")
# Create a mock dataframe for demonstration
data = {
'Sample_ID': ['SRR123', 'SRR124', 'SRR125'],
'Condition': ['Wildtype', 'Knockout', 'Knockout'],
'Read_Count': [1500000, 1200000, 1800000]
}
df = pd.DataFrame(data)
st.write("### Display with `st.dataframe`:")
st.dataframe(df)
st.write("### Display with `st.write`:")
st.write(df)
Interactive Widgets
Widgets are assigned directly to variables
No need to write callbacks or event listeners
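Because a widget call simply returns its current value, ordinary Python can act on that value directly. The sketch below is one possible way to wire `st.text_input` and `st.file_uploader` to a motif count. The helpers `parse_fasta` and `count_motif` are hypothetical, written for this illustration, and Streamlit is imported inside `main()` so the helpers can be tested on a machine without it.

```python
import sys


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into {record_id: uppercase sequence}."""
    records: dict[str, str] = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token after '>' as the record ID.
            header = line[1:].split()
            current_id = header[0] if header else f"record_{len(records) + 1}"
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line.upper()
    return records


def count_motif(sequence: str, motif: str) -> int:
    """Count occurrences of motif in sequence, including overlapping ones."""
    if not motif:
        return 0
    motif = motif.upper()
    last_start = len(sequence) - len(motif) + 1
    return sum(sequence.startswith(motif, i) for i in range(last_start))


def main() -> None:
    import streamlit as st  # imported here so the helpers above work without Streamlit

    motif = st.text_input("Enter binding motif:", "ATGC")
    uploaded_file = st.file_uploader("Upload FASTA file", type=["fasta", "fa"])
    if uploaded_file is None:
        st.info("Upload a FASTA file to count motif occurrences.")
        return

    # The uploaded file behaves like an in-memory bytes buffer.
    records = parse_fasta(uploaded_file.getvalue().decode("utf-8"))
    hits = {rec_id: count_motif(seq, motif) for rec_id, seq in records.items()}
    st.write(f"Occurrences of **{motif}** per record:")
    st.write(hits)


# `streamlit run` imports Streamlit before executing this file, so this check is
# true inside a running app and false under plain `python` or a test runner.
if "streamlit" in sys.modules:
    main()
```

Keeping the parsing and counting in plain functions is a design choice of this sketch: the logic can be unit-tested separately, and the UI code stays a thin layer on top.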
st.subheader("Filter Parameters")
# Live Demo: Widgets
# Text Input for sequences
motif = st.text_input("Enter binding motif:", "ATGC")
# Slider for thresholds
p_value = st.slider("Select Max P-Value:", min_value=0.01, max_value=0.10, value=0.05, step=0.01)
# File Uploader
uploaded_file = st.file_uploader("Upload FASTA file", type=["fasta", "fa"])
st.write(f"Filtering for motif **{motif}** with p-value < **{p_value}**.")
Layouts & Optimization
Columns and Containers
Use st.columns() to place charts, tables, or metrics side-by-side
- Excellent for comparing control vs. experimental groups or displaying multiple KPIs
Use st.metric() to display statistics (especially KPIs); required parameters are
- label: name of the metric
- value: value of the metric/statistic
Optional st.metric() parameters include
- delta: how the metric has changed
- delta_color: one of three values controlling the color of the delta arrow and text
  - "normal" (default): Positive deltas are green, negative deltas are red
  - "inverse": Positive deltas are red, negative deltas are green
  - "off": The delta is displayed in gray, implying the change is neutral
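The delta passed to st.metric() does not have to be hard-coded; it can be computed from two run summaries. A minimal sketch, assuming hypothetical summary dictionaries and a hypothetical helper `build_metrics` that prepares the keyword arguments. "Failed QC" uses "inverse" because a drop in failures is good news.

```python
import sys


def build_metrics(current: dict, previous: dict) -> list[dict]:
    """Turn two run summaries into keyword arguments for st.metric."""
    # (label, summary key, delta_color) for each metric to display.
    specs = [
        ("Total Samples", "samples", "normal"),
        ("Variants Found", "variants", "normal"),
        ("Failed QC", "failed_qc", "inverse"),
    ]
    return [
        {
            "label": label,
            "value": current[key],
            # An unchanged metric gets delta=None, so no delta indicator is drawn.
            "delta": (current[key] - previous[key]) or None,
            "delta_color": color,
        }
        for label, key, color in specs
    ]


def main() -> None:
    import streamlit as st  # imported here so build_metrics works without Streamlit

    # Hypothetical summaries of the previous and current pipeline runs.
    previous = {"samples": 42, "variants": 1312, "failed_qc": 3}
    current = {"samples": 42, "variants": 1432, "failed_qc": 2}

    metrics = build_metrics(current, previous)
    # One equal-width column per metric.
    for col, kwargs in zip(st.columns(len(metrics)), metrics):
        col.metric(**kwargs)


# True under `streamlit run` (Streamlit is already imported), false otherwise.
if "streamlit" in sys.modules:
    main()
```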
# Live Demo: Columns
st.subheader("Experiment Summary")
# Create three equal-width columns
col1, col2, col3 = st.columns(3)
with col1:
    st.metric(label="Total Samples", value="42")
with col2:
    st.metric(label="Variants Found", value="1,432", delta="120")
with col3:
    st.metric(label="Failed QC", value="2", delta="-1", delta_color="inverse")
Performance & Optimization
Streamlit reruns the whole script when a widget changes
If loading a 2GB .csv takes 10 seconds, your app will lag every time a slider is moved
Decorators help with this
@st.cache_data tells Streamlit to run a function once, store the output in memory, and skip the computation on subsequent reruns unless the inputs change
st.subheader("Data Loading & Caching")
# Live Demo: Caching
#@st.cache_data # try with and without this decorator
def load_massive_dataset():
    # Simulating a slow data load (e.g., parsing a large TSV)
    time.sleep(3)
    return pd.DataFrame({"Gene": ["BRCA1", "TP53", "EGFR"], "Expression": [12.5, 8.2, 15.1]})
# With the decorator enabled, this takes about 3 seconds on the first run and returns almost instantly on reruns
df = load_massive_dataset()
st.dataframe(df)