18  Streamlit

Author: Dr Randy Johnson
Affiliation: Hood College
Published: April 8, 2026

Why Streamlit?

  • We build powerful computational pipelines (sequence analysis, single-cell clustering, variant calling), but sharing them is difficult

  • Options:

    • GitHub repo containing scripts
    • R/Python package
    • Docker
    • Shiny/Streamlit web app
  • Some of these options are less well suited for:
    • Wet-lab biologists
    • Clinicians
    • Collaborators who are uncomfortable with command-line or Python/R environment setups
  • Full web development (Flask/Django/React) is typically impractical
    • Requires learning HTML, CSS, JavaScript
    • Massive time investment for development
    • Non-trivial maintenance requirements

Intro to Streamlit

  • An open-source Python library designed specifically for machine learning and data science teams

  • Turns Python data scripts into shareable, interactive web applications

    • Pure Python: no front-end languages required

    • Rapid Prototyping: build a UI as fast as you can write a script

    • Interactivity: create sliders, file uploaders, and buttons to explore biological data dynamically

Streamlit vs. Shiny

Shiny

  • Natively supports both R and Python
  • Uses a reactive programming model
  • Explicitly separates the UI layout from the server logic
  • Pros:
    • Fine-grained control over complex app states
    • Highly customizable UI
    • Deep integration with the R ecosystem (a massive advantage for Bioconductor/Seurat users)
  • Cons:
    • Steeper learning curve
    • Reactive dependency graphs can become complicated to write and debug

Streamlit

  • Top-to-bottom execution
  • UI and backend are blended into a single, linear script
  • Pros:
    • Shallow learning curve
    • Ideal for quickly wrapping Python-based machine learning models, Biopython scripts, or Scanpy pipelines
  • Cons:
    • Less fine-grained control over complex UI layouts and complex reactive states compared to Shiny

How Streamlit Works

  • Streamlit runs your script from the first line to the last line every time a user interacts with the app

  • When a user adjusts a widget (e.g. changes a p-value slider), the script re-runs with the new variable value

  • To prevent reloading big datasets on every click, Streamlit uses caching mechanisms (more on that later)

Streamlit Basics & Live Demo

  • Install the standard Python package

    • pip install streamlit
  • You don’t run it with python app.py; you use the Streamlit CLI

    • streamlit run app.py

Text and Documentation elements

  • Streamlit makes it easy to document your pipelines right in the app

  • Supports standard Markdown

Text elements

  • st.title: Add a title to the app
  • st.header: Add a section header
  • st.subheader: Add a sub-section header
  • st.markdown: Add a block of static Markdown text
  • st.write: A general-purpose command for adding content to the app
    • Handles almost any data type automatically (e.g. Pandas DataFrames, figures, etc.)
    • Accepts multiple arguments (e.g. st.write("The calculated p-value is", p_value, "which is significant."))
import streamlit as st
import pandas as pd # we'll use this later
import time         # we'll use this later

# Live Demo: Text Elements
st.title("🧬 Variant Filtering Pipeline")
st.header("Step 1: Quality Control")
st.write("This pipeline accepts VCF files and filters based on user-defined criteria.")
st.markdown(
    """
    We can include a large, static block of markdown formatted text here. See these [Streamlit](https://docs.streamlit.io/) and [Markdown](https://www.markdownguide.org/) documentation pages for more information.

    **Note:** Ensure your VCF is compressed (`.vcf.gz`).
    """)

Displaying Biological Data

  • Streamlit integrates natively with Pandas

  • DataFrames are rendered as interactive, sortable tables

# Live Demo: Displaying Data
st.subheader("Sample Metadata")

# Create a mock dataframe for demonstration
data = {
    'Sample_ID': ['SRR123', 'SRR124', 'SRR125'],
    'Condition': ['Wildtype', 'Knockout', 'Knockout'],
    'Read_Count': [1500000, 1200000, 1800000]
}
df = pd.DataFrame(data)

st.write("### Display with `st.dataframe`:")
st.dataframe(df)

st.write("### Display with `st.write`:")
st.write(df)

Interactive Widgets

  • Each widget call returns the widget's current value, which you assign directly to a variable

  • No need to write callbacks or event listeners

st.subheader("Filter Parameters")

# Live Demo: Widgets
# Text Input for sequences
motif = st.text_input("Enter binding motif:", "ATGC")

# Slider for thresholds
p_value = st.slider("Select Max P-Value:", min_value=0.01, max_value=0.10, value=0.05, step=0.01)

# File Uploader
uploaded_file = st.file_uploader("Upload FASTA file", type=["fasta", "fa"])

st.write(f"Filtering for motif **{motif}** with p-value < **{p_value}**.")

Layouts & Optimization

Structuring the UI with Sidebars

  • A cluttered app is hard to use
  • Keep parameters separate from results
  • st.sidebar moves widgets to a collapsible left-hand panel
# Live Demo: Sidebars
st.sidebar.header("Pipeline Settings")

# Moving widgets to the sidebar
organism = st.sidebar.selectbox(
    "Select Organism", 
    ["Homo sapiens", "Mus musculus", "Drosophila melanogaster"]
)
min_depth = st.sidebar.number_input("Minimum Read Depth", value=10)

st.title("Main Results Canvas")
st.write("Analyzing data for: ", organism)

Columns and Containers

  • Use st.columns() to place charts, tables, or metrics side-by-side

    • Excellent for comparing control vs. experimental groups or displaying multiple key performance indicators (KPIs)
  • Use st.metric() to display statistics (especially KPIs); required parameters are

    • label: name of the metric
    • value: value of the metric/statistic
  • Optional st.metric() parameters include
    • delta: how has the metric changed?
    • delta_color: one of three values controlling the color of the delta arrow and text
      • "normal" (default): Positive deltas are green, negative deltas are red
      • "inverse": Positive deltas are red, negative deltas are green
      • "off": The delta is displayed in gray, implying the change is neutral
# Live Demo: Columns
st.subheader("Experiment Summary")

# Create three equal-width columns
col1, col2, col3 = st.columns(3)

with col1:
    st.metric(label="Total Samples", value="42")
with col2:
    st.metric(label="Variants Found", value="1,432", delta="120")
with col3:
    st.metric(label="Failed QC", value="2", delta="-1", delta_color="inverse")

Performance & Optimization

  • Streamlit reruns the whole script when a widget changes

  • If loading a 2 GB .csv file takes 10 seconds, the app will stall for 10 seconds every time a slider is moved

  • Caching decorators solve this

    • @st.cache_data tells Streamlit to run a function once, store the output in memory, and skip the computation on subsequent reruns unless the inputs change
st.subheader("Data Loading & Caching")

# Live Demo: Caching
#@st.cache_data # try with and without this decorator
def load_massive_dataset():
    # Simulating a slow data load (e.g., parsing a large TSV)
    time.sleep(3) 
    return pd.DataFrame({"Gene": ["BRCA1", "TP53", "EGFR"], "Expression": [12.5, 8.2, 15.1]})

# With @st.cache_data enabled, this takes ~3 seconds on the first run and is near-instant on reruns;
# without the decorator, every rerun waits the full 3 seconds
df = load_massive_dataset()
st.dataframe(df)