16  Working with APIs

Author: Dr Randy Johnson

Affiliation: Hood College

Published: March 26, 2026

Acknowledgements

I am not proficient at interacting with APIs from Python, so Gemini was used heavily in preparing these notes and examples.

Introduction to Web APIs

  • Application Programming Interfaces (APIs) allow two software programs to talk to each other
  • RESTful web APIs allow our local Python scripts to ask remote servers (like NCBI) for data

The Anatomy of a Request

  • Base URL: The root address of the API (e.g. https://eutils.ncbi.nlm.nih.gov/entrez/eutils).
  • Endpoint: The specific tool or database we are accessing (e.g. /esearch.fcgi).
  • Parameters: The specific questions we are asking, passed in the URL after a ? and separated by & (e.g. ?db=nucleotide&term=BRCA1).
  • Return value: APIs typically return data in JSON (structured like nested Python dictionaries and lists; the common modern standard) or XML (an older, tag-based format, heavily used by NCBI).
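
The pieces above can be assembled and taken apart with the standard library alone. The sketch below builds the example URL from its base, endpoint, and parameters, then parses two tiny payloads. The JSON and XML strings are made-up samples for illustration, not real NCBI responses.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
ENDPOINT = "/esearch.fcgi"
params = {"db": "nucleotide", "term": "BRCA1"}

# urlencode joins key=value pairs with & and escapes unsafe characters
url = f"{BASE_URL}{ENDPOINT}?{urlencode(params)}"
print(url)

# Going the other way: split a full URL back into its parts
parts = urlsplit(url)
print(parts.netloc)           # the server
print(parts.path)             # base path plus endpoint
print(parse_qs(parts.query))  # parameters as a dict of lists

# Made-up payloads carrying the same information in both formats
json_text = '{"count": "2", "idlist": ["111", "222"]}'
xml_text = "<Result><Count>2</Count><IdList><Id>111</Id><Id>222</Id></IdList></Result>"

# JSON becomes dictionaries and lists; XML becomes a tree of elements
ids_from_json = json.loads(json_text)["idlist"]
ids_from_xml = [element.text for element in ET.fromstring(xml_text).iter("Id")]
print(ids_from_json, ids_from_xml)
```

In practice, requests does the URL encoding for you when the dictionary is passed as params=... to requests.get.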

Python Tools for APIs

The requests library is the standard for making HTTP calls in Python.

import requests

# A simple, non-bioinformatics example to show the mechanics (GitHub API)
# This returns information about a public repository in clean JSON
url = "https://api.github.com/repos/Bioconductor/biocViews"
# A timeout keeps the script from hanging if the server never answers
response = requests.get(url, timeout=10)

# Best Practice: Fail loud and early if the server returns an error (e.g., 404, 403)
# This will stop the script immediately instead of trying to parse empty data
response.raise_for_status()

# If the script makes it to this line, the request was successful
data = response.json()
print(f"Repository Name: {data['name']}")
print(f"Stars: {data['stargazers_count']}")

Output:
Repository Name: biocViews
Stars: 4

Try it for yourself

NCBI E-utilities

NCBI doesn’t have a single API; it has the E-utilities, a suite of server-side programs. Two of them cover most sequence downloads:

  • ESearch: Finds the unique IDs (UIDs) for your query.
  • EFetch: Takes those UIDs and downloads the actual data (FASTA, GenBank, etc.).

Step 0: Load relevant libraries

Step 1: ESearch - Find the ID for human BRCA1 in the nucleotide database

Step 2: EFetch - Retrieve the FASTA sequence for that ID
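
One way the three steps might fit together is sketched below. It is an illustration, not the course's reference solution. The function names are hypothetical, and the helpers take a session argument (for example requests.Session()) rather than calling requests directly, so they can be tried without network access. The parameters db, term, retmode, retmax, id, and rettype are standard E-utilities parameters.

```python
# Step 0: in a real script, `import requests` and create the session with
# `session = requests.Session()`. The helpers only rely on its .get method.

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_ids(session, term, db="nucleotide", retmax=1):
    """Step 1: ask ESearch for the UIDs that match `term`."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    response = session.get(f"{EUTILS}/esearch.fcgi", params=params, timeout=30)
    response.raise_for_status()  # fail early on 4xx/5xx responses
    return response.json()["esearchresult"]["idlist"]


def efetch_fasta(session, uid, db="nucleotide"):
    """Step 2: hand a UID to EFetch and get the record back as FASTA text."""
    params = {"db": db, "id": uid, "rettype": "fasta", "retmode": "text"}
    response = session.get(f"{EUTILS}/efetch.fcgi", params=params, timeout=30)
    response.raise_for_status()
    return response.text


def fetch_first_fasta(session, term):
    """Chain the two steps: search first, then fetch the first hit."""
    ids = esearch_ids(session, term)
    if not ids:
        raise LookupError(f"ESearch returned no UIDs for {term!r}")
    return efetch_fasta(session, ids[0])
```

With requests installed, a call could look like fetch_first_fasta(requests.Session(), "BRCA1[Gene] AND Homo sapiens[Organism]"). The search term is an assumption about how to narrow the query to the human gene, and the first hit is not guaranteed to be the record you want, so inspect the returned FASTA header.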

Expanding the Toolkit & Best Practices

  • NCBI relies heavily on XML and E-utilities
  • Other databases have highly structured JSON REST APIs
    • The RCSB PDB API is excellent for programmatically querying 3D macromolecular structural data
    • This allows us to fetch metadata about binding sites or resolution without downloading the whole .pdb file
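
As a rough illustration of fetching metadata without the coordinate file, the sketch below targets the RCSB Data API entry endpoint. The helper names are hypothetical, and the JSON field names (struct.title and rcsb_entry_info.resolution_combined) are assumptions about the response layout that should be checked against the RCSB documentation. As a design choice, the session argument stands in for requests.Session().

```python
RCSB_ENTRY_URL = "https://data.rcsb.org/rest/v1/core/entry"


def fetch_entry(session, pdb_id):
    """Download the JSON metadata for one PDB entry (no coordinates)."""
    response = session.get(f"{RCSB_ENTRY_URL}/{pdb_id}", timeout=30)
    response.raise_for_status()
    return response.json()


def summarize_entry(entry):
    """Pull a few fields out of the entry JSON, tolerating missing keys."""
    info = entry.get("rcsb_entry_info", {})
    resolutions = info.get("resolution_combined") or []
    return {
        "title": entry.get("struct", {}).get("title"),
        # Entries without a reported resolution (e.g. NMR) give None here
        "resolution_angstrom": resolutions[0] if resolutions else None,
    }
```

A call such as summarize_entry(fetch_entry(requests.Session(), "4HHB")) would then return a small dictionary instead of a full structure file.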

Workflow integration

  • We don’t typically use APIs for one-off scripts
  • API pull-scripts can be modularized and integrated into larger, reproducible computational pipelines (e.g. Snakemake)
  • The API script serves as the first rule to gather the raw data
  • Version-controlling the scripts with Git keeps the data-gathering code reproducible, although the remote database itself may change between runs
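
To make the "first rule" idea concrete, here is a minimal sketch of an API pull-script shaped for a pipeline: it takes its inputs from the command line and writes exactly one output file. The option names and the fetch callable are hypothetical. In a real script, fetch would wrap the API calls.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Download raw data for a pipeline")
    parser.add_argument("--term", required=True, help="search term sent to the API")
    parser.add_argument("--out", required=True, type=Path, help="file the pipeline expects")
    return parser.parse_args(argv)


def save_raw_data(text, out_path):
    """Write to a temporary file first so a failed run never leaves a partial output."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = out_path.with_suffix(out_path.suffix + ".tmp")
    tmp_path.write_text(text)
    tmp_path.replace(out_path)
    return out_path


def run(argv, fetch):
    """`fetch` is any function that maps a search term to the text to store."""
    args = parse_args(argv)
    return save_raw_data(fetch(args.term), args.out)
```

A workflow manager such as Snakemake could then call the script with the rule's output path as --out, so the download only reruns when that file is missing.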

API Keys & Rate Limits

  • NCBI allows 3 requests per second without an API key, and 10 with one
  • Hardcoding time.sleep(0.35) between calls is a safe baseline for unauthenticated requests (it stays just under 3 per second)
  • Many APIs require an API key, usually passed as a request parameter or header
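
A small sketch of both ideas follows: spacing out calls in a loop, and attaching a key when one is available. The helper names are hypothetical, and reading the key from an environment variable called NCBI_API_KEY is an assumption (it keeps the key out of version control). api_key is the parameter name the E-utilities accept.

```python
import os
import time

MIN_INTERVAL = 0.35  # seconds between calls; just under 3 requests per second


def with_api_key(params, env_var="NCBI_API_KEY"):
    """Return a copy of params, adding api_key when the environment provides one."""
    key = os.environ.get(env_var)
    return {**params, "api_key": key} if key else dict(params)


def throttled(items, call, min_interval=MIN_INTERVAL,
              sleep=time.sleep, clock=time.monotonic):
    """Run call(item) for each item, starting calls at least min_interval apart."""
    results = []
    last_start = None
    for item in items:
        if last_start is not None:
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(call(item))
    return results
```

With a key, the allowance rises to 10 requests per second, so min_interval could be lowered to a little over 0.1.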

Review Questions

  • Why is it important to check the HTTP status code (for example, by using response.raise_for_status()) before attempting to parse the data returned by an API?
    • What could happen if you skip this step?
  • If your goal is to download the FASTA sequence for the human BRCA1 gene, why can’t you simply send the term “BRCA1” directly to the EFetch utility?
    • Why is ESearch a necessary first step?
  • What is rate limiting? Explain why omitting a command like time.sleep() in a loop of API calls is considered bad practice and how it might impact your script or the server.
  • Construct a request to the NCBI ESearch endpoint. Pass parameters to search the nucleotide database for the term “TP53[Gene] AND mouse[Organism]”.