16 Working with APIs

Author: Dr Randy Johnson

Affiliation: Hood College

Published: March 26, 2026

Acknowledgements

I am not proficient at interacting with APIs from Python, so Gemini was used heavily in preparing these notes and examples.

Introduction to Web APIs

Application Programming Interfaces (APIs) allow two software programs to talk to each other
RESTful web APIs allow our local Python scripts to ask remote servers (like NCBI) for data

The Anatomy of a Request

Base URL: The root address of the API (e.g. https://eutils.ncbi.nlm.nih.gov/entrez/eutils)
Endpoint: The specific tool or database we are accessing (e.g. /esearch.fcgi).
Parameters: The specific questions we are asking, passed in the URL after a ? and separated by & (e.g. ?db=nucleotide&term=BRCA1).
Return value: APIs typically return data in JSON (structured like nested Python dictionaries and lists; the most common modern format) or XML (an older, tag-based format that NCBI uses heavily).
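
As a minimal sketch of how these pieces fit together, the following assembles the example URL from the parts listed above using only Python's standard library. The variable names are illustrative; when the requests library is used, passing the same dictionary as `params=` to `requests.get` performs this encoding for you.

```python
from urllib.parse import urlencode

# The three parts of a request, using the examples from the list above
base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
endpoint = "/esearch.fcgi"
params = {"db": "nucleotide", "term": "BRCA1"}

# urlencode joins the parameters with & and escapes unsafe characters
full_url = f"{base_url}{endpoint}?{urlencode(params)}"
print(full_url)
```

Keeping the parameters in a dictionary rather than pasting them into the URL by hand makes it harder to forget the escaping that search terms with spaces or brackets need.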

Python Tools for APIs

The requests library is the standard for making HTTP calls in Python.

import requests

# A simple, non-bioinformatics example to show the mechanics (GitHub API)
# This returns information about a public repository in clean JSON
url = "https://api.github.com/repos/Bioconductor/biocViews"
# A timeout keeps the script from hanging forever if the server stalls
response = requests.get(url, timeout=10)

# Best Practice: Fail loud and early if the server returns an error (e.g., 404, 403)
# This will stop the script immediately instead of trying to parse empty data
response.raise_for_status()

# If the script makes it to this line, the request was successful
data = response.json()
print(f"Repository Name: {data['name']}")
print(f"Stars: {data['stargazers_count']}")

Repository Name: biocViews
Stars: 4

Try it for yourself

NCBI e-utils

NCBI doesn’t have a single API; it has the E-utilities, a suite of server-side programs, each with its own endpoint

ESearch: Finds the unique IDs (UIDs) for your query.
EFetch: Takes those UIDs and downloads the actual data (FASTA, GenBank, etc.).

Step 0: Load relevant libraries

Step 1: ESearch - Find the ID for human BRCA1 in the nucleotide database
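
One way this step might look is sketched below. The search term and the `retmax` value are assumptions chosen for illustration, and `parse_uids` is a hypothetical helper name. The live request is left as comments so the sketch runs offline; the sample response is a trimmed placeholder with made-up UIDs, not real NCBI output.

```python
import xml.etree.ElementTree as ET

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Query parameters: database, search term, and how many UIDs to return
# (the term and retmax are illustrative assumptions)
params = {
    "db": "nucleotide",
    "term": "BRCA1[Gene] AND Homo sapiens[Organism]",
    "retmax": 5,
}

def parse_uids(xml_text):
    """Return the UIDs listed inside <IdList> of an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return [id_node.text for id_node in root.findall("./IdList/Id")]

# Live call with the requests library (commented out so this runs offline):
# response = requests.get(ESEARCH_URL, params=params, timeout=30)
# response.raise_for_status()
# uids = parse_uids(response.text)

# Placeholder response with made-up UIDs, trimmed to the relevant tags
sample_xml = """<eSearchResult>
  <Count>2</Count>
  <IdList>
    <Id>1111111</Id>
    <Id>2222222</Id>
  </IdList>
</eSearchResult>"""

uids = parse_uids(sample_xml)
print(uids)
```

ESearch only returns identifiers, so the list printed here is what gets handed to the next step.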

Step 2: EFetch - Retrieve the FASTA sequence for that ID
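
A rough sketch of this step follows. The helper names `efetch_params` and `parse_fasta` are hypothetical, and the FASTA record is a made-up placeholder rather than a real sequence. As before, the live request is shown only as comments so the example runs without network access.

```python
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_params(uid):
    """Parameters asking EFetch for one nucleotide record as plain-text FASTA."""
    return {"db": "nucleotide", "id": uid, "rettype": "fasta", "retmode": "text"}

def parse_fasta(text):
    """Split a single-record FASTA string into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("text does not look like a FASTA record")
    header = lines[0][1:]          # drop the leading '>'
    sequence = "".join(lines[1:])  # join the wrapped sequence lines
    return header, sequence

# Live call with the requests library (commented out so this runs offline):
# uid = "1111111"  # placeholder; use a UID returned by ESearch
# response = requests.get(EFETCH_URL, params=efetch_params(uid), timeout=30)
# response.raise_for_status()
# header, sequence = parse_fasta(response.text)

# Made-up record standing in for the server's response
sample_fasta = """>PLACEHOLDER.1 made-up record for illustration
ACGTACGTAC
GGTTAACC
"""

header, sequence = parse_fasta(sample_fasta)
print(header)
print(len(sequence), "bases")
```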

Expanding the Toolkit & Best Practices

NCBI relies heavily on XML and E-utilities
Other databases have highly structured JSON REST APIs
- The RCSB PDB API is excellent for programmatically querying 3D macromolecular structural data
- This allows us to fetch metadata about binding sites or resolution without downloading the whole .pdb file
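
As a hedged illustration of the idea, the sketch below builds a URL for a single PDB entry and pulls two fields out of the JSON. The endpoint path and the field names (`struct.title`, `rcsb_entry_info.resolution_combined`) reflect my understanding of the RCSB Data API and should be checked against its documentation; the sample record uses placeholder values, not real data, and the helper names are hypothetical.

```python
# Assumed base path for the RCSB Data API's per-entry records
RCSB_ENTRY_URL = "https://data.rcsb.org/rest/v1/core/entry"

def entry_url(pdb_id):
    """Build the URL for one PDB entry, e.g. entry_url('4hhb')."""
    return f"{RCSB_ENTRY_URL}/{pdb_id.upper()}"

def summarize_entry(entry):
    """Pull the title and first reported resolution out of an entry record."""
    title = entry.get("struct", {}).get("title")
    resolutions = entry.get("rcsb_entry_info", {}).get("resolution_combined") or []
    return {"title": title, "resolution": resolutions[0] if resolutions else None}

# Live call with the requests library (commented out so this runs offline):
# response = requests.get(entry_url("4HHB"), timeout=30)
# response.raise_for_status()
# summary = summarize_entry(response.json())

# Placeholder record showing only the assumed shape of the JSON
sample_entry = {
    "struct": {"title": "Placeholder structure title"},
    "rcsb_entry_info": {"resolution_combined": [2.0]},
}

print(entry_url("4hhb"))
print(summarize_entry(sample_entry))
```

Using `.get` with defaults means a record that lacks a resolution (for example, a structure without one reported) yields `None` instead of raising a `KeyError`.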

Workflow integration

We don’t typically use APIs for one-off scripts
API pull-scripts can be modularized and integrated into larger, reproducible computational pipelines (e.g. Snakemake)
The API script serves as the first rule to gather the raw data
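Version controlling the scripts with Git records exactly how the data were gathered; because the remote database can change over time, also record the query and retrieval date so the step can be reproduced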
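
One possible way to make a pull-script pipeline-friendly is to have it write its output to a known path together with a small record of what was asked and when. The sketch below is an assumption about how that could be done, not a prescribed layout; `save_with_provenance` and the `.meta.json` sidecar name are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def save_with_provenance(text, out_path, query):
    """Write downloaded text to out_path plus a JSON sidecar describing the pull."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(text)

    # Record what was requested and when, since the remote data may change
    record = {
        "query": query,
        "retrieved_utc": datetime.now(timezone.utc).isoformat(),
        "output": out_path.name,
    }
    meta_path = out_path.parent / (out_path.name + ".meta.json")
    meta_path.write_text(json.dumps(record, indent=2))
    return meta_path
```

A workflow rule can then declare the data file as its output, and downstream rules depend on that file rather than on the API itself.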

API Keys & Rate Limits

NCBI allows 3 requests per second without an API key, and 10 with one
Hardcoding time.sleep(0.35) is a safe baseline for unauthenticated requests
Many other APIs require an API key for every request, so check each provider's documentation
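
The limits above can be sketched as a small throttled loop. In this example the environment variable name `NCBI_API_KEY`, the 0.11 second delay used when a key is present, and the helper names are all assumptions; `fetch_one` stands in for whatever function performs a single request.

```python
import os
import time

def request_delay(api_key=None):
    """Seconds to pause between calls: about 3/s without a key, 10/s with one."""
    return 0.11 if api_key else 0.35

def with_api_key(params, api_key=None):
    """Return a copy of params, adding api_key only when one is provided."""
    merged = dict(params)
    if api_key:
        merged["api_key"] = api_key
    return merged

def fetch_all(uids, fetch_one, delay):
    """Call fetch_one(uid) for each UID, sleeping between consecutive calls."""
    results = []
    for i, uid in enumerate(uids):
        if i > 0:
            time.sleep(delay)  # stay under the server's rate limit
        results.append(fetch_one(uid))
    return results

# Read the key from the environment so it never lands in version control
api_key = os.environ.get("NCBI_API_KEY")  # assumed variable name
delay = request_delay(api_key)
```

Reading the key from the environment, rather than typing it into the script, keeps it out of the Git history that the workflow section recommends.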

Review Questions

Why is it important to check the HTTP status code (for example, by using response.raise_for_status()) before attempting to parse the data returned by an API?
- What could happen if you skip this step?

If your goal is to download the FASTA sequence for the human BRCA1 gene, why can’t you simply send the term “BRCA1” directly to the EFetch utility?
- Why is ESearch a necessary first step?

What is rate limiting? Explain why omitting a command like time.sleep() in a loop of API calls is considered bad practice and how it might impact your script or the server.

Construct a request to the NCBI ESearch endpoint. Pass parameters to search the nucleotide database for the term “TP53[Gene] AND mouse[Organism]”.