cleanbox: a boredom-driven Gmail cleanup tool, and how it actually works
Let me be upfront about something: this is not a unique project. Gmail cleanup tools exist. Inbox zero scripts exist. There are browser extensions, paid services, and entire products dedicated to this exact problem. So why did I build one?
Because it was a slow day at work, my brain had checked out, there were still a few days left before the weekend, and I still wanted to feel like I was doing something. You know those days where you can't bring yourself to do actual work, but sitting completely idle feels worse? This was one of those days. And so cleanbox was born out of boredom, mild inbox anxiety, endless procrastination, and a preference for tools I understand or wanted to understand.
The result is a small Python CLI (a command-line tool: no graphical interface, just something you run in a terminal) that scans your Gmail inbox using only message headers, scores each message for bulk/newsletter signals, and generates a CSV of cleanup candidates. This post is about how it actually works under the hood: the code, the data structures, and the bugs that took longer to find than they should have.
You can find it on GitHub: vilasinits/cleanbox.
First: what even is an email header?
Every email has two parts: a body (the actual content you read) and headers (metadata that travels with the message). You never see headers in your Gmail inbox (they're hidden), but they contain things like who sent the message, when, how it was routed through mail servers, and crucially, whether it was sent as part of a bulk mailing list.
Think of headers like the outside of a physical envelope: the destination address, the return address, the postmarks. You don't need to open the letter to know a lot about it. That's the entire premise of cleanbox: we never open the letter.
The six headers cleanbox looks at are:
METADATA_HEADERS = ["From", "Subject", "Date", "List-Unsubscribe", "Precedence", "List-Id"]
Most of these are self-explanatory. List-Unsubscribe is the interesting one. Any mailing list or newsletter that follows email conventions is supposed to include this header so that mail clients can offer a one-click unsubscribe button. If it's present, you're almost certainly looking at bulk mail rather than a personal message. Precedence: bulk is a similar signal; it's a way for senders to tell mail servers "this is a mass mailing, not a one-on-one email."
The privacy win here is significant: using only headers means the tool never requests or reads message bodies. Not a single word of your actual email content passes through it.
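For the curious, Python's standard library can give you this envelope-only view of any raw message. This is just an illustration of the concept, not part of cleanbox; the sender and URL below are made up:

```python
from email import message_from_string

# A made-up raw message: headers on top, a blank line, then the body.
raw = """\
From: newsletter@someservice.com
Subject: Your weekly digest
List-Unsubscribe: <https://someservice.com/unsubscribe>
Precedence: bulk

You never need to read this body.
"""

msg = message_from_string(raw)

# Everything cleanbox cares about is available without touching the body:
print(msg["From"])        # newsletter@someservice.com
print(msg["Precedence"])  # bulk
print("List-Unsubscribe" in msg)  # True
```

The header lookups are case-insensitive, which mirrors how email headers actually work on the wire.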
The dependency: gws
To talk to Gmail, cleanbox uses gws, the official Google Workspace CLI. It's a command-line tool built by Google, written in Node.js (a JavaScript runtime). I honestly don't know its internals well, but what it does is handle all the authentication complexity and give you a simple way to call Gmail's API from the terminal.
The reason I reached for gws rather than calling Gmail's API directly is that getting OAuth working from scratch (OAuth is the system Google uses to let third-party apps access your account securely, without giving them your password) is genuinely tedious to set up in Python. gws takes care of all of that. cleanbox just calls gws as a subprocess, meaning it runs gws as a separate program and reads what it prints back.
Setting up gws auth (the fiddly part)
This is the one part of the setup that requires some patience. You need a Google Cloud project (Google's platform for building things that talk to Google services) with the Gmail API enabled, and an OAuth client configured. Here's what that actually means in practice:
1. Install gws (you'll need Node.js installed first; nodejs.org has installers for every OS):
npm install -g @googleworkspace/cli
2. Go to console.cloud.google.com, create a project, and enable the Gmail API inside it. Think of this as registering your script as an "app" that's allowed to talk to Gmail.
3. Create an OAuth client (under APIs & Services → Credentials), choose Desktop app as the type, and download the credentials JSON file it gives you. This file is basically a key that proves to Google that your script is the registered app.
4. Put that file where gws can find it:
mkdir -p ~/.config/gws
cp ~/Downloads/client_secret_*.json ~/.config/gws/client_secret.json
5. Authenticate. The --scopes part tells Google what permissions you're asking for; in this case, the ability to read and modify your Gmail:
gws auth login --scopes https://www.googleapis.com/auth/gmail.modify
Your browser will open and show an "unverified app" warning. That's expected; it happens because your app is in "testing mode," which is fine for personal use. Click through it. Once that's done, verify it worked:
cleanbox test-auth
If you see your Gmail labels listed, you're in.
How the code talks to gws
Every Gmail operation in cleanbox goes through a single function called run_gws. It takes a list of command-line arguments, runs gws as a subprocess, and returns whatever JSON gws prints back as a Python dictionary (a data structure of key-value pairs):
def run_gws(args: list[str]) -> dict:
    try:
        completed = subprocess.run(
            ["gws", *args],
            check=True,
            text=True,
            capture_output=True,
        )
    except FileNotFoundError as exc:
        raise SystemExit(
            "Could not find `gws` in PATH. Install @googleworkspace/cli first."
        ) from exc
    except subprocess.CalledProcessError as exc:
        stderr = exc.stderr.strip() or exc.stdout.strip()
        raise SystemExit(f"gws command failed:\n{stderr}") from exc

    stdout = completed.stdout.strip()
    if not stdout:
        return {}
    result = json.loads(stdout)
    # gws exits with code 0 even on Gmail API errors,
    # and prints the error as JSON to stdout instead of raising
    if "error" in result:
        err = result["error"]
        msg = err.get("message", str(err))
        code = err.get("code", "?")
        raise SystemExit(f"Gmail API error {code}: {msg}")
    return result
The non-obvious part is that last check. gws has a quirk: even when the Gmail API returns an error (like a 400 "bad request"), gws still exits with a success code and prints the error as a JSON object to its output. Without the explicit check for "error" in the result, a failed call would hand back the error object as if it were a normal response; downstream code would find no "messages" key in it and cleanbox would just say "no messages found". That is exactly what happened to me while debugging, and it was confusing for longer than I'd like to admit.
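To make the failure mode concrete, here's a toy reproduction. The exact payload is a made-up stand-in (the field names follow the Gmail API's usual error shape, but that's an assumption):

```python
import json

# A made-up example of what gws prints to stdout on an API error,
# despite exiting with code 0:
stdout = '{"error": {"code": 400, "message": "Invalid query"}}'

result = json.loads(stdout)

# Without the explicit check, downstream code looks for "messages",
# finds nothing, and concludes the inbox is simply empty:
print(result.get("messages", []))  # []

# With the check, the failure is loud instead of silent:
if "error" in result:
    err = result["error"]
    print(f"Gmail API error {err['code']}: {err['message']}")
```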
Step 1: getting a list of message IDs
The first thing cleanbox does is ask Gmail for a list of message IDs matching your search query. Note: just IDs, not the messages themselves. Think of this like asking the library for a list of book catalogue numbers that match a topic, before you go and actually pull books off the shelves.
def get_message_ids(query: str, max_results: int) -> list[str]:
    PAGE_MAX = 500
    ids: list[str] = []
    next_token: Optional[str] = None
    while len(ids) < max_results:
        batch = min(max_results - len(ids), PAGE_MAX)
        payload: dict = {"userId": "me", "q": query, "maxResults": batch}
        if next_token:
            payload["pageToken"] = next_token
        data = run_gws(["gmail", "users", "messages", "list", "--params", json.dumps(payload)])
        messages = data.get("messages", [])
        ids.extend(m["id"] for m in messages if "id" in m)
        next_token = data.get("nextPageToken")
        if not next_token or not messages:
            break
    return ids[:max_results]
The Gmail API will give you at most 500 IDs in a single request. If you ask for more, it gives you a nextPageToken, a cursor that tells the API "give me the next batch starting from here." The while loop keeps fetching batches until we have as many as requested or there are no more results.
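The paging arithmetic is easy to check with a toy simulation (no API involved; this only models the batch-size calculation, not the early exit when the token runs out):

```python
PAGE_MAX = 500  # the Gmail API's per-request cap on message IDs

def batch_sizes(max_results: int) -> list[int]:
    """The successive maxResults values the fetch loop would request."""
    sizes, fetched = [], 0
    while fetched < max_results:
        batch = min(max_results - fetched, PAGE_MAX)
        sizes.append(batch)
        fetched += batch
    return sizes

print(batch_sizes(1200))  # [500, 500, 200] -- three requests
print(batch_sizes(200))   # [200]           -- one request is enough
```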
The raw response from Gmail looks like this; really just IDs, nothing else:
{
  "messages": [
    {"id": "19d2a31eae3a2769", "threadId": "19d2a31eae3a2769"},
    {"id": "19d2a2e3ffe0e3cc", "threadId": "19d2a2e3ffe0e3cc"}
  ],
  "nextPageToken": "13571342265388962228",
  "resultSizeEstimate": 201
}
Step 2: fetching the headers for each message
Now for each ID, we make a second call to get the actual headers. The Gmail API has a format=metadata option which tells it: don't send me the message body, just the metadata. And metadataHeaders narrows it down further to only the specific headers we care about:
def get_message_metadata(message_id: str) -> dict:
    payload = {
        "userId": "me",
        "id": message_id,
        "format": "metadata",
        "metadataHeaders": METADATA_HEADERS,
    }
    return run_gws(["gmail", "users", "messages", "get", "--params", json.dumps(payload)])
The response comes back as a nested JSON object. Here's what a typical newsletter email looks like when you ask Gmail for its metadata:
{
  "id": "19d2a31eae3a2769",
  "labelIds": ["CATEGORY_PROMOTIONS", "INBOX"],
  "payload": {
    "headers": [
      {"name": "From", "value": "newsletter@someservice.com"},
      {"name": "Subject", "value": "Your weekly digest"},
      {"name": "Date", "value": "Mon, 10 Mar 2025 08:00:00 +0000"},
      {"name": "List-Unsubscribe", "value": "<https://someservice.com/unsubscribe>"},
      {"name": "Precedence", "value": "bulk"}
    ]
  }
}
Two things to notice: the labelIds (Gmail's own category labels like "Promotions") sit at the top level of the response, while the actual email headers are nested inside payload → headers as a list of name/value pairs. To pull a single header value out of that list, there's a small helper function:
def extract_header(message: dict, name: str) -> str:
    headers = message.get("payload", {}).get("headers", [])
    for header in headers:
        if header.get("name") == name:
            return header.get("value", "")
    return ""
It walks through the list looking for a header whose name matches, and returns its value. If the header doesn't exist in this particular message (not all emails have List-Unsubscribe, for instance), it returns an empty string rather than crashing. This keeps the scoring code clean: it always gets a string, never a null.
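Applied to the sample response above, it behaves like this (extract_header is reproduced here so the snippet is self-contained, with a trimmed-down version of the sample message):

```python
def extract_header(message: dict, name: str) -> str:
    headers = message.get("payload", {}).get("headers", [])
    for header in headers:
        if header.get("name") == name:
            return header.get("value", "")
    return ""

# A trimmed version of the sample metadata response shown earlier:
sample = {
    "payload": {
        "headers": [
            {"name": "From", "value": "newsletter@someservice.com"},
            {"name": "Precedence", "value": "bulk"},
        ]
    }
}

print(extract_header(sample, "From"))      # newsletter@someservice.com
print(extract_header(sample, "Reply-To"))  # "" -- missing header, no crash
print(extract_header({}, "From"))          # "" -- even a malformed message is safe
```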
Step 3: scoring the message
This is where the actual decision-making happens. Each message gets a numerical score based on how many bulk/newsletter signals it carries. The higher the score, the more likely it's something you don't need in your inbox:
def score_message(message_id: str, message: dict, cfg: AppConfig) -> Candidate:
    sender = extract_header(message, "From")
    subject = extract_header(message, "Subject")
    date = extract_header(message, "Date")
    labels = "|".join(message.get("labelIds", []))

    score = 0
    reasons: list[str] = []

    sender_lower = sender.lower()
    has_list_unsub = bool(extract_header(message, "List-Unsubscribe"))
    precedence = extract_header(message, "Precedence").lower()
    has_precedence_bulk = bool(re.search(r"bulk|list|junk", precedence))
    has_bulk_sender = bool(re.search(cfg.bulk_sender_pattern, sender_lower))
    has_no_reply = bool(re.search(cfg.noreply_pattern, sender_lower))
    is_promo_cat = "CATEGORY_PROMOTIONS" in labels
    is_social_cat = "CATEGORY_SOCIAL" in labels
    is_forum_cat = "CATEGORY_FORUMS" in labels

    # Weights come from AppConfig so they can be tuned via the config file
    if has_list_unsub: score += cfg.score_list_unsubscribe; reasons.append("list-unsubscribe")
    if has_precedence_bulk: score += cfg.score_precedence_bulk; reasons.append("precedence-bulk")
    if has_bulk_sender: score += cfg.score_bulk_sender; reasons.append("bulk-sender-pattern")
    if has_no_reply: score += cfg.score_noreply; reasons.append("noreply")
    if is_promo_cat: score += cfg.score_category_promotions; reasons.append("category-promotions")
    if is_social_cat: score += cfg.score_category_social; reasons.append("category-social")
    if is_forum_cat: score += cfg.score_category_forums; reasons.append("category-forums")

    return Candidate(
        message_id=message_id,
        score=score,
        sender=sender, subject=subject, date=date,
        labels=labels,
        reasons="|".join(reasons),
    )
The re.search calls are regex pattern matches (regex, or regular expressions, is a way of describing text patterns; think of it like a very powerful "find" function). The bulk sender pattern looks for common strings in the From address that are typical of automated mail:
bulk_sender_pattern = r"no-?reply|noreply|donotreply|newsletter|updates|notification|notifications|mailer-daemon"
There are actually two separate sender checks. One catches any bulk-style address (+2 points), and a narrower one specifically catches noreply variants (+1 extra point). So an email from noreply@someservice.com picks up both: +2 for the bulk sender pattern, +1 for the noreply check specifically. That's 3 points just from the sender address. Add List-Unsubscribe (+2) and CATEGORY_PROMOTIONS (+2) and you're at 7, well above the default threshold of 4.
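You can check the double-counting in a REPL. The bulk pattern is the one shown above; the noreply pattern isn't printed in this post, so the one below is a hypothetical stand-in:

```python
import re

# The bulk pattern from the article:
bulk_sender_pattern = r"no-?reply|noreply|donotreply|newsletter|updates|notification|notifications|mailer-daemon"
# Hypothetical stand-in; cleanbox's real noreply_pattern may differ:
noreply_pattern = r"no-?reply|donotreply"

sender = "noreply@someservice.com"
print(bool(re.search(bulk_sender_pattern, sender)))  # True  -> +2
print(bool(re.search(noreply_pattern, sender)))      # True  -> +1 more

# A personal address trips neither check:
print(bool(re.search(bulk_sender_pattern, "alice@example.com")))  # False
```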
The reasons list records which signals fired for each message. This gets saved to the CSV alongside the score, which is what powers the stats command: you can later see which signals are triggering most often across your whole inbox without making any more API calls.
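The aggregation behind such a stats view can be as small as splitting the pipe-separated column and counting. A sketch (the rows below are made-up reasons values, and this is my reconstruction, not cleanbox's actual stats code):

```python
from collections import Counter

# Hypothetical `reasons` column values read back from the CSV:
rows = [
    "list-unsubscribe|category-promotions",
    "list-unsubscribe|noreply",
    "precedence-bulk|list-unsubscribe",
]

counts = Counter(signal for row in rows for signal in row.split("|") if signal)
print(counts.most_common())
# [('list-unsubscribe', 3), ...] -- the most common signal first
```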
All the scoring weights live in a dataclass (a lightweight Python data container) called AppConfig, and every single one of them is overridable via a config file at ~/.config/cleanbox/config.toml if you want to tune the sensitivity:
@dataclass
class AppConfig:
    score_list_unsubscribe: int = 2
    score_precedence_bulk: int = 2
    score_bulk_sender: int = 2
    score_noreply: int = 1
    score_category_promotions: int = 2
    score_category_social: int = 1
    score_category_forums: int = 1
Step 4: putting it all together
The main scan pipeline calls each of these in sequence, one message at a time, with a progress bar so you can see it moving:
def scan(query: str, max_results: int, threshold: int, cfg: AppConfig) -> list[Candidate]:
    message_ids = get_message_ids(query, max_results)
    total = len(message_ids)
    candidates: list[Candidate] = []
    start = time.monotonic()
    for i, message_id in enumerate(message_ids, 1):
        message = get_message_metadata(message_id)
        candidate = score_message(message_id, message, cfg)
        if candidate.score >= threshold:
            candidates.append(candidate)
        _draw_progress(i, total, len(candidates), start)
    return candidates
One API call to get the list of IDs, then one API call per message to get its headers. For 200 messages that's 201 separate calls. It's not instant (each one spawns a gws subprocess), but it's steady, and the progress bar shows you the rate in messages per second alongside a live candidate count and estimated time remaining.
The output: a Candidate and a CSV
Each message that scores above the threshold becomes a Candidate, a simple data container holding everything we fetched:
@dataclass
class Candidate:
    message_id: str   # Gmail's unique ID for this message
    score: int        # total points accumulated
    sender: str       # the From header
    subject: str      # the Subject header
    date: str         # the Date header
    labels: str       # pipe-separated Gmail labels
    reasons: str      # pipe-separated list of signals that fired
These get written to a CSV file, one row per candidate. The most important column is message_id, because that's what every subsequent action (archive, trash, undo) reads back to know which messages to actually touch. The workflow is deliberately split in two: scan first, write the CSV, review it, then act on it separately. This means you can delete rows from the CSV before archiving, and you never have to rescan just to apply an action.
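With a dataclass, the CSV dump is a few lines of stdlib csv. A sketch of how it could look (the in-memory buffer stands in for the real --output file, and the sample row reuses values from earlier in the post; cleanbox's actual writer may differ in details):

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

@dataclass
class Candidate:
    message_id: str
    score: int
    sender: str
    subject: str
    date: str
    labels: str
    reasons: str

candidates = [
    Candidate("19d2a31eae3a2769", 7, "newsletter@someservice.com",
              "Your weekly digest", "Mon, 10 Mar 2025 08:00:00 +0000",
              "CATEGORY_PROMOTIONS|INBOX", "list-unsubscribe|category-promotions"),
]

buf = io.StringIO()  # stands in for the real CSV file on disk
writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(Candidate)])
writer.writeheader()
writer.writerows(asdict(c) for c in candidates)

print(buf.getvalue())
```

Using the dataclass field names as the CSV header means the actions that read the file back (archive, undo) can look rows up by column name instead of position.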
A bug that wasted more of my time than it should have
has:list-unsubscribe works as a search filter in the Gmail web interface. Type it into the Gmail search bar and it finds newsletters just fine. But when you pass it to the Gmail REST API as a query parameter, it silently returns zero results: no error, just {"resultSizeEstimate": 0}, as if your inbox is completely empty.
I had this operator in my example queries for a while and kept scratching my head at why I was getting no results. Eventually I tested the raw gws command directly in the terminal and saw what the API was actually returning. The fix: remove it from queries entirely. The tool already detects the List-Unsubscribe header at scoring time anyway, when it fetches individual message metadata. The Gmail search query is just a broad pre-filter; the real work happens in score_message().
Using it
Install it:
git clone https://github.com/vilasinits/cleanbox.git
cd cleanbox
pip install -e .
The recommended flow: scan first, review, then act:
# scan and write a CSV; nothing in Gmail is touched
cleanbox scan \
  --query 'in:inbox older_than:180d category:promotions -is:starred' \
  --max-results 200 --threshold 4 --output csv/promotions.csv
# see what the scoring found, no API calls needed
cleanbox stats --input csv/promotions.csv
# archive everything in that CSV
cleanbox archive-csv --input csv/promotions.csv --apply
# changed your mind?
cleanbox undo --input csv/promotions.csv --apply
The full source is a single Python file at src/gmail_header_cleaner/cli.py, about 500 lines, if you want to read the whole thing.
Is it useful?
Genuinely, yes. I used it to clear out a few hundred old promotional emails (and my boredom kicked in again... sigh) and it felt like cleaning out a drawer that had been quietly bothering me for years. The CSV-first approach is what makes it feel safe rather than reckless: being able to scroll through what's about to be archived, and delete rows you want to keep, before anything actually changes.
Would I recommend it to someone comfortable in a terminal who'd rather not hand their inbox to a third-party service? Yes. Would I recommend it to someone who just wants inbox zero with zero friction? Probably not; the gws and Google Cloud setup alone will lose most people before they get to the good part.
But it was a boredom project, and it scratched the itch. And now it lives on GitHub in case it's useful to someone else with a slow afternoon, a messy inbox, and ideas on how it can be improved, optimised, or extended with more features!
Tools used
- gws: the Google Workspace CLI (Node.js); handles OAuth authentication and wraps the Gmail API into shell commands
- Gmail API: specifically the users.messages.get endpoint with format=metadata, which returns headers without the message body
- Python 3.9+: standard library only, no third-party packages required
- cleanbox on GitHub