Mail Extract — User Guide

Extract emails as clean text. Feed to AI as context.

This is the complete guide to Mail Extract — what it does, how to use it, and how it handles your data. Written for anyone, with technical detail where it matters.


What Is Mail Extract

Mail Extract pulls emails from your Gmail account and gives you clean, readable text. No formatting noise, no HTML, no reply chains — just the content. The output is designed to be pasted directly into AI tools like ChatGPT or Claude as context.

You search by sender, keyword, date range, and folder. The app returns the matching messages as plain text, ready to copy. Nothing is saved after your session ends.

Mail Extract is a read-only tool. It cannot send, delete, modify, or archive your emails. It reads, extracts, and forgets.


Signing In

There are two ways to sign in, depending on how you want to authenticate with Gmail.

IMAP — App Password

Enter your Gmail address and a Google App Password. This is not your regular Gmail password — it's a special password you generate in your Google account settings specifically for third-party apps.

  1. Enter your Gmail address and press Enter
  2. Enter your app password in the field that appears
  3. Click Sign in

If you don't have an app password yet, the login page includes a link to Google's setup guide. You'll need two-factor authentication enabled on your Google account first.

Google OAuth

Click Sign in with Google. You'll be redirected to Google's consent screen, where you grant read-only access to your Gmail. No passwords are entered in our app — Google handles the entire authentication.

The OAuth track is currently in testing and limited to approved users. If you're not on the list, use the IMAP option.

Both sign-in methods give you the same app with the same features. The only difference is how you authenticate.


Stay Signed In

On the login page, you'll see a toggle: Stay signed in for 7 days. When this is off (the default), your session expires after 2 hours of inactivity or 8 hours total — whichever comes first.

When you turn it on, your session lasts up to 7 days. You can close the browser, come back later, and pick up where you left off without signing in again.

In both cases, your session is encrypted on disk and automatically deleted when it expires. Logging out destroys it immediately.


Searching for Emails

Once signed in, you'll see the main search interface. There are four filters you can combine: folder, sender address, keyword, and date range. You don't need to use all of them — leaving a filter empty means "match anything" for that field.


Folders

Choose between Inbox and All Mail. Inbox searches only your current inbox. All Mail searches everything — inbox, sent, archived, and any labels you've created. All Mail is selected by default.


From Filter

Type an email address and press Enter to add it as a chip. You can add up to 20 addresses. The search returns emails from any of the addresses you list — it's an "or" search, not "and."

Each chip is validated as you add it. If an address isn't a valid email format, it shows with a red border. You can remove any chip by clicking the X on it. Pressing backspace in an empty field removes the last chip.

If you leave this filter empty, the search returns emails from all senders.


Contains Filter

Enter a word or phrase to search for in email subjects and body text. The search is not case sensitive. If you've also added sender addresses, both filters must match — the email must be from one of those senders and contain the keyword.


Date Range

The timeline slider covers the last 24 months. Drag the left and right handles to set your start and end dates. The selected range is displayed above the slider.

Both dates are inclusive — if you set the range to January 2025 through March 2025, you'll get emails from the first day of January through the last day of March.
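For readers who know IMAP, here is a minimal sketch of how the filters above could translate into a search query. It is an illustration, not the app's actual query builder: the function names are hypothetical, and matching the keyword with SUBJECT and BODY is an assumption. The one detail worth noticing is the end date. IMAP's SINCE matches on or after a date, but BEFORE matches strictly earlier, so an inclusive end date needs one extra day.

```python
from datetime import date, timedelta

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def imap_date(d):
    """Format a date the way IMAP SEARCH expects, e.g. 01-Jan-2025."""
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year}"


def quote(value):
    """Escape backslashes and double quotes for an IMAP quoted string."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_criteria(sender, keyword, start, end, field="FROM"):
    """Build one IMAP SEARCH criteria string (hypothetical helper).

    Empty filters are skipped, which means "match anything" for that field.
    Listing several criteria in one search means all of them must match.
    """
    parts = []
    if sender:
        parts.append(f"{field} {quote(sender)}")
    if keyword:
        # Assumption: look for the keyword in the subject or the body.
        parts.append(f"OR SUBJECT {quote(keyword)} BODY {quote(keyword)}")
    parts.append(f"SINCE {imap_date(start)}")
    # BEFORE is exclusive, so add a day to keep the end date in range.
    parts.append(f"BEFORE {imap_date(end + timedelta(days=1))}")
    return " ".join(parts)
```

With several sender chips, one such query per address could be run and the results combined, which gives the "or" behaviour described under From Filter.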


Max Messages

In the top right of the filters card, you'll see a max counter. This controls how many emails are returned per search. The default is 50. You can set it anywhere from 5 to 200.

The counter is colour-coded as a rough guide:

Range      Colour  Meaning
5 – 50     Green   Fast, responsive
51 – 100   Amber   May take a moment
101 – 200  Red     Slower (larger extractions)

If your search matches more emails than the max, the results will show a notice that more exist. You can increase the counter or narrow your date range to capture them.
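The counter's range and colour bands can be expressed in a few lines. This is only a sketch of the rules listed above; the function names are made up for illustration.

```python
def clamp_max_messages(value, low=5, high=200):
    """Keep the requested max inside the allowed 5 to 200 range."""
    return max(low, min(high, value))


def counter_colour(value):
    """Map a max-messages value to its colour band."""
    if value <= 50:
        return "green"   # fast, responsive
    if value <= 100:
        return "amber"   # may take a moment
    return "red"         # slower, larger extractions
```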


Results & Export

After a search completes, the results card appears with a preview of the first 8 messages. Each message shows the date, sender, subject, and the first 300 characters of the body. If there are more than 8, you'll see a note like "... and 42 more (all included in copy)."
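As a rough sketch of the preview rules above (first 8 messages, first 300 characters of each body, then a count of the rest), assuming each message is a dictionary with hypothetical "date", "sender", "subject", and "body" keys:

```python
def build_preview(messages, shown=8, body_chars=300):
    """Return preview text for the first few messages (illustrative only)."""
    lines = []
    for msg in messages[:shown]:
        lines.append(f"{msg['date']} | {msg['sender']} | {msg['subject']}")
        lines.append(msg["body"][:body_chars])
    remaining = len(messages) - shown
    if remaining > 0:
        lines.append(f"... and {remaining} more (all included in copy)")
    return "\n".join(lines)
```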

Below the preview, you'll see a summary: how many emails were found, the total character count, and an estimated token count (characters divided by 4 — a rough approximation for most AI models).
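The token estimate is simple arithmetic. A minimal sketch, assuming the result is rounded down (the guide doesn't say how rounding is handled):

```python
def summarise(bodies):
    """Summarise extracted message bodies: count, characters, rough tokens."""
    total_chars = sum(len(body) for body in bodies)
    return {
        "emails": len(bodies),
        "characters": total_chars,
        # Rough rule of thumb from the guide: about 4 characters per token.
        "estimated_tokens": total_chars // 4,
    }
```

For example, two messages of 400 and 100 characters come to 500 characters, or roughly 125 tokens.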

Copying

Click Copy All to copy every extracted email to your clipboard. The full text is copied — not just the preview. The export includes a header with your search parameters (account, query, date range) followed by the complete text of every message.

The button briefly changes to "copied!" to confirm it worked.

Status Indicators

The coloured dot next to the status bar tells you what happened:

Colour  Meaning
Green   Emails found successfully
Amber   Partial results: some addresses had no matches, or no emails found at all
Red     No results for any address, or an error occurred

Signing Out

Click Sign out in the top right of the account card. Your session is destroyed immediately — the encrypted session file is deleted from disk and your browser cookie is cleared. There is nothing left to recover.

If you close the tab without signing out, the session will expire on its own (after 2 hours of inactivity or 8 hours in total by default, or after 7 days if you chose to stay signed in) and be cleaned up automatically.


Troubleshooting

Problem                           What to do
"Sign-in failed"                  Double-check that you're using a Google App Password, not your regular Gmail password. The app password is 16 characters, usually shown in groups of four.
"Too many attempts"               Wait for the countdown to finish. The delay increases with each failed attempt but resets after 24 hours.
Redirected to login unexpectedly  Your session expired. Sign in again. If this happens frequently, enable "Stay signed in for 7 days."
"30 extractions per hour" limit   You've hit the rate limit. The error message shows when the limit resets. Wait for that time, then try again.
No results found                  Try widening the date range, checking the sender address for typos, or switching from Inbox to All Mail.
Search seems slow                 Lower the max messages counter. Large extractions (100+) take longer, especially over wide date ranges.
OAuth: "Access was denied"        You need to grant Gmail read access on the Google consent screen. The app cannot function without it.

Limits & Quotas

These limits are enforced server-side to protect both the service and your mail provider's API.

Limit                      Value           Detail
Extractions per hour       30              Per session, fixed window. Resets on the hour.
Max messages per search    200             Hard cap regardless of UI setting
From addresses per search  20              Client-side and server-side enforced
Contains query length      200 characters  Longer queries are rejected
Request body size          20 KB           Requests larger than this are dropped
IMAP connection timeout    60 seconds      Per operation; a hung connection won't hold your session
Concurrent operations      3 per session   Additional requests are queued, not rejected
Email field max length     254 characters  Per RFC 5321
Password max length        128 characters  App passwords are 16 characters; this is a safety cap
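To make "fixed window, resets on the hour" concrete, here is a minimal sketch of a per-session counter. The class name and structure are hypothetical, not the server's code. It numbers hours since the Unix epoch, which lines up with clock hours in UTC.

```python
class HourlyLimiter:
    """Fixed-window limiter: at most `limit` extractions per clock hour."""

    def __init__(self, limit=30):
        self.limit = limit
        self.window = None   # index of the current hour
        self.count = 0       # extractions used in that hour

    def allow(self, now_seconds):
        """Return True and count the request if it fits in this hour."""
        window = int(now_seconds // 3600)
        if window != self.window:
            # A new hour has started, so the counter starts over.
            self.window = window
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True

    def resets_at(self):
        """Timestamp (seconds) when the current window ends, or None."""
        if self.window is None:
            return None
        return (self.window + 1) * 3600
```

A fixed window is easy to explain to users ("try again at 3:00"), at the cost of allowing a burst on either side of the hour boundary.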

Login Throttling

Failed login attempts trigger a progressive delay keyed to your IP address and the email you entered. The first attempt has no delay. After that: 5 seconds, 10 seconds, 20 seconds, doubling each time. The counter resets after 24 hours without a failed attempt. This is tracked per IP + email combination — a failed attempt for one address doesn't affect another.
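One way to read that schedule, as a sketch: the delay depends only on how many failed attempts are on record for the IP and email pair. The function name is hypothetical, and the guide doesn't say whether the delay is capped, so no cap is applied here.

```python
def login_delay_seconds(failed_attempts):
    """Delay before the next login attempt, given prior failures.

    0 failures -> 0 s, 1 -> 5 s, 2 -> 10 s, 3 -> 20 s, doubling after that.
    """
    if failed_attempts <= 0:
        return 0
    return 5 * 2 ** (failed_attempts - 1)


# Assumed bookkeeping: failure counts keyed by (ip, email), so a failed
# attempt for one address doesn't slow down another.
failures = {("203.0.113.7", "alice@example.com"): 3}
delay = login_delay_seconds(failures.get(("203.0.113.7", "alice@example.com"), 0))
```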


How Text Extraction Works

The goal of extraction is to give you only the original message body — no reply chains, no signatures, no formatting artefacts. Here's what happens to each email before it's returned to you.

Two-Way Conversation Capture

When you search by sender address, Mail Extract doesn't just pull emails from that person — it also pulls your replies to them. Behind the scenes, two searches run for each address: one on the From field, one on the To field. The results are merged, deduplicated, and sorted by date.

The effect is that you get the full back-and-forth as clean, individual messages in chronological order. Each message stands on its own — reply chains are stripped, so you're reading what each person actually wrote, not the same quoted thread repeated fifty times. For a conversation with 30 exchanges, you get 30 clean messages instead of one enormous chain where the last reply quotes every message before it.

This is designed for AI context. Language models work best with clean, ordered text — not nested reply chains where the same content is duplicated in every message. Extract once, paste once, and the model has the full conversation.
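The merge step can be pictured like this. It is a sketch under assumptions: each message is a dictionary with a hypothetical "message_id" key used for deduplication and a sortable "date" value. The guide doesn't say which field the app actually deduplicates on.

```python
def merge_conversation(from_hits, to_hits):
    """Combine the From and To search results into one ordered thread."""
    unique = {}
    for msg in list(from_hits) + list(to_hits):
        # Keep the first copy of each message; later duplicates are skipped.
        unique.setdefault(msg["message_id"], msg)
    # Oldest first, so the conversation reads top to bottom.
    return sorted(unique.values(), key=lambda m: m["date"])
```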

HTML to Plain Text

  1. <style> and <script> blocks are removed entirely
  2. Block-level tags (</p>, </div>, </li>) are converted to line breaks
  3. <br> tags become line breaks
  4. All remaining HTML tags are stripped
  5. HTML entities (&amp;, &nbsp;, numeric codes) are decoded
  6. Excessive blank lines are collapsed

If a plain text version of the email exists and is at least as long as the HTML conversion, the plain text version is used instead.
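The steps above can be sketched with Python's standard library. This is an approximation for illustration: the regular expressions are assumptions, and real-world HTML email has edge cases that a few patterns won't cover.

```python
import html
import re


def html_to_text(markup):
    """Convert an HTML email body to plain text, following the six steps."""
    # 1. Remove <style> and <script> blocks, contents included.
    text = re.sub(r"(?is)<(style|script)\b.*?</\1\s*>", "", markup)
    # 2. Closing block-level tags become line breaks.
    text = re.sub(r"(?i)</(p|div|li)\s*>", "\n", text)
    # 3. <br> tags become line breaks.
    text = re.sub(r"(?i)<br\s*/?>", "\n", text)
    # 4. Strip every remaining tag.
    text = re.sub(r"<[^>]+>", "", text)
    # 5. Decode entities such as &amp;, &nbsp; and numeric codes.
    text = html.unescape(text).replace("\xa0", " ")
    # 6. Collapse runs of blank lines into a single blank line.
    text = re.sub(r"\n\s*\n+", "\n\n", text)
    return text.strip()


def choose_body(plain, html_body):
    """Prefer the plain text part when it is at least as long."""
    converted = html_to_text(html_body)
    if plain is not None and len(plain) >= len(converted):
        return plain
    return converted
```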

Reply Chain Removal

The extractor looks for common reply markers and cuts the message at the first one it finds. Everything from that marker onward is dropped.

Signature Removal

Email signatures are stripped at common delimiters: the conventional "-- " line (two dashes and a space, the signature separator described in RFC 3676), triple underscores, and triple equals signs.
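A minimal sketch of that rule, assuming a delimiter only counts when it sits on a line by itself and that everything from the first delimiter onward is dropped. The pattern and function name are illustrative, not taken from the app.

```python
import re

# A line that is exactly "-- ", or three or more underscores,
# or three or more equals signs.
SIGNATURE_DELIMITER = re.compile(r"^(?:-- |_{3,}[ \t]*|={3,}[ \t]*)$", re.MULTILINE)


def strip_signature(text):
    """Cut the message at the first signature delimiter, if there is one."""
    match = SIGNATURE_DELIMITER.search(text)
    if match is None:
        return text
    return text[:match.start()].rstrip()
```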

URL Removal

Bare and bracketed URLs are removed from the output. The goal is clean text for language models, not clickable links.
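As an illustration, URL removal might look something like the sketch below. The pattern is an assumption: it treats "bracketed" as a URL wrapped in angle brackets, square brackets, or parentheses, and it will also swallow punctuation that directly follows a bare URL.

```python
import re

# An http(s) URL, optionally wrapped in <...>, [...] or (...).
URL = re.compile(r"[<\[(]?\bhttps?://[^\s<>\[\]()]+[>\])]?")


def strip_urls(text):
    """Remove URLs, then tidy up the double spaces they leave behind."""
    without_urls = URL.sub("", text)
    return re.sub(r"[ \t]{2,}", " ", without_urls).strip()
```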


Sessions & Security

This section describes how your credentials are handled from the moment you sign in to the moment you leave. For a full treatment of the server's security posture, see the Server Hardening document.

Split-Key Architecture

Your credentials are never stored in plaintext on the server. When you sign in, the server encrypts your password (or OAuth token) using AES-256-GCM with a randomly generated 256-bit key and a random 12-byte IV. The encrypted data is written to a session file on disk. The encryption key is sent to your browser in a cookie.

The server holds no keys in memory. Your browser holds no encrypted data. Neither half is useful alone.

If compromised       What the attacker gets
Server disk          Encrypted blobs with no decryption keys
Browser cookie       A decryption key with no encrypted data
Server memory        Nothing: no keys or plaintext are held in memory
Both simultaneously  One session's credentials, valid only until the session expires
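For the technically curious, the split can be sketched with the third-party `cryptography` package. This is an illustration of the idea, not the server's code: the function names are hypothetical, and storing the IV at the front of the encrypted blob is an assumption about the file layout.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def seal_credential(secret):
    """Encrypt a credential; return (key for the cookie, blob for disk)."""
    key = AESGCM.generate_key(bit_length=256)   # random 256-bit key
    iv = os.urandom(12)                         # random 12-byte IV
    ciphertext = AESGCM(key).encrypt(iv, secret, None)
    # Assumed layout: IV first, then ciphertext with its authentication tag.
    return key, iv + ciphertext


def open_credential(key, blob):
    """Decrypt a blob from disk using the key the browser sent back."""
    iv, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(iv, ciphertext, None)
```

The key goes to the browser and the blob stays on the server. Decryption only works when a request brings the two back together, and GCM's authentication tag means a tampered blob fails to decrypt rather than producing garbage.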

Session Lifetime

Setting         Idle Timeout  Hard Cap
Standard        2 hours       8 hours
Stay signed in  7 days        7 days

Every API call resets the idle timer. The hard cap cannot be extended. A background sweep runs every 15 minutes and deletes any session file that has exceeded its timeout.
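The two timers combine in a simple way: a session is over as soon as either one runs out. A minimal sketch, with timestamps in seconds and hypothetical names (whether the real check treats the exact boundary as expired is an assumption):

```python
HOUR = 3600
DAY = 24 * HOUR

# (idle timeout, hard cap) in seconds, per the table above.
STANDARD = (2 * HOUR, 8 * HOUR)
STAY_SIGNED_IN = (7 * DAY, 7 * DAY)


def is_expired(created_at, last_seen_at, now, stay_signed_in=False):
    """True once the idle timeout or the hard cap has been reached."""
    idle_timeout, hard_cap = STAY_SIGNED_IN if stay_signed_in else STANDARD
    idle_for = now - last_seen_at   # reset by every API call
    age = now - created_at          # never reset
    return idle_for >= idle_timeout or age >= hard_cap
```

The background sweep would apply a check like this to each session file and delete the ones that return True.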

Cookie Attributes

File Permissions

Session files are stored with chmod 600 (owner read/write only). The session directory is chmod 700. The application runs as a dedicated system user with no shell and no sudo access.
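On a POSIX system, permissions like these can be set at the moment the file is created, so it is never briefly readable by other users. A sketch with hypothetical names and paths:

```python
import os
import tempfile


def write_session_file(directory, name, data):
    """Write bytes to directory/name with 700 and 600 permissions."""
    os.makedirs(directory, mode=0o700, exist_ok=True)
    os.chmod(directory, 0o700)   # enforce even if the directory existed
    path = os.path.join(directory, name)
    # Create the file with mode 600 from the start (owner read/write only).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    os.chmod(path, 0o600)        # enforce even if the file already existed
    return path


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    saved = write_session_file(os.path.join(base, "sessions"), "demo.bin", b"blob")
    print(oct(os.stat(saved).st_mode & 0o777))
```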


Privacy

Mail Extract does not track you. There is no analytics, no user database, no login history, and no record of what you searched for or extracted.

What Is Not Collected

What Is Collected

A single anonymous counter that increments by one each time any user runs a search. No identifying information is attached. It resets daily. It tells us how many extractions happened in a day — not who did them or what they searched for.

After You Leave

Your session file is deleted when you sign out. If you close the tab instead, it's deleted when the session expires. The cleanup sweep runs every 15 minutes. Once the file is gone, there is no trace that you were here.

We don't know who uses this app, how often they use it, or what they use it for. That's deliberate.


Part of the Emergence Project. Built by the Design/OS shell team.

Authors: Webby (Shell — WebDev) + CC (Shell — Infrastructure)
March 2026