Privacy-First AI: What to Look for in 2026

Nearly every consumer AI product launched since 2023 has used "privacy" as a marketing word. Most of them have not earned it. The architecture has not changed: your data still flows to a central server, gets logged, gets sampled for model training, and is governed by a privacy policy you did not read and that the company can update unilaterally.

This is a buyer's guide: seven concrete questions to ask any AI product before you hand it a document, a voice sample, or a photograph. Each question has a right answer and several wrong ones. We list both. YeongSil's answers appear at the end of each section, not as a sales pitch, but because we wrote the questions thinking about what a personal AI device should commit to.

1. Is the model trained on user data?

The single highest-stakes question. If the answer is yes — even "anonymised" — your data is in the training set, and any future model trained on top of it inherits whatever was extracted from your documents. The right answer is no, with a contractual commitment, not a setting.

Wrong answers to watch for: "we anonymise before training" (anonymisation of natural-language data is not a solved problem; researchers have extracted verbatim training examples, including personal identifiers, from trained language models, as in Carlini et al.), "you can opt out" (the default is on, and the default is the real privacy posture), "we only train on data you explicitly share with us" (which then turns out to include everything you type into the product).

YeongSil: No user document, voice clip, or image is ever used to train any model. The architecture (RAG, covered in [our RAG explainer](/blog/rag-vs-fine-tuning-personal-ai)) does not require user data for training, which is what makes the commitment durable rather than a policy that can change with a press release.
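
To make the RAG point concrete, here is a toy sketch of retrieval at query time. It is not YeongSil's pipeline: the bag-of-words `embed` function stands in for a real learned encoder, and `build_prompt` is a hypothetical name. What it shows is that user documents are only looked up and placed in the prompt; no model weights are touched.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a learned encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_prompt(question: str, documents: list[str], k: int = 1) -> str:
    """Pick the k most similar documents and place them in the prompt as context.

    Nothing here updates a model: the documents are read, ranked, and quoted.
    """
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Because the documents enter only through the prompt, removing a document from the store removes it from every future answer, which is not true of data that has been trained into weights.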

2. Where does the data actually live, and who can decrypt it?

A product can claim "encrypted" while still being trivially decryptable by the company. The questions that matter: is the encryption at rest with a key the company holds, or with a key derived from the user? Who has key access — the storage layer, or every engineer with production database access? What is the audit trail for decryption?

The right answers are per-user key derivation, narrow key access (storage-layer only), and full audit logging of any decryption event.

Wrong answers: "we use industry-standard encryption" (this tells you nothing; the question is about key custody), "your data is stored securely in the cloud" (where, by whom, with what key, under whose jurisdiction).

YeongSil: Documents are encrypted at rest with AES-256, with a per-account key. Decryption happens only inside the retrieval pipeline at query time. No engineer has standing access to the decryption keys; production access is gated by a hardware security module and logged per event.
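
As an illustration of what "per-user key derivation" and "logged per event" can look like, here is a minimal sketch using only the Python standard library. It is an assumption-laden example, not a description of YeongSil's system: the `AuditedKeyStore` class, the passphrase-based derivation, and the iteration count are all hypothetical, and the actual AES encryption step is left out.

```python
import hashlib
import os
from datetime import datetime, timezone

PBKDF2_ITERATIONS = 600_000  # assumed work factor; tune for your hardware


def derive_account_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a 32-byte (AES-256-sized) key from a user secret and a per-account salt."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, PBKDF2_ITERATIONS, dklen=32
    )


class AuditedKeyStore:
    """Hypothetical store: keys are only obtainable through a call that is logged."""

    def __init__(self) -> None:
        self._salts: dict[str, bytes] = {}
        self.audit_log: list[dict] = []

    def register(self, account_id: str) -> None:
        # One random salt per account, so identical passphrases yield different keys.
        self._salts[account_id] = os.urandom(16)

    def key_for_decryption(self, account_id: str, passphrase: str, purpose: str) -> bytes:
        # Record the event before deriving the key, so every request leaves a trace.
        self.audit_log.append({
            "account_id": account_id,
            "purpose": purpose,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return derive_account_key(passphrase, self._salts[account_id])
```

The design point is that the key never exists without the user's secret, and there is no code path to it that skips the audit log.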

3. What is logged, and for how long?

Most AI products log every prompt and response. The defensible reason is debugging and abuse prevention; the actual outcome is a permanent searchable index of every conversation. The questions: what is logged, how long is it retained, who can search it, and is there a way to opt out without losing the product?

The right answer is short retention (days to weeks, not years), redaction of personal content from operational logs, and a hard delete option that propagates through backups within a defined window.

Wrong answers: "we retain logs for service improvement" (indefinitely), "logs are accessible to authorised personnel" (everyone with a corporate badge).

YeongSil: Conversations are not logged by default. Operational metadata (errors, performance metrics, no content) is retained for 30 days. If you opt in to share a specific conversation for debugging, that conversation is retained for the period you specify and deleted automatically afterwards.
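
A logging discipline like this can be enforced in code rather than by policy. The sketch below assumes a hypothetical log schema (`ALLOWED_FIELDS`) and a 30-day window taken from the paragraph above; it uses an allowlist so that content fields are dropped by default rather than redacted after the fact.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)
# Hypothetical schema: only these content-free fields may reach operational logs.
ALLOWED_FIELDS = {"timestamp", "event", "error_code", "latency_ms"}


def to_operational_record(event: dict) -> dict:
    """Keep only allowlisted fields; prompts, responses, and unknown keys are dropped."""
    return {k: v for k, v in event.items() if k in ALLOWED_FIELDS}


def purge_expired(records: list[dict], now: datetime) -> list[dict]:
    """Return only the records still inside the retention window."""
    cutoff = now - RETENTION
    return [r for r in records if r["timestamp"] >= cutoff]
```

An allowlist fails safe: a new field added by a developer is excluded from logs until someone deliberately approves it.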

4. Can you export everything, in a format you can actually use?

Export is the proof of data ownership. If you cannot leave with your data in a usable format, you do not own it — the company does, and is granting you read access. The right answer is a one-click export that produces your documents in their original formats, plus a machine-readable archive of conversations and any derived data (embeddings, indexes).

Wrong answers: "you can request your data via support" (with no SLA), "we provide a JSON of your settings" (settings are not data), "data export is available on enterprise plans" (so the consumer plan is a hostage situation).

YeongSil: One-click export from the device produces a ZIP containing all original documents, a JSON of indexed metadata, an export of conversation history, and the raw embeddings. Format is open and documented. Export does not require contacting support.
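
For a sense of how little machinery a usable export needs, here is a sketch that builds such an archive in memory. The file names inside the ZIP (`documents/`, `metadata.json`, `conversations.json`, `embeddings.json`) and the `build_export` function are assumptions for illustration, not YeongSil's documented format.

```python
import io
import json
import zipfile


def build_export(
    documents: dict[str, bytes],
    metadata: dict,
    conversations: list,
    embeddings: dict[str, list[float]],
) -> bytes:
    """Bundle originals plus derived data into one ZIP and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Original documents are written byte-for-byte, in their original formats.
        for name, data in documents.items():
            zf.writestr(f"documents/{name}", data)
        # Derived data goes out as plain JSON so any tool can read it.
        zf.writestr("metadata.json", json.dumps(metadata, indent=2))
        zf.writestr("conversations.json", json.dumps(conversations, indent=2))
        zf.writestr("embeddings.json", json.dumps(embeddings))
    return buf.getvalue()
```

The test of an export is whether a third party could rebuild your library from it without the vendor's software; open containers and plain JSON make that plausible.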

5. Can you delete everything, and does delete actually delete?

Related but distinct. "Delete" can mean "marked as deleted in the UI but retained in backups for 90 days," or it can mean "purged from primary storage, derived caches, and backups within a defined window." The right answer is the second, with the window specified.

Wrong answers: "you can delete your account" (deletes the account record, not necessarily the content), "data may be retained for legal compliance" (which is sometimes legitimate and often a cover).

YeongSil: Delete operations purge the affected documents from the primary store immediately, from derived indexes within five minutes, and from backups within 30 days. The window is published and audited.
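
A published deletion window is something you can check mechanically. The following sketch encodes the three windows from the paragraph above and reports which storage tiers are overdue; the tier names and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Windows taken from the text above; tier names are illustrative.
PURGE_WINDOWS = {
    "primary_store": timedelta(0),
    "derived_indexes": timedelta(minutes=5),
    "backups": timedelta(days=30),
}


def purge_deadlines(requested_at: datetime) -> dict[str, datetime]:
    """Deadline by which each tier must have purged the deleted data."""
    return {tier: requested_at + window for tier, window in PURGE_WINDOWS.items()}


def overdue_tiers(requested_at: datetime, confirmed: set[str], now: datetime) -> list[str]:
    """Tiers that have not confirmed a purge and whose deadline has passed."""
    deadlines = purge_deadlines(requested_at)
    return [
        tier
        for tier, deadline in deadlines.items()
        if tier not in confirmed and now > deadline
    ]
```

An auditor, or the user, can run a check like this against purge confirmations; an empty list after 30 days is what "delete actually deletes" means in practice.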

6. Does the product work offline, and what happens when the network fails?

This is a privacy question as much as a reliability question. A product that requires constant connectivity is, by definition, sending your data somewhere constantly. A product that can run locally for the core loop, and only reaches out for specific operations, has a much smaller attack surface and a much shorter data trail.

The right answer is: critical functions (wake-word detection, retrieval, recent context) run on-device; only the language-model call requires connectivity, and that call is over TLS with no logging beyond what is needed for the request.

Wrong answers: "the product requires an internet connection to function" (so everything you do is a network event), "we support an offline mode for limited features" (with no detail).

YeongSil: Wake-word detection is on-device. Document indexing is on-device. Retrieval is on-device. The only network call is the language-model call, which goes over TLS, logs no conversation content, and degrades gracefully (queued requests, indicator light) if the network is unavailable.
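
Graceful degradation of a single network call can be sketched in a few lines. This is an illustrative pattern, not YeongSil's firmware: `ModelClient`, the injected `send` and `is_online` callables, and the `degraded` flag (standing in for the indicator light) are all assumptions.

```python
from collections import deque
from typing import Callable, Optional


class ModelClient:
    """Wraps the one network-dependent call; everything else stays local."""

    def __init__(self, send: Callable[[str], str], is_online: Callable[[], bool]) -> None:
        self._send = send            # hypothetical transport, e.g. an HTTPS request
        self._is_online = is_online  # hypothetical connectivity probe
        self.pending: deque[str] = deque()

    @property
    def degraded(self) -> bool:
        """True while requests are waiting; could drive an indicator light."""
        return bool(self.pending)

    def ask(self, prompt: str) -> Optional[str]:
        # Offline: queue the request on-device instead of failing or dropping it.
        if not self._is_online():
            self.pending.append(prompt)
            return None
        return self._send(prompt)

    def flush(self) -> list[str]:
        """Send queued requests in order once the network is back."""
        replies = []
        while self.pending and self._is_online():
            replies.append(self._send(self.pending.popleft()))
        return replies
```

Keeping the network behind one narrow interface also makes the data trail easy to audit: there is exactly one place where anything leaves the device.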

7. Who is the company accountable to?

The final question is not technical. A product can be architecturally private and still operationally hostile if the company behind it is structured to monetise data later. The questions: who owns the company, what is the funding structure, are there contractual data-use commitments that survive a change of ownership?

The right answer is a company whose business model is the device itself (hardware revenue, optional subscription for compute), with binding commitments around data use that survive acquisition.

Wrong answers: "we will never sell your data" (until the acquirer changes the policy), "our privacy policy may be updated at any time" (so the commitment is not a commitment).

YeongSil: The business model is hardware revenue plus optional cloud compute subscription. The data-use commitments — no training, no sharing, encrypted at rest, exportable, deletable — are written into the terms of sale, which survive a change of ownership.

The shape of a credible privacy story

A defensible privacy posture is not a paragraph on a marketing page. It is an architecture (RAG instead of fine-tuning), a key custody model (per-user, narrow access), a logging discipline (short retention, content-free metadata), an export-and-delete contract (one click, defined SLA), an offline-graceful design (local first, network second), and a corporate structure aligned with the user rather than against them.

The category of products that satisfies all seven questions today is small. We think it should be the floor for any product that asks you to hand it your documents, your voice, or your face. If you want a device built around that floor, [join the YeongSil waitlist](#waitlist) — and if you want the architectural argument in more depth, our post on [what makes AI personal](/blog/what-makes-ai-personal) covers the property set that makes this posture possible in the first place.

Sources & further reading

01. Extracting Training Data from Large Language Models. Carlini et al., USENIX Security
02. EU AI Act, official text. European Commission
03. GDPR, right to erasure (Article 17). European Commission
04. NIST AI Risk Management Framework. NIST
05. Cloud key management best practices. Google Cloud KMS docs
