PII Privacy Consulting,
Built on Open Source

I’m the founder of Philterd, LLC, a consultancy specializing in PII privacy. I help clients design privacy-focused architectures for cloud and AI — and bring the engineering to build them.

Every engagement is backed by my own open source privacy toolkit — Philter, Arbiter, Phield, and others. That gives clients real leverage: auditable, portable, vendor-neutral PII protection platforms that scale with their cloud and AI workloads.

Industries served
  • Healthcare
  • Finance
  • Government
  • E-commerce
  • Shipping & Logistics
  • Natural Resources
  • Data & AI
Partnerships
  • Amazon Web Services
  • Google Cloud
  • Microsoft Azure

From the Founder

Privacy got serious for me when cloud, AI, and natural language processing all collided in the same data pipelines. I started Philterd after a healthcare client’s PII pipeline failed in production and the commercial DLP tool couldn’t tell us why. After more than 15 years building NLP systems — much of it inside Apache OpenNLP — and 17 years on AWS and GCP, the lesson was the same: organizations need PII protection that is auditable, open, and theirs to inspect. The toolkit (Philter, Phileas, Arbiter, Phield) gives clients a vendor-neutral foundation, and the consulting work helps them assemble it into platforms they actually own. The goal isn’t to leave behind a proprietary platform you depend on me to maintain — it’s to leave you with infrastructure you understand and engineers who can extend it.

— Jeff Zemerick, Founder · Philterd, LLC

Governed Cloud Architecture

Resilient, cloud-native architectures on AWS and GCP where privacy is wired in from day one: federated multi-cloud designs, Well-Architected governance audits, automated PII discovery with Amazon Macie and GCP Sensitive Data Protection, Infrastructure-as-Code guardrails, edge-level privacy interception via Lambda@Edge and CloudFront Functions, and end-to-end encryption.

Multi-Cloud Strategies

I design federated architectures that bridge AWS and GCP for high availability, disaster recovery, and freedom from vendor lock-in. Workloads stay portable while identity, networking, and data planes remain consistent across providers.

Well-Architected Governance

I audit environments against the AWS and GCP Well-Architected frameworks, evaluating security, reliability, cost, and operational maturity. The deliverable is a prioritized remediation roadmap that raises your posture without disrupting delivery.

Automated PII Discovery at Scale

I implement cloud-native discovery using Amazon Macie and GCP Sensitive Data Protection (formerly Cloud DLP) to continuously monitor S3 and Cloud Storage for unencrypted or misplaced PII. Organizations move from periodic manual audits to continuous visibility and faster remediation.

Privacy-by-Design Infrastructure

I bridge DevOps and data privacy by embedding security guardrails directly into Infrastructure as Code. Service Control Policies, OPA rules, and automated IAM audits ensure your data environments are private by default, not by exception.

Edge-Level Privacy Interception

I deploy real-time privacy filters at the network perimeter using Lambda@Edge and CloudFront Functions. Telemetry and user queries are sanitized before reaching centralized data lakes, mitigating accidental PII ingestion and shrinking your compliance footprint.
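
As a rough illustration of the idea, here is a minimal Python sketch of a Lambda@Edge viewer-request handler that scrubs email-like values from the query string before the request travels any further. The single regular expression, the `[REDACTED]` placeholder, and the function names are assumptions made for this example; a real filter would need much broader detection and would typically delegate to a dedicated redaction engine.

```python
import re
from urllib.parse import parse_qsl, urlencode

# Illustrative pattern only; real PII detection needs far broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def sanitize_querystring(querystring: str) -> str:
    """Replace email-like values in a raw query string with a placeholder."""
    pairs = parse_qsl(querystring, keep_blank_values=True)
    cleaned = [(key, EMAIL.sub("[REDACTED]", value)) for key, value in pairs]
    return urlencode(cleaned)


def handler(event, context):
    """Viewer-request handler: sanitize the query string, then forward."""
    # CloudFront passes the request under Records[0].cf.request.
    request = event["Records"][0]["cf"]["request"]
    request["querystring"] = sanitize_querystring(request.get("querystring", ""))
    return request
```

Returning the modified request object tells CloudFront to continue processing with the sanitized query string, so downstream logs and data lakes never see the original value.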

Encryption & Key Management

I architect end-to-end encryption with AWS KMS, CloudHSM, and GCP Cloud KMS, including customer-managed keys, envelope encryption, and automated rotation. Field-level encryption keeps sensitive attributes protected at rest, in transit, and across multi-cloud boundaries.

Compliance Engineering

End-to-end data privacy for cloud and AI workloads — from data governance frameworks and NLP-driven PII/PHI discovery, through AI guardrails for generative and RAG pipelines, to mathematically provable differential privacy and utility-preserving anonymization. Built on the open source Philterd toolkit so the controls you ship are auditable, portable, and vendor-neutral.

Data Governance

I establish the data classification, retention, and stewardship frameworks that transform sensitive information from a compliance liability into a governed enterprise asset — aligned with GDPR, HIPAA, CCPA, and your industry’s regulatory baseline.

PII / PHI Discovery

Using advanced NLP and deep-learning models, I build automated PII and PHI detection across high-volume, unstructured text and document streams — powered by the Philterd toolkit so the entire pipeline stays auditable, tunable, and vendor-neutral.

AI Guardrails

I engineer secure gateways for generative AI and RAG pipelines that stop PII, PHI, and proprietary content from ever reaching the model layer — protecting both prompts and outputs against data exfiltration and prompt-injection attacks.
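
To make the gateway idea concrete, the following is a minimal Python sketch that redacts a prompt before it reaches a model and redacts the response on the way back. The three patterns, the placeholder labels, and the `guarded_call` helper are hypothetical and chosen for brevity; this is not the Philter API, and it covers only the PII-redaction part of a gateway, not prompt-injection defenses.

```python
import re

# Hypothetical, deliberately small pattern set for illustration.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact(text):
    """Replace each match with a typed placeholder and count what was found."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts


def guarded_call(prompt, model):
    """Redact the prompt before the model sees it, then redact the response."""
    safe_prompt, _ = redact(prompt)
    response = model(safe_prompt)  # `model` is any callable taking and returning str
    safe_response, _ = redact(response)
    return safe_response
```

Redacting in both directions matters because retrieved context or the model itself can introduce sensitive values that were never in the original prompt.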

Local Differential Privacy

I implement Local Differential Privacy at the source, adding calibrated noise to vector embeddings and telemetry before they leave the client, so individual records become far harder to re-identify. This mitigates both membership-inference and embedding-inversion attacks against your AI systems.
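
One way to sketch the mechanism, under stated assumptions: clip each vector to a bounded L1 norm, then add Laplace noise scaled to that bound on the client side. The function names, the default `max_norm`, and the choice of the Laplace mechanism are assumptions for this example; production systems often use other mechanisms and must weigh the utility loss, which grows with the vector's dimensionality.

```python
import random


def clip_l1(vector, max_norm):
    """Scale the vector down so its L1 norm is at most max_norm."""
    norm = sum(abs(x) for x in vector)
    if norm <= max_norm:
        return list(vector)
    return [x * max_norm / norm for x in vector]


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(vector, epsilon, max_norm=1.0, rng=None):
    """Laplace mechanism applied locally, before the vector leaves the client."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng if rng is not None else random.Random()
    clipped = clip_l1(vector, max_norm)
    # Any two clipped inputs differ by at most 2 * max_norm in L1 distance,
    # so this noise scale yields an epsilon-LDP release of the clipped vector.
    scale = 2.0 * max_norm / epsilon
    return [x + laplace(scale, rng) for x in clipped]
```

A smaller `epsilon` means more noise and stronger privacy; the clipping step is what makes the guarantee hold for every possible input.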

Differential Privacy

I move organizations from best-effort redaction to mathematically provable privacy, managing formal Privacy Budgets (ε) across queries and training pipelines to meet the strictest regulatory standards in healthcare, finance, and government workloads.
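
A small Python sketch of what managing a privacy budget can look like: each query is charged against a total ε under basic sequential composition, and queries are refused once the budget is spent. The `PrivacyBudget` class and `noisy_count` helper are hypothetical names for this example; real deployments usually rely on tighter composition accounting and an audited DP library.

```python
import random


class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        """Record a spend, or refuse it if it would exceed the total."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

    @property
    def remaining(self):
        return max(self.total - self.spent, 0.0)


def noisy_count(records, predicate, epsilon, budget, rng=None):
    """Counting query (sensitivity 1) answered with Laplace(1/epsilon) noise."""
    budget.charge(epsilon)  # refuse the query before touching the data
    rng = rng if rng is not None else random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # Difference of two Exp(rate=epsilon) samples is Laplace with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


budget = PrivacyBudget(total_epsilon=1.0)
ages = [34, 51, 29, 62, 45]
answer = noisy_count(ages, lambda age: age > 40, epsilon=0.5, budget=budget)
```

After this call `budget.remaining` is 0.5; one more query at ε = 0.5 exhausts it, and any further query raises instead of silently leaking more information.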

Utility-Preserving Anonymization

Using Masked Language Models, I perform context-aware entity replacement that preserves the semantic integrity of unstructured text — producing synthetic data twins that remain fully private yet effective for training LLMs and refining NLP models.

Industry Leadership

Shaping privacy and NLP at the standards level: leading the open source Philterd toolkit used in production privacy stacks, serving as PMC Chair of Apache OpenNLP and as an ASF Member, speaking at international conferences including OpenSearchCon, ApacheCon, and Berlin Buzzwords, and contributing as an AWS Community Builder and certification exam author.

Open Source Privacy Toolkit

Through Philterd, LLC I develop the open source stack behind every engagement — Philter, Phileas, Arbiter, Phield, and supporting tooling. The goal is simple: give the engineering community high-performance, vendor-neutral building blocks for PII protection across modern cloud and AI pipelines.

Global Technical Advocacy

I am a frequent speaker at international technology conferences, including OpenSearchCon, ApacheCon, and Berlin Buzzwords. I share architectural insights on the intersection of cloud infrastructure and data privacy, helping organizations navigate the evolving landscape of emerging technologies and regulatory requirements.

Apache Foundation Leadership

I serve as PMC Chair of Apache OpenNLP and as an Apache Software Foundation Member. Beyond shipping code, that means shaping release cadence, mentoring committers, and stewarding a project the broader Java NLP community has relied on for over a decade.

AWS Community Building

I’ve been recognized as an AWS Community Builder, authored questions for AWS certification exams, and published commercial products to the AWS Marketplace. Each role keeps me close to the engineers using the platform and to the standards that define what excellence looks like on AWS.

Strategic Engagements

Direct advisory and engineering services for organizations requiring elite-level cloud architecture and AI privacy. I provide the governance and implementation expertise needed to bridge the gap between innovation and compliance.

Deep Audit

A high-impact evaluation of your existing stack. I identify structural vulnerabilities in cloud configuration and data privacy, delivering a prioritized roadmap to remediate risk while maintaining operational velocity.

Fractional Leadership

Technical oversight for high-stakes transitions. As an embedded advisor, I provide the governance and architectural direction necessary to scale AI and cloud initiatives with absolute confidence.

Specialized Engineering

Custom solutions for mission-critical challenges. I design and deploy bespoke redaction layers and NLP pipelines for environments where off-the-shelf tools fail to meet scale or latency requirements.

Frequently Asked Questions

Quick answers to the questions most clients ask before our first call.

How long are typical engagements?

Anything from a one-week Deep Audit to multi-month embedded engagements. Most clients land in the 3–6 month range.

What engagement models do you offer?

Hourly advisory, fixed-scope audits, and fractional retainers (typically 1–2 days per week), all contracted through Philterd, LLC.

Can you sign an NDA?

Yes — before any technical detail is shared. Mutual NDAs are standard and I’m happy to use yours or provide one.

Where are you based, and do you travel?

Greater Pittsburgh, USA. Most work is remote; on-site visits are available for kickoffs, workshops, and security reviews.

Do you write code, or just advise?

Both. I pair with your developers, contribute production-grade code to your repos, and ship features alongside the team — not just diagrams and decks.

What does a first call cover?

About 30 minutes to understand your stack, current privacy posture, and where the biggest risks or unknowns sit.

Free Resource · PDF

The RAG Privacy Checklist

12 questions to ask before you ship your AI app.

  • How prompts, retrieval context, and tool calls each leak PII — and where to stop them.
  • The minimum redaction surface every RAG pipeline needs.
  • Logging, prompt history, and vector stores: the “quiet” PII surfaces most teams miss.
  • A scoring rubric to gauge your current privacy posture in under 20 minutes.

I’ll email you the PDF and the occasional update on Philterd projects. Your address is never shared. Unsubscribe in one click.

Let’s discuss your project

Drop me a note. I usually respond within one business day.

My Experience and Philosophy

Everything below is the background behind the consulting work above: the experience, certifications, conference talks, and open source projects that make the case for hiring me.

Deep Experience

My background isn’t just cloud; it’s software engineering and architecture. I write code daily and was working in search, AI/ML, and NLP long before they were fashionable. I have 17 years on AWS, hold every current AWS certification, am an AWS Community Builder, and have written questions for the AWS certification exams themselves.

Broad Exposure

I’ve delivered for clients across e-commerce, healthcare, natural resources, shipping and logistics, data and AI, and government. That breadth means I quickly recognize the patterns in your stack, your team, and your constraints, and translate privacy requirements into engineering decisions that fit your reality.

Valuing Community

I share my work in public. Whether contributing to Apache OpenNLP, maintaining OpenSearch components, releasing the Philterd open source toolkit, or speaking at international conferences, the principle is the same: a rising tide lifts all boats, and the work gets sharper when it’s exposed to peers.

Open Source First

Every engagement leans on open source — the Philterd toolkit (Philter, Phileas, Arbiter, Phield), Apache OpenNLP, and OpenSearch. Clients get auditable, vendor-neutral privacy platforms instead of proprietary black boxes, with full source access and no licensing lock-in to constrain future architectural choices.

Embedded with Your Team

I work as an embedded engineer, not a drive-by consultant. That means joining standups, pairing with your developers, contributing to your codebase, and shipping production-grade code alongside the team — so privacy architecture lands as something your engineers own, not as a document gathering dust.

Knowledge Transfer & Empowerment

My goal at the end of every engagement is to make myself optional. I document decisions, write runbooks, train your team, and leave behind systems they can evolve confidently, because the best privacy platforms are the ones your engineers fully understand and can extend on their own.

Conference Presentations

A selection of talks at international conferences on NLP, search, and cloud.

Open Source Contributions

The Philterd open source toolkit is the foundation of every privacy engagement — alongside long-running contributions to Apache OpenNLP and OpenSearch.

Phileas

Open source library that provides the redaction, anonymization, and policy-control primitives used by Philter to precisely redact PII.

Arbiter

Part of the Philterd open source toolkit, extending the privacy stack with additional capabilities for PII-aware pipelines.

Phield

Part of the Philterd open source toolkit, providing additional building blocks for privacy-focused architectures.

Apache OpenNLP

Involved for ~15 years as a user and developer, and now as PMC Chair. Apache OpenNLP is a solid NLP framework for Java. I’m also an ASF Member.

Some Folks I’ve Worked With

A selection of organizations across healthcare, finance, government, e-commerce, logistics, natural resources, and data & AI where I’ve helped design, secure, and ship cloud and privacy infrastructure.