Semantic Drift: How to Build Institutional Memory into your Data Stack

Galen Marchetti

25 Sep 2025 • 5 min read

It always starts the same way.

A dashboard looks off. Maybe the churn rate spiked unexpectedly. Or the LTV curve flattened when it shouldn’t have. An analyst is tasked with figuring out what happened. They retrace the query. They check the join. They pull the raw tables. Nothing is obviously broken. And yet, the numbers don’t make sense.

After an hour (or several), they find the culprit: someone defined “active customer” differently for a campaign report six months ago, and that logic has quietly propagated into multiple dashboards and downstream metrics ever since.

No one was trying to mislead anyone. There was just no canonical definition to begin with, or if there was, it wasn’t discoverable.

This kind of silent logic drift happens everywhere:

Revenue numbers that include or exclude refunds, depending on the team
Subscription metrics that split differently between self-serve and enterprise
Retention curves skewed by inconsistent event filters

It’s not that people aren’t asking the right questions. It’s that they’re using subtly different definitions every time. AstroBee is built to stop that cycle.

What Is Semantic Drift?

Semantic drift is what happens when the reality of your business changes, but your data logic doesn’t.

Consider a subscription-based SaaS company. Initially, the company only had one core product and two straightforward subscription plans: monthly and annual. The analytics team built a metric called active_subscriptions, defined simply as:

SELECT COUNT(DISTINCT subscription_id)
FROM subscriptions
WHERE status = 'active'

This definition worked well at launch and was embedded across dashboards, quarterly investor presentations, financial reports, and growth analyses. It quickly became the single source of truth for how the team understood its subscriber base.

A year passes, and the company grows rapidly, introducing several new product tiers, pricing options, and free-trial plans. Product and growth teams add complexity: now, subscriptions can have statuses like active, trialing, paused, pending_cancel, and past_due. Some plans have grace periods or trial extensions, creating ambiguity around when exactly a subscription counts as truly “active.”

One analyst revises the core metric for a critical executive dashboard, updating it to:

SELECT COUNT(DISTINCT subscription_id)
FROM subscriptions
WHERE status IN ('active', 'trialing', 'past_due')
AND end_date > CURRENT_DATE

The new logic is subtly different, intended to reflect a more realistic measure of engagement. However, the analyst doesn’t document it thoroughly or propagate it to other analytics assets.

Meanwhile, the finance team continues to use the old, simpler definition in its reporting, which feeds into monthly revenue forecasts. Marketing builds email campaigns and conversion funnels on yet another variation, which excludes ‘trialing’ and ‘past_due’ entirely and also drops any subscription that has a trial end date:

SELECT COUNT(DISTINCT subscription_id)
FROM subscriptions
WHERE status = 'active'
AND trial_end_date IS NULL

Months later, discrepancies appear. Executives notice that the active subscriber numbers on the product team’s dashboards consistently exceed those in finance’s reports. Marketing’s conversion rates look different again, because their denominator (active subscriptions) is significantly smaller. Each team defends its definition as “correct” given its own context, yet no two definitions match.
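
To see how far apart these definitions can land, here is a minimal sketch that runs all three filters against the same table using Python’s built-in sqlite3 module. The six sample rows, the text-typed date columns, and the helper names (count_active, build_counts) are assumptions made up for illustration; only the three WHERE clauses come from the queries above.

```python
import sqlite3

# Hypothetical sample rows: (subscription_id, status, end_date, trial_end_date).
# Far-future and far-past dates keep the result independent of today's date.
ROWS = [
    (1, "active",   "2999-01-01", None),          # plain paid subscription
    (2, "active",   "2999-01-01", "2024-01-15"),  # paid, converted from a trial
    (3, "trialing", "2999-01-01", "2999-01-01"),  # still in trial
    (4, "past_due", "2999-01-01", None),          # payment failed, grace period
    (5, "paused",   "2999-01-01", None),          # paused by the customer
    (6, "active",   "2000-01-01", None),          # flagged active, end_date passed
]

# The three competing filters from the scenario above.
DEFINITIONS = {
    "finance": "status = 'active'",
    "product": "status IN ('active', 'trialing', 'past_due') "
               "AND end_date > CURRENT_DATE",
    "marketing": "status = 'active' AND trial_end_date IS NULL",
}


def count_active(conn, where_clause):
    """Count distinct subscriptions matching one team's definition."""
    sql = ("SELECT COUNT(DISTINCT subscription_id) "
           f"FROM subscriptions WHERE {where_clause}")
    return conn.execute(sql).fetchone()[0]


def build_counts():
    """Load the sample rows and evaluate every definition on them."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE subscriptions ("
        "subscription_id INTEGER, status TEXT, "
        "end_date TEXT, trial_end_date TEXT)"
    )
    conn.executemany("INSERT INTO subscriptions VALUES (?, ?, ?, ?)", ROWS)
    counts = {team: count_active(conn, where)
              for team, where in DEFINITIONS.items()}
    conn.close()
    return counts


if __name__ == "__main__":
    for team, n in build_counts().items():
        print(f"{team:<10} {n}")
```

On this invented sample, finance counts 3, the product dashboard counts 4, and marketing counts 2, all from the same six rows.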

Eventually, after weeks of confusion and lost productivity, someone discovers the root cause: a seemingly minor difference in how each team defined a basic metric months earlier. This subtle drift of logic, this semantic drift, caused compounded losses in trust, decision-making quality, and countless hours of wasted labor.

And the consequences are subtle but compounding: decisions based on outdated logic, conflicting reports in exec reviews, and wasted hours “just trying to get the numbers to match.”

Why does it happen?

Most teams treat logic as a side effect of the query, not a first-class asset. Current BI and analytics tools such as Looker, Tableau, Power BI, and Mode prioritize speed and ease of querying over consistency and governance. Analysts typically pull raw or lightly transformed data and build logic directly into dashboards or ad-hoc queries. Here’s what that looks like in practice:

Copy/paste culture: Someone grabs a snippet of SQL from an old dashboard, tweaks it, and ships it without realizing the original had context-specific assumptions.
No review layer: Metric logic is often written solo, without peer review or approval from the domain experts who actually own the concept.
No version control: Logic changes over time, but there’s no audit trail or changelog to show what changed, why it changed, or who changed it.
Org churn: The analyst who originally wrote the query leaves. The logic stays. But no one knows if it’s still right.
Business evolution: The business changes faster than the queries. New product lines, pricing models, or customer types make old logic incomplete or misleading.
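
One way to picture logic as a first-class asset is a small registry where every metric has an owner, every revision needs a reviewer, and every change carries a reason. The sketch below is illustrative only: the Metric and MetricVersion classes, their fields, and the sample names and dates are hypothetical and not taken from any particular BI tool.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class MetricVersion:
    """One immutable revision of a metric's logic (hypothetical schema)."""
    version: int
    where_clause: str
    author: str
    approved_by: str
    reason: str
    effective: date


@dataclass
class Metric:
    """A named metric with an owner and an append-only version history."""
    name: str
    owner: str
    versions: list = field(default_factory=list)

    def revise(self, where_clause, author, approved_by, reason, effective):
        # Review layer: the author cannot approve their own change.
        if approved_by == author:
            raise ValueError("a revision needs a reviewer other than its author")
        version = MetricVersion(len(self.versions) + 1, where_clause,
                                author, approved_by, reason, effective)
        self.versions.append(version)
        return version

    def current(self):
        """Return the latest approved definition."""
        return self.versions[-1]

    def changelog(self):
        """Audit trail: what changed, when, who changed it, and why."""
        return [
            f"v{v.version} {v.effective.isoformat()} by {v.author}, "
            f"approved by {v.approved_by}: {v.reason}"
            for v in self.versions
        ]


if __name__ == "__main__":
    metric = Metric("active_subscriptions", owner="finance")
    # Names and dates below are invented for the example.
    metric.revise("status = 'active'", "analyst_a", "finance_lead",
                  "initial definition", date(2024, 1, 1))
    metric.revise("status IN ('active', 'trialing', 'past_due') "
                  "AND end_date > CURRENT_DATE", "analyst_b", "finance_lead",
                  "new tiers and trials launched", date(2025, 1, 1))
    print("\n".join(metric.changelog()))
```

Even if the original author leaves, the history still shows which definition is current and why it replaced the previous one.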

Business Logic Should Be Responsive

Can we do better? Imagine if your most-used entities and metrics like CAC, LTV, active users, or MRR weren’t redefined every time they were queried. What if your semantic layer actively detected when business changes required updating logic and alerted domain experts instantly?

In our SaaS subscription scenario, AstroBee would have recognized the new subscription statuses (trialing, past_due) as soon as they appeared in your data warehouse. Instead of leaving teams to notice discrepancies months later, AstroBee surfaces the change immediately:

Detect: AstroBee notices new statuses like ‘trialing’ and ‘past_due’ appearing in the subscriptions table.
Propose: AstroBee sends a proactive notification to the subscription metric owner (finance or analytics teams) with a clear message: “Detected new statuses ‘trialing’ and ‘past_due’ in subscriptions data. Should these be included under active subscriptions?”
Validate & Govern: The responsible stakeholder confirms, ensuring the change aligns with relevant rules and business context.
Deploy & Enforce: AstroBee updates the semantic layer immediately, automatically propagating the new definition across all dashboards, analyses, and queries that reference the active_subscriptions metric, and alerts stakeholders about the change.
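
The steps above can be sketched in a few lines. This is not AstroBee’s implementation, just an illustration of the idea under simple assumptions: the metric keeps a set of statuses it counts and a set it has reviewed and excluded, and anything outside both sets is treated as new. The MetricDefinition class and the function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MetricDefinition:
    """Hypothetical record of which status values a metric counts."""
    name: str
    owner: str
    included: set = field(default_factory=set)  # statuses the metric counts
    excluded: set = field(default_factory=set)  # statuses reviewed and left out

    def known(self):
        return self.included | self.excluded


def detect_new_statuses(metric, observed):
    """Detect: return observed statuses that nobody has ruled on yet."""
    return sorted(set(observed) - metric.known())


def propose(metric, new_statuses):
    """Propose: build the question to send to the metric owner."""
    quoted = ", ".join(f"'{s}'" for s in new_statuses)
    return (f"To {metric.owner}: detected new statuses {quoted} in "
            f"subscriptions data. Should these be included under {metric.name}?")


def record_decision(metric, status, include, decided_by, log):
    """Validate and deploy: only the owner decides; every decision is logged."""
    if decided_by != metric.owner:
        raise PermissionError(f"{decided_by} does not own {metric.name}")
    (metric.included if include else metric.excluded).add(status)
    log.append({"metric": metric.name, "status": status,
                "included": include, "decided_by": decided_by})


if __name__ == "__main__":
    metric = MetricDefinition("active_subscriptions", owner="finance",
                              included={"active"})
    # In practice this list might come from SELECT DISTINCT status.
    observed = ["active", "trialing", "active", "past_due"]
    new = detect_new_statuses(metric, observed)
    print(propose(metric, new))
    log = []
    for status in new:
        record_decision(metric, status, include=True,
                        decided_by="finance", log=log)
    print(detect_new_statuses(metric, observed))  # nothing left to review
```

The log list plays the role of institutional memory here: each ruling is stored with who made it, so the same question does not have to be asked twice.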

With AstroBee, semantic drift is resolved proactively. Now, when your business evolves, your data logic evolves right alongside it, without waiting days or weeks for someone to notice a problem.

Each decision adds to institutional memory. What used to live in tribal knowledge or scattered files becomes searchable, governed, and centralized.

Analytics That Compounds Like Interest

The current state of analytics is wasteful:

Same joins rewritten 20 times
Same metrics revalidated every quarter
Same bugs rediscovered by new hires

AstroBee flips the script. It builds a shared memory layer for your business, one where logic gets better over time. Questions stop being one-offs and instead become contributions.

The result is not just faster analysis, but better strategy. Less time debugging, more time deciding. AstroBee doesn’t just help you ask questions. It helps your organization remember the answers.

Because if your data stack forgets what it’s already done, you’re always starting from zero.

If your team is interested in trying out true end-to-end self-serve analytics, we’d love to hear from you. Follow our progress on socials (LinkedIn and X) and reach out to hello@astrobee.ai to say hi.