
Design Google Street View Image Ingestion

Ingest imagery from vehicle fleets over unreliable networks and feed geo-indexed processing pipelines downstream.

Tags: Edge / offline · Streaming · Partitioning · Availability · Cost / efficiency

1 Problem Restatement & Clarifying Questions #

Restatement. Design the backend that ingests 360° panoramic imagery from a fleet of camera-equipped taxis, processes it (blur PII, stitch, tile, build 3D), and serves it to (a) end-user panorama viewers, (b) downstream ML pipelines (imagery understanding, map generation), and (c) internal map/ops teams. Vehicles are mobile, cellular-connected, frequently in flaky-network conditions (tunnels, dead zones, dense urban canyons). Imagery is regulated data — it contains faces and license plates and therefore carries privacy + takedown obligations.

Clarifying questions I would ask (and my assumed answers for this doc).

| # | Question | Assumed answer | Why it matters |
|---|----------|----------------|----------------|
| Q1 | Fleet size? | ~10,000 taxis globally, skewed toward dense metros (US/EU/JP/LatAm) | Drives ingest bandwidth + regional ingest endpoints |
| Q2 | Camera rig? | 6-camera ring, each 8MP, synced, plus GPS/IMU | Drives per-capture size + stitching work |
| Q3 | Panorama resolution and format? | Equirect 8192×4096 stitched; raw per-camera JPEGs ~2MB each = ~12MB raw per capture; stitched JPEG/WebP ~8–15MB; tile pyramid ~20MB total | Storage + CDN math |
| Q4 | Capture rate? | 1 capture per ~10m driven (GPS-triggered); ~24h local buffer on vehicle | Sets ingest velocity + buffering requirements |
| Q5 | Active drive hours? | 8 hrs/day avg per vehicle (fleet staggered) | BOE math |
| Q6 | Privacy constraints? | Faces + plates blurred before any external serving; EU/DE harder constraints; right-to-be-forgotten requests | Hard invariants on serving path |
| Q7 | Geographic coverage? | Planet-scale, dense-metro-first | Drives regional storage tiers + S2 hot cells |
| Q8 | Serving QPS? | 1M panorama views/sec global peak (Street View is a high-traffic Maps feature); tile fanout ~10× (lookahead prefetch) | Drives CDN + tile-store design |
| Q9 | Durability target? | 11 nines (legal evidence in some jurisdictions; expensive to recapture) | Dictates erasure coding + multi-region replication |
| Q10 | Latency budget — ingest vs serve? | Ingest: tolerant, minutes-to-hours staleness OK; serve: p99 < 300ms first tile | Lets us decouple pipelines |
| Q11 | Are "fresh" captures prioritized (construction zones, new roads)? | Yes, priority lanes for specific geographies | Drives lane-aware scheduling in v3 |
| Q12 | Is the raw blob ever exposed externally? | No — raw is internal-only; only blurred derivatives are servable | Core ACL invariant |

If I only had time for three of the twelve: Q1 (fleet size), Q6 (privacy), Q9 (durability) — these three redraw the architecture (and the billion-dollar storage bill) rather than merely resizing a component.


2 Functional Requirements #

In scope

  1. FR-1 Resumable chunked upload from vehicle over flaky cellular; client-side WAL; part-level retry; content-hash-based idempotency.
  2. FR-2 Geo-tagged capture registration: each capture carries GPS (lat/lon/alt), heading (compass + IMU), timestamp (GPS-synced), camera intrinsics, vehicle ID, rig firmware version.
  3. FR-3 Deduplication across taxis passing the same street within a short time window (pick best, demote rest to ML pool).
  4. FR-4 Async processing pipeline: dedupe → stitch (6-camera → equirect) → PII blur (faces + plates) → SLAM/3D depth → tile pyramid (multi-zoom) → publish.
  5. FR-5 Panorama serving to end users via CDN, level-of-detail tile pyramid, tile-server origin.
  6. FR-6 Downstream ML export — raw (access-controlled internal consumers) and blurred (broader access) feeds into BigQuery / Dataflow / feature store.
  7. FR-7 Takedown + right-to-be-forgotten — specific panos/bboxes can be invalidated; CDN purges propagate.
  8. FR-8 Fleet control plane — vehicle auth, firmware attestation, upload quotas, back-pressure, SLI visibility per vehicle.
  9. FR-9 Reprocessing on model upgrade — when blur model improves, re-blur cold imagery deterministically.

Out of scope

  • ML model training internals (face/plate detector, SLAM model weights) — those are trained offline on exported data. We treat them as versioned black-box services.
  • Map-matching / graph-building (road graph extraction from imagery) — consumes our feed, doesn't live here.
  • Consumer app (Maps Street View UI) — we provide tile URLs.
  • Billing for external API consumers — out of scope.
  • Vehicle routing/dispatch — separate fleet system.

3 NFRs + Capacity Estimate (full BOE math, reconciled) #

NFRs

| Category | Target | Justification |
|----------|--------|---------------|
| Availability — ingest | 99.9% (8.7 hr/yr downtime) | Vehicles buffer 24h locally; ingest is async-tolerant |
| Availability — serve | 99.99% (52 min/yr) | User-facing, Maps-critical |
| Durability — raw + processed | 11 nines (10⁻¹¹ annual loss) | Imagery is legal evidence in some jurisdictions; recapture means physically dispatching a vehicle — $100s/revisit |
| Latency — upload | No hard SLA; p95 commit within 6h of capture under normal conditions | Vehicles with a 24h local WAL tolerate hours of backhaul |
| Latency — serve (first tile) | p99 < 300ms globally | User panorama pan must feel responsive |
| Latency — dedupe pipeline | p95 within 30 min of commit | Required before blur |
| Latency — blur | p95 within 2h of commit; 99.99% of publicly servable blobs blur-complete before exposure | Hard invariant |
| Privacy — blur coverage | FN rate < 0.1% audited; auto-rollback on regression | Regulatory |
| Takedown SLA | 24h CDN purge + origin invalidate | GDPR / equivalent |
| Tamper-evidence | Per-capture signed hash manifest, attested at vehicle boot | Anti-spoof |

Capacity estimate — derived, not asserted

Ingest volume (per day).

  • 10,000 taxis × 8 hrs/day × 3600 s/hr = 288M vehicle-seconds/day.
  • At 30 km/h urban avg = 8.33 m/s, 1 capture / 10 m = 1 capture / 1.2 s.
  • Expected captures/day = 288M / 1.2 ≈ 240M. The prompt's 300M implies a slightly higher capture rate or fleet; I'll size conservatively with 300M/day.
  • Raw bytes per capture: 6 cameras × 2MB JPEG ≈ 12MB; with IMU/metadata overhead + lossless-archive margin ≈ 15MB raw wire bytes. The prompt's 50MB is a conservative upper bound (e.g., preserving camera-native DNG for future reprocessing). I'll carry two numbers: 15MB efficient (JPEG-in, JPEG-on-disk) and 50MB if raw is preserved.
| Scenario | Captures/day | Bytes/capture | Raw/day | Raw/yr |
|----------|--------------|---------------|---------|--------|
| JPEG-only (efficient) | 300M | 15 MB | 4.5 PB/day | 1.6 EB/yr |
| Preserve raw DNG (prompt) | 300M | 50 MB | 15 PB/day | 5.5 EB/yr |

I'll carry the 15 PB/day raw figure as baseline since the prompt set it.

Processed + tile pyramid bytes.

  • Stitched equirect 8192×4096 WebP ~8 MB.
  • Tile pyramid, 5 zoom levels, 512×512 tiles, overhead ~1.33×: ~11 MB processed/pano.
  • Processed = 300M × 11 MB = 3.3 PB/day incremental.

Ingest bandwidth — peak vs avg.

  • Avg = 15 PB / 86400 s = 174 GB/s sustained globally.
  • Peak surge (regional rush hour — rush hours never align globally, but within a region they do): the prompt's 1.4 TB/s figure corresponds to ~8× the daily average compressed into a peak hour (≈5 PB/hour). Size for it.
  • Distribute across ~10 regional ingest points (NA-East, NA-West, EU-Central, EU-West, JP, SG, AU, LatAm, IN, ME). Avg per region ≈ 17 GB/s sustained, ~140 GB/s peak → ~1.1 Tbps ingress per region at peak. Solvable with cloud-provider regional PoPs; GCS regional buckets handle this natively.

Storage tiering.

  • Hot tier (near-region, SSD-backed, CDN-frontable): 30 days of processed = 30 × 3.3 PB = ~100 PB (plus ~30-day raw window for re-stitch ops = +450 PB = ~550 PB hot total).
  • Warm tier (HDD, regional, slower egress): 90 days post-hot = 90 × 18.3 PB/day (raw + processed) = 1.65 EB warm.
  • Cold tier (multi-region erasure-coded archive, Colossus / GCS Archive): 10-year retention at 18.3 PB/day × 3650 days = 66 EB cold after 10 yrs. Realistically with dedup/lifecycle at 2.5× compression factor for cold (JPEG doesn't compress much, but dedup + delta-encode similar tiles helps) → **25 EB cold** effective.

Sanity check: 25 EB is enormous, but within the envelope of a planet-scale storage layer like Colossus — a noticeable, not dominant, fraction of total footprint. Plausible.

Serving QPS + CDN bandwidth.

  • 1M pano views/s peak global. Each view loads ~10 tiles initial + 20 on pan = ~30 tile GETs per session.
  • Tile QPS: 1M × 30 = 30M tile GET/s peak. Tiles are ~50KB each after CDN compression.
  • CDN egress peak: 30M × 50KB = 1.5 TB/s egress = 12 Tbps — within Google Cloud CDN / Google Global Cache scale. 95%+ hit rate assumed (tile data is hugely repetitive viewer-side).
  • Origin QPS: 5% × 30M = 1.5M QPS to tile origin. Single Bigtable cluster can do ~1M QPS; shard across 3–5 clusters per region.

Metadata row count.

  • 300M captures/day × 365 × 10 yrs = 1.1 trillion captures steady-state. Spanner-scale (Spanner handles trillions of rows with proper sharding key).

Reconciliation check. Raw alone: 15 PB/day × 365 ≈ 5.5 EB/yr. At cold-tier pricing ($0.004/GB/mo, GCS Archive): 5.5 EB × 10⁹ GB/EB × $0.004 × 12 ≈ $264M/yr for each year of retained imagery. By year 10, with the 2.5× cold-compression factor, that's ~$1.2B/yr in archive spend alone — the number that drives lifecycle aggressiveness.
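That chain of unit conversions is easy to fumble; a minimal script to reproduce the section's numbers (all constants are this section's assumptions, not measured values):

# boe.py — reproduce the capacity math above; tweak assumptions and re-run.
CAPTURES_PER_DAY = 300e6
RAW_MB, PROC_MB  = 50, 11        # preserve-DNG scenario; use 15 for JPEG-only
SECONDS_PER_DAY  = 86_400

raw_pb_day  = CAPTURES_PER_DAY * RAW_MB / 1e9            # MB -> PB
proc_pb_day = CAPTURES_PER_DAY * PROC_MB / 1e9
avg_gb_s    = raw_pb_day * 1e6 / SECONDS_PER_DAY         # PB/day -> GB/s

cold_eb_10y = (raw_pb_day + proc_pb_day) * 3650 / 1e3    # PB -> EB
cold_eb_eff = cold_eb_10y / 2.5                          # dedup + delta-encode
archive_usd = cold_eb_eff * 1e9 * 0.004 * 12             # $/GB/mo over 12 months

print(f"raw {raw_pb_day:.1f} PB/day, processed {proc_pb_day:.1f} PB/day")
print(f"avg ingest {avg_gb_s:.0f} GB/s")
print(f"cold @10y {cold_eb_10y:.0f} EB raw, {cold_eb_eff:.0f} EB effective")
print(f"archive ~${archive_usd/1e9:.1f}B/yr at year-10 steady state")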


4 High-Level API #

All APIs are gRPC internally, HTTP/2 over TLS with streaming at the edge for upload. Authentication via vehicle device certificate (mTLS) issued by a fleet CA; attested at vehicle boot.

Upload (vehicle → regional ingest)

// Begin a resumable upload session. Idempotent by (vehicle_id, client_session_id).
rpc InitiateUpload(InitiateUploadRequest) returns (InitiateUploadResponse);

message InitiateUploadRequest {
  string vehicle_id = 1;           // fleet-unique, attested
  string client_session_id = 2;    // client-generated UUID; dedup key
  CaptureManifest manifest = 3;    // list of parts, sizes, sha256 per part
  bytes manifest_signature = 4;    // vehicle TPM-signed
  GeoHint geo_hint = 5;            // approximate GPS at session start, for region routing
}

message InitiateUploadResponse {
  string upload_id = 1;            // server-assigned, stable across retries
  repeated PartUrl part_urls = 2;  // signed PUT URLs, one per part, TTL=1h
  int64  part_size_bytes = 3;      // server-chosen optimal (default 12MB)
  string ingest_region = 4;        // which region to continue uploading to
  int64  backpressure_retry_after_ms = 5; // 0 = OK, >0 = client should wait
}

// Upload one part. Client retries safe (part_hash match → 200 OK no-op).
rpc PutPart(PutPartRequest) returns (PutPartResponse);
message PutPartRequest {
  string upload_id = 1;
  int32  part_number = 2;
  bytes  part_bytes = 3;    // streamed
  string part_sha256 = 4;   // client-computed
}
message PutPartResponse {
  enum Status { OK = 0; HASH_MISMATCH = 1; BACKPRESSURE = 2; REJECTED = 3; }
  Status status = 1;
  int64  retry_after_ms = 2;
}

// Commit: atomically transitions the upload from staging to captured state.
// Content-hash-idempotent: a second Commit with same hash returns same capture_id.
rpc CommitUpload(CommitUploadRequest) returns (CommitUploadResponse);
message CommitUploadRequest {
  string upload_id = 1;
  string aggregate_sha256 = 2;     // covers all parts, vehicle-signed
}
message CommitUploadResponse {
  string capture_id = 1;           // globally unique, deterministic from hash
  string s2_cell_id_l14 = 2;       // server-derived, for debug
}

// Metadata registration (can be batched; usually called by pipeline after dedup).
rpc RegisterCapture(RegisterCaptureRequest) returns (RegisterCaptureResponse);
message RegisterCaptureRequest {
  string capture_id = 1;
  GpsFix gps = 2;                  // lat, lon, alt, hdop, timestamp
  Heading heading = 3;             // IMU+compass
  int64  capture_ts_ns = 4;
  CameraParams params = 5;         // intrinsics, firmware version
  string vehicle_id = 6;
  string raw_blob_ref = 7;         // gs:// path
  bytes  attestation = 8;
}

Pipeline events (Pub/Sub)

Topic: captures.ingested        // published on CommitUpload
  {capture_id, vehicle_id, s2_cell_l14, ts, raw_blob_ref, size, hashes...}
Topic: captures.deduped
Topic: captures.stitched
Topic: captures.blurred          // triggers publish
Topic: tiles.generated
Topic: captures.takedown         // PII takedown request

All topics retained 7 days for replay; ordering key = s2_cell_l10 so replays per-cell are serial.
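A minimal publisher sketch for that ordering-key contract, using the google-cloud-pubsub Python client (project ID and event shape are illustrative; ordering must also be enabled on the subscription):

import json
from google.cloud import pubsub_v1

# Ordering must be enabled explicitly or the client rejects ordering_key.
publisher = pubsub_v1.PublisherClient(
    publisher_options=pubsub_v1.types.PublisherOptions(enable_message_ordering=True)
)
topic = publisher.topic_path("sv-ingest-prod", "captures.ingested")

def publish_ingested(capture_id: str, s2_cell_l10: int, raw_blob_ref: str) -> None:
    event = {"capture_id": capture_id, "s2_cell_l10": s2_cell_l10,
             "raw_blob_ref": raw_blob_ref}
    # Ordering key = coarse cell id: replays and processing stay serial per
    # cell, while unrelated cells fan out in parallel.
    publisher.publish(topic, json.dumps(event).encode(),
                      ordering_key=str(s2_cell_l10)).result()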

Serving (user → CDN → tile origin)

// Public: fronted by CDN, signed URLs for authenticated session if needed.
GET /v1/pano/{s2_cell_l14}/{zoom}/{tile_x}/{tile_y}.webp
  -> tile bytes (50KB typical, cached at CDN)

GET /v1/pano/lookup?lat={lat}&lng={lng}&radius_m={r}
  -> list of nearby pano_ids with capture_ts + coverage_score
  (geo-query served from S2-indexed metadata)

GET /v1/pano/{pano_id}/manifest
  -> {tile_urls, depth_map_url, neighbor_pano_ids, capture_ts, coverage_quality}

Internal (downstream ML, ops)

rpc ExportCaptures(ExportRequest) returns (stream ExportBatch);  // BigQuery extract
rpc InvalidatePano(InvalidateRequest) returns (InvalidateResponse); // takedown
rpc ReprocessCaptures(ReprocessRequest) returns (ReprocessResponse); // model-upgrade triggered re-blur

Idempotency invariants.

  • InitiateUpload idempotent on (vehicle_id, client_session_id).
  • PutPart idempotent on (upload_id, part_number, part_sha256).
  • CommitUpload: capture_id = hash(aggregate_sha256) → committing the same content twice returns the same capture_id, no duplicate row.
  • Pipeline events carry capture_id; each stage is keyed by it → exactly-once per stage via idempotent writes.
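A toy sketch of the commit invariant, with a dict standing in for the Spanner row (the exact capture_id derivation is illustrative):

import hashlib

def capture_id_from_content(aggregate_sha256: str) -> str:
    # capture_id is a pure function of content: a retried Commit, or a
    # duplicate session uploading identical bytes, converges on the same id.
    return "cap_" + hashlib.sha256(aggregate_sha256.encode()).hexdigest()[:32]

def commit_upload(captures: dict, upload_id: str, aggregate_sha256: str) -> str:
    cid = capture_id_from_content(aggregate_sha256)
    # Insert-if-absent: the second Commit is a read, not a second row.
    captures.setdefault(cid, {"upload_id": upload_id, "sha256": aggregate_sha256})
    return cid

assert commit_upload(db := {}, "u1", "abc") == commit_upload(db, "u2", "abc")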

5 Data Schema #

Engine choice matrix

| Data | Engine | Why chosen | Rejected |
|------|--------|------------|----------|
| Capture metadata (GPS, heading, refs) | Spanner (globally replicated) | Strong consistency for the capture_id → blob_ref lookup; SQL for ops; geo-distributed for low-latency serve-path reads; handles trillion-row scale | Bigtable: no native secondary indexes, eventually consistent cross-region replication; Cassandra: operational burden + cross-region consistency; MySQL/Postgres: doesn't shard to planet scale without third-party (Vitess/Citus) ops pain |
| Blob data (raw, stitched, tiles) | Colossus / GCS (Reed-Solomon erasure coded) | 11-nines durability with (10,4) or (9,4) EC; cost; native lifecycle tiering hot→warm→cold→archive | Block storage / PD: 10–100× cost; self-managed HDFS: ops burden + lower durability |
| Tile-server lookup cache (hot) | Bigtable (row key: s2_cell_l14#zoom#tile_xy#version) | Range scans along the S2 Hilbert curve → local reads; ~1M QPS/cluster; low-latency point reads | Spanner: overkill + costlier for read-only tile lookups; Redis: not durable enough, memory-bound |
| Upload staging state | Bigtable (row key: upload_id), TTL 7d | Write-heavy, short-lived, high-throughput; simple row-level ops; TTL cleans up old sessions | Spanner: global-replication write amplification is wasteful for short-lived rows |
| S2 geo-index | Bigtable secondary; row key: s2_cell_l10#capture_ts#capture_id | "All captures in this cell for this date range" is a single locality range scan; co-locates cell neighbors | Elasticsearch geo_point: ops burden + weak consistency; R-tree in Postgres: doesn't shard |
| Pipeline state machine | Spanner (row key: capture_id, columns per stage) | Atomic stage transitions + audit; strong consistency for the "has this passed blur?" invariant | Bigtable: stage transitions are read-modify-write → need transactions → Spanner is simpler |
| Analytics / ML export | BigQuery (scheduled export from Spanner CDC + blob manifests) | SQL at PB scale, columnar, federated to blob refs | Dataflow into Parquet on GCS: fine, but analysts want SQL; BQ wins |
| Dedup LSH index | Bigtable (row key: s2_cell_l14#time_bucket#phash_prefix) | LSH buckets map cleanly to row-key prefixes; dedup is a cell-local operation | In-memory HNSW: doesn't scale to 300M/day; cross-region replication is painful |

Schema — key tables

// Spanner: captures (primary metadata)
CREATE TABLE captures (
  capture_id          STRING(64) NOT NULL,          // derive from content hash
  vehicle_id          STRING(32) NOT NULL,
  session_id          STRING(64),
  capture_ts          TIMESTAMP NOT NULL,
  gps_lat             FLOAT64 NOT NULL,
  gps_lng             FLOAT64 NOT NULL,
  gps_alt             FLOAT64,
  gps_hdop            FLOAT64,
  heading_deg         FLOAT64,
  s2_cell_l10         INT64 NOT NULL,               // ~80 km² cell, shard key
  s2_cell_l14         INT64 NOT NULL,               // ~0.3 km² (≈550 m) cell, serve key
  raw_blob_ref        STRING(256) NOT NULL,         // gs://raw-bucket/{year}/{mo}/{s2_l10}/{capture_id}
  stitched_blob_ref   STRING(256),
  tile_manifest_ref   STRING(256),
  blur_status         STRING(16) NOT NULL,          // pending | running | passed | failed | retracted
  blur_model_ver      STRING(16),
  dedup_group_id      STRING(64),                   // LSH cluster id
  quality_score       FLOAT32,                      // for dedup "best pick"
  takedown_status     STRING(16) NOT NULL DEFAULT ('none'),  // none | pending | done
  firmware_ver        STRING(32),
  camera_intrinsics   BYTES(MAX),                   // protobuf
  attestation         BYTES(256),                   // vehicle TPM signature
  created_ts          TIMESTAMP NOT NULL OPTIONS(allow_commit_timestamp=true),
) PRIMARY KEY (s2_cell_l10, capture_ts DESC, capture_id);
// Key layout: reads/writes for a cell land on the same split;
// the cell is the natural locality unit.

CREATE INDEX idx_captures_vehicle ON captures (vehicle_id, capture_ts DESC);
CREATE INDEX idx_captures_dedup_group ON captures (dedup_group_id);
CREATE INDEX idx_captures_takedown ON captures (takedown_status, capture_ts DESC);
// Spanner has no partial (WHERE) indexes; takedown scans read the small
// 'pending' key range of this index instead.
// Bigtable: upload_staging (row-key: upload_id)
upload_id | column family "meta" -> vehicle_id, session_id, manifest_proto, created_ts, ttl_ts
          | column family "parts" -> part_1_status, part_1_sha256, part_1_size, ..., part_N_...
          | column family "commit" -> committed_ts, aggregate_sha256, capture_id
// TTL = 7 days; garbage-collected sessions.
// Bigtable: tiles (row key: {s2_cell_l14}#{zoom}#{tile_xy_interleaved}#{version})
// Locality group "blob" -> blob_ref (gs:// or inline for tiny)
// Locality group "meta" -> content_hash, generated_ts, pipeline_version
// Tombstone row on takedown; CDN purge keyed on row key.
// Bigtable: geo_index_l10 (row key: {s2_cell_l10}#{capture_ts_reversed}#{capture_id})
// Answers: "captures in cell X between T1 and T2" via single range scan.
// Reversed timestamp → newest first without DESC sort.
// Bigtable: dedup_lsh (row key: {s2_cell_l14}#{time_bucket_10min}#{phash_simhash})
// Value: list of capture_ids with same LSH bucket → dedup candidates.
// Spanner: pipeline_state (idempotent stage gating)
CREATE TABLE pipeline_state (
  capture_id STRING(64) NOT NULL,
  stage      STRING(16) NOT NULL,   // ingest, dedup, stitch, blur, slam, tile
  status     STRING(16) NOT NULL,   // pending, running, success, failed, skipped
  attempt    INT64 NOT NULL,
  worker_id  STRING(64),
  started_ts TIMESTAMP,
  finished_ts TIMESTAMP,
  output_ref STRING(256),
  error_msg  STRING(1024),
) PRIMARY KEY (capture_id, stage);
// Strict FSM: blur.success is a gate for tile generation; enforced by pipeline.
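The reversed-timestamp and interleaved-XY key tricks in the comments above are easy to get subtly wrong; a minimal, runnable sketch of the key builders (exact byte layout is illustrative):

import struct

MAX_TS = 2**63 - 1

def reverse_ts(ts_micros: int) -> bytes:
    # Big-endian (MAX - ts): lexicographic scan order = newest first,
    # with no DESC support needed from Bigtable.
    return struct.pack(">Q", MAX_TS - ts_micros)

def interleave_xy(x: int, y: int, bits: int = 16) -> int:
    # Morton-interleave tile x/y so spatially adjacent tiles share prefixes.
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return z

def tile_row_key(s2_cell_l14: int, zoom: int, x: int, y: int, version: str) -> bytes:
    return b"#".join([struct.pack(">Q", s2_cell_l14), bytes([zoom]),
                      struct.pack(">I", interleave_xy(x, y)), version.encode()])

def geo_index_key(s2_cell_l10: int, ts_micros: int, capture_id: str) -> bytes:
    return b"#".join([struct.pack(">Q", s2_cell_l10), reverse_ts(ts_micros),
                      capture_id.encode()])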

Why Spanner for metadata (vs Bigtable everywhere): the blur_status = passed invariant is a strong-consistency read — when the serving tier answers "can I serve this tile?" it must NEVER see a stale blur_status = pending that was flipped 5s ago. Bigtable's eventual consistency between regions would allow a brief window where a European reader sees a US-region-committed passed but hasn't replicated yet — that's a compliance-ending race. Spanner's external consistency eliminates it. We pay the write latency cost (5-10ms cross-region commit vs Bigtable's 1ms local write) and it's worth it.


6 System Diagram (ASCII) — centerpiece #

6.1 End-to-end

                                                            ┌─────────── CONTROL PLANE ───────────┐
                                                            │ Fleet CA (mTLS cert issuance)        │
                                                            │ Attestation service (TPM quote)      │
                                                            │ Quota + backpressure controller      │
                                                            │ Config service (firmware, model ver) │
                                                            │ Observability (SLI/SLO, alerts)      │
                                                            └──────────────────────────────────────┘
                                                                          ▲
                                                                          │ policy + attestation
  ═══════════════ VEHICLE (edge) ═══════════════════    ┌─────────────────┴────────────────┐
  ┌────────────────────────────────────────────────┐    │     REGIONAL INGEST (N=~10)      │
  │ 6-cam rig (8MP × 6) + GPS/IMU                  │    │ ┌────────────────────────────┐   │
  │ Onboard stitch preview + JPEG encode           │    │ │ L7 LB (Envoy) — mTLS       │   │
  │ TPM signs manifest hash                        │    │ │ auth vehicle_id + cert     │   │
  │ Local WAL on NVMe (24 h, ~400 GB/taxi)         │───▶│ │ rate-limit + backpressure  │   │
  │ Upload agent: part-level retry, WAL drain,     │    │ └──────────────┬─────────────┘   │
  │ backpressure-aware scheduler                   │    │                ▼                 │
  │ Part size 12 MB; HTTPS/2 + QUIC fallback       │    │ ┌────────────────────────────┐   │
  └────────────────┬───────────────────────────────┘    │ │ Upload Service (stateful)  │   │
                   │                                    │ │ - Bigtable: staging state  │   │
                   │ mTLS, PUT /part (12 MB),           │ │ - Signed URL issuer        │   │
                   │ avg 30 Mbps cellular,              │ │ - Part-hash validator      │   │
                   │ retries with Retry-After headers   │ │ - Commit → deterministic   │   │
                   ▼                                    │ │   capture_id = h(content)  │   │
                                                        │ └──────────────┬─────────────┘   │
                                                        │                │ gs:// PUT        │
                                                        │                ▼                 │
                                                        │ ┌────────────────────────────┐   │
                                                        │ │ RAW BLOB STORE (regional)  │   │
                                                        │ │ GCS "raw-ingest-{region}"  │   │
                                                        │ │ ACL: write=taxi-SA,        │   │
                                                        │ │      read=pipeline-SA only │   │
                                                        │ │ (NO public read EVER)      │   │
                                                        │ │ Reed-Solomon (9,4) EC      │   │
                                                        │ │ 30-day hot, then tier →    │   │
                                                        │ └──────────────┬─────────────┘   │
                                                        │                │                 │
                                                        │                ▼ commit event    │
                                                        │ ┌────────────────────────────┐   │
                                                        │ │ Pub/Sub: captures.ingested │   │
                                                        │ │ ordered by s2_cell_l10     │◀──┐
                                                        │ └──────────────┬─────────────┘   │
                                                        └────────────────┼─────────────────┘
                                                                         │
                              ═══════════ PROCESSING DAG (Dataflow/Beam) ═══════════
                                                                         │
                                                                         ▼
                                                ┌──────────────────────────────────────────────┐
                                                │ Stage 1: DEDUPE                              │
                                                │  - Compute pHash+simhash from stitched prev  │
                                                │  - Lookup dedup_lsh by s2_cell_l14#10min     │
                                                │  - If LSH match: pick best (quality score),  │
                                                │    demote rest to ML-only pool               │
                                                │  - Emit to captures.deduped                  │
                                                └──────────────────────┬───────────────────────┘
                                                                       ▼
                                                ┌──────────────────────────────────────────────┐
                                                │ Stage 2: STITCH                              │
                                                │  - 6-cam → equirect 8192×4096                │
                                                │  - Output: stitched_blob_ref                 │
                                                │  - ACL: still internal-only                  │
                                                └──────────────────────┬───────────────────────┘
                                                                       ▼
                                                ┌──────────────────────────────────────────────┐
                                                │ Stage 3: PII BLUR (HARD GATE)                │
                                                │  - GPU fleet: T4 / L4, ~5 pano/s each        │
                                                │  - Face detector + plate detector (2 models) │
                                                │  - Deterministic blur (seeded gaussian)      │
                                                │  - blur_model_ver pinned per capture         │
                                                │  - Auto-rollback if FN-rate regresses        │
                                                │  - Emit captures.blurred with signed proof   │
                                                │  - Store blurred_blob_ref (NEW ACL: public)  │
                                                └──────────────────────┬───────────────────────┘
                                                                       ▼
                                         ┌─────────────────────────────┴─────────────────────────┐
                                         ▼                                                       ▼
                          ┌──────────────────────────────┐               ┌─────────────────────────────────┐
                          │ Stage 4a: SLAM / 3D          │               │ Stage 4b: TILE PYRAMID          │
                          │  - Depth estimation          │               │  - Zoom 0..5 (512×512 WebP)     │
                          │  - Neighbor stitching        │               │  - Version = blur_model_ver     │
                          │  - Pose refinement           │               │  - Write to tiles table (BT)    │
                          │  - depth_map blob            │               │  - Atomic activate → serving    │
                          └──────────────────────────────┘               └──────────────────┬──────────────┘
                                                                                            │
                                                                                            ▼
                                                                         ┌───────────────────────────────┐
                                                                         │ PROCESSED BLOB STORE          │
                                                                         │ GCS "pano-tiles-{region}"     │
                                                                         │ multi-regional bucket         │
                                                                         │ ACL: read=CDN + serve-SA      │
                                                                         │ Lifecycle: 30d hot → warm →   │
                                                                         │   cold (90d) → archive (10y)  │
                                                                         └──────────────┬────────────────┘
                                                                                        │
                              ═══════════════════ SERVING PATH ═══════════════════      │
                                                                                        │
  ┌─────────────────┐     ┌─────────────────┐    ┌──────────────────┐    ┌──────────────┴──────────────┐
  │ End user        │◀───▶│ CDN (Google     │◀──▶│ Tile Origin      │◀──▶│ tiles (Bigtable) + blob ref │
  │ Maps / SDK      │     │  Global Cache)  │    │ gRPC, p50 5ms    │    │ + processed blob store      │
  │ 1 M sess/s peak │     │ 95% hit rate    │    │ 1.5 M QPS peak   │    └─────────────────────────────┘
  └─────────────────┘     └─────────────────┘    └──────────────────┘
           │                                             │
           ▼                                             ▼
  ┌───────────────────┐                    ┌──────────────────────────────┐
  │ Metadata lookup   │                    │ Metadata serve (Spanner       │
  │ /v1/pano/lookup   │───────────────────▶│ read replica in region, point │
  │ (nearby panos)    │                    │ reads + s2 range scans)       │
  └───────────────────┘                    └──────────────────────────────┘

  ═══════════════ DOWNSTREAM CONSUMERS ═══════════════
  • ML training: BigQuery export of (capture_id, blob_refs, metadata) — internal-only feed
  • Map generation: Dataflow job consuming captures.blurred, producing road-graph deltas
  • Internal tools: takedown workflow, reprocess-on-model-upgrade controller

6.2 Upload path sub-diagram (L7 detail)

VEHICLE                           EDGE LB              UPLOAD SVC            RAW BUCKET
  │                                 │                      │                    │
  │ InitiateUpload(manifest, sig)   │                      │                    │
  ├───(mTLS, 2 KB)─────────────────▶│                      │                    │
  │                                 │──(region-affinity)──▶│                    │
  │                                 │                      │ validate sig       │
  │                                 │                      │ reserve upload_id  │
  │                                 │                      │ gen N signed URLs  │
  │                                 │                      │ stash state (BT)   │
  │◀────────(upload_id, part_urls, part_size=12MB)────────┤                    │
  │                                 │                      │                    │
  │ PutPart(1, 12MB, sha256)        │                      │                    │
  ├───(HTTPS/2, 12MB, ~3s @30Mbps)─▶│──────────────────────┼───────────────────▶│
  │                                 │                      │                    │ write chunk
  │                                 │                      │                    │ 
  │ [if response.retry_after_ms > 0]│                      │                    │
  │ sleep(jitter), retry            │                      │                    │
  │                                 │                      │                    │
  │ [parallel PutPart(2..N)]        │                      │                    │
  │ ... 4-way parallel typical      │                      │                    │
  │                                 │                      │                    │
  │ [network drops mid-part]        │                      │                    │
  │ WAL preserves part; on reconnect│                      │                    │
  │ re-PUT same part_number,        │                      │                    │
  │ server: hash match → 200 no-op  │                      │                    │
  │                                 │                      │                    │
  │ CommitUpload(agg_sha256)        │                      │                    │
  ├────────────────────────────────▶│─────────────────────▶│ verify all parts   │
  │                                 │                      │ finalize multipart │────(compose)──▶│
  │                                 │                      │ derive capture_id  │                │
  │                                 │                      │ = hash(content)    │                │
  │                                 │                      │ publish Pub/Sub    │                │
  │◀──────(capture_id, s2_cell)─────┤                      │                    │                │

6.3 Serving path sub-diagram

USER                   CDN              TILE ORIGIN              BIGTABLE (tiles)         GCS (processed)
  │                     │                    │                       │                       │
  │ GET /v1/pano/{s2}/{z}/{x}/{y}.webp       │                       │                       │
  ├────────────────────▶│                    │                       │                       │
  │                     │ [cache hit 95%]    │                       │                       │
  │◀─── 50KB tile ──────┤                    │                       │                       │
  │                     │                    │                       │                       │
  │                     │ [cache miss 5%]    │                       │                       │
  │                     ├───────────────────▶│                       │                       │
  │                     │                    │ lookup by row key     │                       │
  │                     │                    ├──────────────────────▶│                       │
  │                     │                    │◀────blob_ref──────────┤                       │
  │                     │                    │ fetch blob            │                       │
  │                     │                    ├───────────────────────┼──────────────────────▶│
  │                     │                    │◀──────50KB tile ──────┼───────────────────────┤
  │                     │                    │                       │                       │
  │                     │◀── tile + cache────┤                       │                       │
  │◀─── 50KB tile ──────┤                    │                       │                       │
  │                     │                    │                       │                       │
  │                     │ [TTL 7d, purge on takedown event]          │                       │

Arrow annotations.

  • Vehicle → Edge LB: mTLS + HTTP/2, ~30 Mbps avg (cellular), 12 MB/part, 4-way parallel parts → ~120 Mbps burst per taxi; fleet peak ingress is dominated by regional aggregation.
  • Pub/Sub → Dataflow: ordered by s2_cell_l10; ~3.5K msg/s global steady state (~350 msg/s per region avg), ~1.4K/s per region at peak.
  • Dataflow stages: Beam PTransforms, checkpointed; stage-local parallelism scaled by GPU pool size for blur.
  • CDN → Origin: 5% miss rate at 30M tile QPS → 1.5M QPS origin; tiles table is keyed for Hilbert locality, scans co-locate.

7 Deep-Dives (3 critical topics at L7 depth) #

7.1 Resumable chunked upload over flaky vehicular cellular — the earned-secret depth

Why critical. Each taxi WALs up to 24h of captures (~360 GB at ~15 MB × ~24K captures/day), so a regional ingest outage strands hundreds of TB per thousand vehicles — petabytes fleet-wide — all of which tries to drain on recovery. If the reconnect storm isn't shaped, the ingest tier melts and you lose days of imagery. The upload protocol IS the reliability story.

Alternatives considered.

| Option | Throughput | Retry cost | Complexity | Why rejected/chosen |
|--------|------------|------------|------------|---------------------|
| Single POST per capture | Good on wired, terrible on cellular | Full re-upload on drop: 15–50 MB × retry count | Trivial | Rejected: 10–20% of cellular captures hit a network blip mid-transfer; every blip wastes a full capture of bandwidth |
| Multipart resumable, 12 MB parts ← chosen | Excellent; parallel parts fill the BDP | Only failed parts retry: ~2 MB expected loss per capture | Medium | Chosen: sweet spot. Details below |
| Byte-range resumable (tus.io style) | Similar to multipart | Good, but no parallelism | Medium | Rejected: a single stream can't fill cellular BDP; TCP reordering after cell-tower handoff collapses throughput |
| gRPC bidi streaming | Excellent on a stable connection | Server-side buffering complex on drop | High | Rejected: server state complexity + harder retry semantics across load balancers |

Why 12 MB parts — the L7 insight.

  • Too small (<5 MB): HTTP/2 stream setup + a per-part metadata write in the staging table + per-request overhead dominate — many round trips per capture instead of one or two. (S3's 5 MB multipart minimum reads as billing policy more than physics: micro-part storms create metadata-cost asymmetry providers don't want to carry.)
  • Too large (>64 MB): on cellular with a 5–10% mid-part drop rate, every drop re-uploads the whole part. Expected wasted bandwidth scales linearly with part size × drop probability.
  • 12 MB specifically because:
    • It is a whole multiple of GCS's 256 KB resumable-chunk granularity and close to the client libraries' 8 MB default upload chunk, so parts map onto storage writes without a ragged tail.
    • On LTE at ~30 Mbps sustained and 100–300 ms RTT, TCP BDP ≈ 30 Mbps × 0.2 s = 750 KB; 4 parallel streams × 12 MB keep ~48 MB outstanding → saturates the link with headroom.
    • Per-part SHA256 costs ~40 ms on the vehicle's ARM SoC — negligible against a ~3 s part transfer.
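A toy model of the tradeoff — fixed per-part overhead vs. expected re-sent bytes after mid-part drops. All parameters are illustrative, not fleet measurements; the toy bottoms out in the 5–15 MB range, and the BDP-fill and chunk-alignment arguments above pick 12 within it:

import math

def overhead_mb(part_mb: float, capture_mb: float = 15.0,
                drop_per_sec: float = 0.02,   # link-loss probability per second
                mbps: float = 30.0,
                per_part_fixed_mb: float = 0.5) -> float:
    parts = math.ceil(capture_mb / part_mb)
    p_drop = min(0.99, drop_per_sec * part_mb * 8 / mbps)  # P(drop mid-part)
    resend = part_mb * p_drop / (1 - p_drop)               # geometric retries
    return parts * (per_part_fixed_mb + resend)

for p in (2, 5, 12, 32, 64):
    print(f"{p:>2} MB parts -> {overhead_mb(p):5.1f} MB expected overhead/capture")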

Client-side WAL (the production-earned detail).

Vehicle NVMe layout:
  /wal/pending/{capture_id}/part_{N}.bin    (raw part bytes, fsync'd on write)
  /wal/pending/{capture_id}/manifest.pb     (TPM-signed manifest)
  /wal/uploaded/{capture_id}/receipts/{N}.sig  (server signed receipt per part)
  /wal/committed/{capture_id}/                 (empty marker: capture safely landed)

Storage budget: ~400 GB NVMe — 24 h of captures at ~15 MB/pano × ~24K pano/day.
(Draining that over a full 24 h, parked time included, needs ~33 Mbps average,
which is what the ~30 Mbps cellular budget is sized against.)
GC policy: move to /committed after server Commit ACK; delete /committed after 7 days
(retention for audit + reprocess requests).

The WAL must be on NVMe, not eMMC — we learned the hard way that sustained WAL writes at this volume (hundreds of GB/day) burn out eMMC within 18 months. NVMe with DWPD ≥ 1 is non-negotiable for fleet hardware.
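A sketch of the reboot-time WAL drain against the layout above; put_part and commit stand in for the PutPart/CommitUpload RPCs, whose idempotency makes re-sending an already-acked part harmless:

from pathlib import Path

def drain_wal(root: Path, put_part, commit) -> None:
    for capture_dir in sorted((root / "pending").iterdir()):
        receipts = root / "uploaded" / capture_dir.name / "receipts"
        acked = {p.stem for p in receipts.glob("*.sig")} if receipts.exists() else set()
        for part in sorted(capture_dir.glob("part_*.bin")):
            part_no = part.stem.split("_", 1)[1]     # "part_3" -> "3"
            if part_no not in acked:
                put_part(capture_dir.name, part)     # server hash-match -> no-op
        commit(capture_dir.name)                     # idempotent on aggregate hash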

Back-pressure — control loop.

  • Server returns retry_after_ms in three places:
    • LB level (Envoy): based on global CPU + connection count in region.
    • Upload service level: based on Spanner commit QPS vs SLO.
    • Pub/Sub level: if ingestion topic backlog exceeds N minutes, signal "don't commit yet."
  • Client honors the MAX of these and adds jitter: sleep(retry_after_ms + uniform(0, retry_after_ms)).
  • The jitter is critical: without it, a coordinated surge (fleet-wide firmware update, tunnel exit of a convoy) causes all vehicles to retry at exactly retry_after_ms → second-order storm.
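A client-side sketch of that loop (response shape follows PutPartResponse above; backoff constants are illustrative):

import random
import time

def backoff_sleep(retry_after_ms: int) -> None:
    # Server hint + up to 100% jitter: a fleet-wide event (firmware push,
    # convoy leaving a tunnel) must not re-synchronize into a retry storm.
    time.sleep((retry_after_ms + random.uniform(0, retry_after_ms)) / 1000.0)

def put_part_with_backpressure(put_part, part, max_attempts: int = 8):
    for attempt in range(max_attempts):
        resp = put_part(part)
        if resp.retry_after_ms == 0:
            return resp
        # Grow the wait if the server keeps pushing back (capped exponent).
        backoff_sleep(resp.retry_after_ms * (1 << min(attempt, 4)))
    raise RuntimeError("backpressure never cleared; leave part in WAL")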

Failure modes.

  • Half-uploaded part, client crash: on reboot, scan WAL, resume with same upload_id; server treats identical part-hash PUT as idempotent no-op.
  • Corrupt bit flip during upload: per-part SHA256 catches it; server returns HASH_MISMATCH; client re-reads from WAL and retries. If WAL also has the corrupt byte (NVMe error), client escalates to re-capture (if still in range) or logs uncorrectable.
  • Clock skew: GPS-synced; if GPS lost, vehicle switches to monotonic + last-known-good; attestation service detects skew > 60s and flags capture for triage.
  • Upload service rolls mid-commit: staging state is in Bigtable, not RAM; the new instance resumes from the staging row. Commit is idempotent on aggregate_sha256.

Real systems named. tus.io (byte-range resumable spec), AWS S3 multipart, GCS resumable upload protocol, YouTube's Resumable Upload for mobile creators, Mapillary's upload API (similar problem, similar shape — they use 10 MB chunks with content hash dedup).


7.2 Geo-indexing at planet scale — S2 vs geohash vs R-tree, quantified

Why critical. Two workloads with conflicting requirements:

  1. Ingest-side sharding: 300M writes/day need locality so that "captures in lower Manhattan" land on a small number of Spanner splits.
  2. Serve-side range: "panos within 50m of user's tap" needs a <10ms range scan to answer /v1/pano/lookup.

A bad geo-key choice ruins both. Hot cells (Times Square, Shibuya, Piccadilly) will melt a naive scheme.

Alternatives quantified.

| Scheme | Locality | Neighbor queries | Hot-spot mitigation | Write scalability | Index size | Chosen? |
|--------|----------|------------------|---------------------|-------------------|------------|---------|
| S2 cells (Hilbert curve) | Excellent — Hilbert preserves 2D locality; neighbors usually share key prefixes | 8 neighbors via the cell-id API; descendants via a [range_min, range_max] scan | Sub-cell hashing at L14 for known hot cells | Excellent (sharded on cell_l10) | 64-bit uint | YES |
| Geohash (base32 z-order) | Good (z-order), but seam discontinuities at the equator/meridians | Neighbor heuristics need seam special-casing | Same sub-hash trick, but seams still hurt | Good | ~12-char string | Rejected (see below) |
| R-tree (Postgres PostGIS) | Arbitrary rect queries in O(log N); great for polygons | Native via GiST index | Rebalance on insert → hot-node write amplification | Poor at this scale (single node or manual sharding) | Tree depth ~7 at 1T rows | Rejected for write path |
| Z-order / Morton curve | Similar to geohash | Same seam issues | Same | Similar | 64-bit | Rejected: strict subset of S2's benefits |
| Uber H3 (hex grid) | Great — hex neighbors are uniform | 6 neighbors, uniform | Sub-hex hashing possible | Great | 64-bit | Close second — we'd pick H3 if we needed uniform cell area (e.g., ride dispatch); we don't |

Why S2 wins for us, concretely.

  1. Hilbert locality in Bigtable row keys. Bigtable is lex-ordered on row key. S2 cell IDs encoded in Hilbert order mean two geographically-close cells usually have keys close in byte space. A scan for "all captures in a 2km box" hits 1-3 Bigtable splits. Geohash has seam cases where adjacent cells differ in top byte → 2× more splits hit in worst case.

  2. Variable resolution. One cell ID encodes both cell and resolution in the same 64-bit int. I can store s2_cell_l10 as the shard key (~80 km² buckets, coarse enough to balance) and s2_cell_l14 as the serving key (~550 m cells for "nearby"). Geohash requires two different strings.

  3. Area uniformity is acceptable, not perfect. S2 cells vary ~2× in area across the globe (cube projection distortion). For our serving use case (nearby-pano lookup) 2× is fine — we just request a slightly larger radius. H3 would be uniform hex but lacks the 20-year battle-tested C++/Java/Go library ecosystem S2 has (Google's s2geometry), and at Google — using Google's library is table stakes.

  4. Hot-cell mitigation. For known hot cells (the Times Square L14 cell may carry 100K captures/month), we salt the row key (runnable sketch after this list):

    // Normal:     row_key = s2_cell_l14 | time_reverse | capture_id
    // Hot-cell:   row_key = s2_cell_l14 | hash(capture_id) % 16 | time_reverse | capture_id
    

    The hash % 16 splits the cell across 16 sub-shards → scans for that cell fan out, but total volume is manageable. Hot-cell list maintained by a daily job that analyzes per-cell QPS; promoted/demoted automatically.

  5. Neighbor queries. S2CellId::AppendAllNeighbors(level, &out) yields the 8-connected neighbors at the same level. Use it for "when stitching, give me captures in adjacent cells at level 14 within ±30s" — a single range scan per neighbor cell.
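A runnable sketch of the salting scheme and the reader-side fanout it implies (key layout and fanout constant are illustrative):

import hashlib
import struct

HOT_CELLS: set = set()   # maintained by the daily per-cell QPS job
FANOUT = 16

def salted_row_key(s2_cell_l14: int, ts_rev: bytes, capture_id: str) -> bytes:
    cell = struct.pack(">Q", s2_cell_l14)
    if s2_cell_l14 in HOT_CELLS:
        salt = int(hashlib.md5(capture_id.encode()).hexdigest(), 16) % FANOUT
        return b"#".join([cell, bytes([salt]), ts_rev, capture_id.encode()])
    return b"#".join([cell, ts_rev, capture_id.encode()])

def scan_prefixes(s2_cell_l14: int) -> list:
    # Readers fan out over every sub-shard of a hot cell; cold cells stay one scan.
    cell = struct.pack(">Q", s2_cell_l14)
    if s2_cell_l14 not in HOT_CELLS:
        return [cell + b"#"]
    return [cell + b"#" + bytes([s]) + b"#" for s in range(FANOUT)]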

Operational tuning learned.

  • Level choice for sharding: there are ~6M L10 cells worldwide (6 × 4¹⁰), most of them ocean or empty; we use L10 as the shard bucket and let Spanner auto-split within it. Even if the hottest metro L10 cell carried a few percent of the global ~3.5K writes/s, that's ~100 writes/s at peak — comfortable for a single Spanner split.
  • Level choice for serving: L14 (~550 m across) is the natural pano "snap" unit — two panos in the same L14 cell are effectively on the same block.
  • Don't index on raw lat/lng. We learned this elsewhere — a B-tree on (lat, lng) has terrible range-scan semantics for 2D regions because B-tree linearizes on lat first, so "within X km" degenerates to a full lng scan per matching lat slice.

Real systems named. Google S2 (used internally in Maps, YouTube, Adwords geo-targeting), Uber H3 (ride dispatch), OpenStreetMap Nominatim (geohash), PostGIS (R-tree GiST), Mapbox vector tiles (Mercator quad-key — basically geohash).


7.3 PII blurring pipeline — the hard privacy invariant at fleet scale

Why critical. One unblurred face served in the EU → class-action, fine up to 4% revenue, press catastrophe. Germany's 2010 Street View opt-out was a direct cost — parts of Germany still have no imagery because takedown-at-collection was too expensive to retrofit. The blur pipeline is the risk-carrying component. Everything else can be rebuilt from logs; an unblurred pano served is not recoverable.

The hard invariants.

  1. No blob in the pano-tiles-* bucket is public-readable until blur_status = passed in Spanner. Enforced by ACL at bucket level + signed-URL flow at serve tier.
  2. Blur is deterministic. Same input + same model_version → same output bytes. Lets us (a) audit (rehash + compare), (b) re-blur cold data if model improves, (c) reconcile across regions.
  3. Detection false-negative rate (FN — missed face/plate) < 0.1% audited. A regression that pushes FN to 1% must trigger auto-rollback within 30 min.

Alternatives considered.

| Approach | Throughput | FN rate | Cost/day (300M panos) | Audit | Chosen? |
|----------|------------|---------|------------------------|-------|---------|
| Two-stage: face detector (YOLO-variant) + plate detector (custom CNN) + Gaussian blur | 5 pano/s per T4 GPU | ~0.05% when tuned | ~$12K/day (T4 fleet) | Deterministic, replayable | YES |
| Panoptic segmentation covering faces + plates as classes | 2 pano/s per T4 | ~0.08% | ~$30K/day | Harder to retrain per class | Rejected: worse throughput/cost; joint retraining is slower |
| Human-in-loop review | ~1 pano/s per reviewer | ~0% | $0.10/pano × 300M = $30M/day (!!) | Perfect | Rejected for the primary path; retained for appeals + audit sampling |
| On-device blur at capture | Offloads cloud compute; saves upload bandwidth | ~1% — model constrained by the vehicle SoC | ~$0 cloud compute; ~$500/vehicle hardware | Hard to update or audit a fleet-resident model | Rejected as primary — no model upgrade without a fleet flash; kept feature-flagged as a Phase-2 option for low-priority regions |

Throughput math (GPU fleet sizing).

  • 300M pano/day ÷ 86400s = 3472 pano/sec steady-state.
  • 1 T4 GPU @ 5 pano/s → need 695 GPUs steady.
  • Peak 4× → need ~2800 GPUs burst. We run 2× steady (1400) reserved + autoscale burst from preemptible pool.
  • Cost: 1400 × $0.35/hr × 24h = $11,760/day on-demand; with committed-use discount ~$7K/day; with preemptible for burst, ~$9K/day blended.
  • Single pano latency: ~180ms (6 sub-images × 30ms); acceptable because pipeline stage is throughput-bound not latency-bound.
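The sizing arithmetic, reproducible (rates are this section's assumptions):

import math

PANO_PER_DAY, PANO_PER_GPU_S, USD_PER_GPU_HR = 300e6, 5.0, 0.35

steady   = PANO_PER_DAY / 86_400 / PANO_PER_GPU_S   # ~695 GPUs
peak     = 4 * steady                               # ~2,800 at 4x surge
reserved = math.ceil(2 * steady)                    # run 2x steady reserved
print(f"steady {steady:.0f}, peak {peak:.0f}, reserved {reserved}, "
      f"~${reserved * USD_PER_GPU_HR * 24:,.0f}/day on-demand")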

The hard-gate invariant — how it's enforced.

1. Pipeline writes stitched_blob_ref to internal-only bucket "pano-internal-{region}".
   ACL: read = pipeline-SA, write = pipeline-SA. No IAM binding to CDN, no public read.

2. Blur stage:
   - Reads stitched_blob from internal bucket.
   - Runs detectors → deterministic_blur → output bytes B.
   - Writes B to "pano-tiles-{region}" with generation metadata
     {model_ver, detector_ver, blur_proof_hash=h(B)}.
   - Updates Spanner: blur_status = passed, blur_model_ver = X, blur_proof_hash = h(B).

3. Tile serving:
   - Tile origin reads blob from "pano-tiles-*" and ALSO reads Spanner row.
   - REQUIRES blur_status = passed AND blur_proof_hash matches h(actual blob bytes)
     before returning to CDN. (Belt + suspenders: ACL should make this impossible
     already, but the Spanner check catches any bucket-ACL mistake.)

4. Takedown:
   - Sets blur_status = retracted and takedown_status = pending (→ done once purges complete).
   - CDN purge by row-key prefix.
   - Origin denies on retracted/pending even if the blob is still present.
   - Async job overwrites the blob with a takedown placeholder.
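A sketch of the origin's gate (step 3), with a dict standing in for the Spanner row; bucket ACLs should make the hash branch unreachable, which is exactly why it is cheap insurance:

import hashlib

class ServeDenied(Exception):
    pass

def gate_tile(row: dict, blob: bytes) -> bytes:
    if row["blur_status"] != "passed":
        raise ServeDenied(f"blur gate: {row['blur_status']}")
    if row["takedown_status"] != "none":
        raise ServeDenied("takedown in effect")
    if hashlib.sha256(blob).hexdigest() != row["blur_proof_hash"]:
        # ACLs should make this unreachable; catching it here converts a
        # compliance incident into a 5xx plus a page.
        raise ServeDenied("blur proof hash mismatch")
    return blob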

Deterministic blur — why it matters at L7.

We chose seeded Gaussian over random occlusion. Seed = h(capture_id || model_ver). Properties:

  • Re-blur produces identical bytes → hash check = integrity check.
  • Across regions: US blur and EU blur of same capture produce same bytes → no cross-region drift.
  • Audit: sample 0.1% of panos; re-run blur offline; compare hashes. A mismatch = pipeline bug or bit-rot → page.
  • The L7 insight: non-deterministic blur (what naive teams ship first) means you CAN'T re-blur on model upgrade without invalidating downstream derivatives. Deterministic blur means the re-blur of cold imagery on model v2 writes identical bytes for unchanged regions (where old model was already correct) — so tile cache invalidation is surgical, not whole-pano.
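A minimal sketch of content-seeded, byte-reproducible blurring. The real pipeline blurs detector boxes with a Gaussian; here a box blur plus seeded dither stands in, and every random element derives from h(capture_id || model_ver):

import hashlib
import numpy as np

def blur_seed(capture_id: str, model_ver: str) -> int:
    digest = hashlib.sha256(f"{capture_id}|{model_ver}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def blur_box(img: np.ndarray, x0: int, y0: int, x1: int, y1: int,
             capture_id: str, model_ver: str, k: int = 15) -> np.ndarray:
    rng = np.random.default_rng(blur_seed(capture_id, model_ver))
    region = img[y0:y1, x0:x1].astype(np.float32)
    pad = k // 2
    padded = np.pad(region, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(region)
    for dy in range(k):        # k×k mean; production would use a separable pass
        for dx in range(k):
            out += padded[dy:dy + region.shape[0], dx:dx + region.shape[1]]
    out /= k * k
    out += rng.uniform(-1, 1, out.shape)   # dither is seeded -> still deterministic
    img[y0:y1, x0:x1] = np.clip(out, 0, 255).astype(img.dtype)
    return img

# Same inputs + same model version -> byte-identical output, so a re-run's
# hash doubles as an integrity check.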

Re-blur on model upgrade — the workflow.

Every 3-6 months a new blur model ships with 20-30% lower FN rate. Requirement: re-process historic imagery so serve-time FN is always below threshold.

  • Scheduler enumerates captures where blur_model_ver < current in priority order (recent + high-traffic cells first).
  • Throttle to <10% of steady GPU capacity to avoid contending with live ingest.
  • Cost math: the 10-year archive holds ~1.1T captures. Re-blurring everything = 1.1×10¹² ÷ (5 pano/s × 1400 GPUs) ≈ 5 years of the whole fleet — untenable. So:
    • Prioritize by recency × view-count.
    • Re-blur on read (lazy): if blur_model_ver < threshold when tile requested → re-blur with new model, write tile, serve with +1s latency first-time. Subsequent requests cached.
    • Only eagerly re-blur top-10% by traffic; tail lazy.
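A sketch of the read-path decision (model versions as integers; thresholds illustrative):

CURRENT_MODEL = 7
MIN_SERVABLE  = 5   # FN-rate floor: tiles blurred by older models must not serve

def on_tile_request(meta: dict, enqueue_reblur, serve):
    ver = meta["blur_model_ver"]
    if ver >= MIN_SERVABLE:
        if ver < CURRENT_MODEL:
            # Servable but stale: upgrade opportunistically, serve now.
            enqueue_reblur(meta["capture_id"], priority="lazy")
        return serve(meta)
    # Below the floor: block, re-blur synchronously (~+1s on first hit only).
    enqueue_reblur(meta["capture_id"], priority="sync")
    return serve(meta, wait_for_model=CURRENT_MODEL)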

Failure modes of the blur stage.

| Failure | Detection | Mitigation | Recovery |
|---------|-----------|------------|----------|
| Model deploys with higher FN (regression) | Shadow-model online eval; sampled human audit; FN-rate SLO alert | Circuit breaker: roll model_ver back in Spanner config; hot-reload GPU workers within 10 min | Deterministically re-blur captures processed between bad deploy and rollback, using the rolled-back version |
| GPU worker OOM on a giant pano (e.g., malformed >100MB) | Worker crash → pod restart; retry count exceeds threshold → quarantine | Size cap at 80MB pre-blur; quarantine oversize to an offline queue for manual inspection | Manual review; likely a corrupted capture |
| Blur produces all-black output (crop bug) | Output-size anomaly detection (expected output ≈ input ± 5%) | Quarantine capture; alert | Fix code; re-run from pipeline_state |
| Detector misses novel content (masks, unusual headwear) | Human audit sample; user takedown reports | Escalate to detector team; add to training set | Lazy re-blur on the next model version; explicit takedown in the interim |
| Raw bucket ACL misconfig (public read) | Continuous IAM policy lint in CI/CD; access-log anomaly detection | Automatic IAM revert via org-policy guardrail | Purge any accessed blobs from CDN; audit access logs for exposure |

Real systems named. Google Street View's known blur pipeline (public talks by Google Maps team reference 2013-2015 architecture); Mapillary's detector pipeline (open-source variant); YOLOv8/RetinaFace for face detection; OpenALPR for plates; Apple's privacy-first mobile mapping.


8 Failure Modes & Resilience (system-wide) #

| Component | Failure | Detection | Blast radius | Mitigation | Recovery |
|-----------|---------|-----------|--------------|------------|----------|
| Vehicle cellular | Signal lost in a tunnel for 30 min | Upload-agent reconnect timeout | 1 vehicle, up to 24h of WAL'd data | 24h local NVMe WAL; QUIC for fast reconnect | Drain WAL on reconnect with part-level idempotency |
| Regional ingest | Entire GCP region outage | Health probes; LB failover | All vehicles homed to the region, up to 24h | Global anycast (Cloud Load Balancing) steers to the next-nearest region; failover pre-warmed | Vehicles retry with an alt-region upload_id; on restore, reconcile via aggregate_sha256 idempotency |
| Spanner | Zone failure | Built-in auto-failover | Metadata writes pause ~30s | Spanner multi-region; no custom action | Writes resume; no data loss |
| GCS raw bucket | Data corruption | Per-part SHA256; background integrity scan; EC parity check | The corrupted objects | Reed-Solomon (9,4) auto-repairs bit rot; multi-region redundancy | Recompute from parity; on total loss, alert + request recapture if recent |
| Pub/Sub backlog | Processing backs up > 30 min | Backlog-depth SLO metric | Freshness SLA breach; stale tiles | Dataflow autoscaling; replay via 7d topic retention | Drain with priority lanes (fresh before backlog) |
| Dedupe stage | Wrong "best" pick → visible quality regression | Sampling dashboards; user feedback | Cell-level quality | Re-run dedupe with an improved score fn; demote the old pick | Roll forward |
| Blur model | FN-rate regression | Shadow eval + human audit; FN-rate SLI alert | Panos processed between deploy & rollback | Auto-rollback model_ver; re-blur affected batch | Re-blur with the good model; purge CDN |
| Pipeline input | Corrupt image poisons pipeline (detector OOM/exception) | Worker crash-loop detection | Single capture quarantined; ~1s pipeline stall | Resource limits; quarantine oversize/malformed inputs | Quarantine + re-inspect |
| S2 hot cell | Times Square-class cell outgrows its split | Bigtable hot-tablet alert; p99 latency jump | Single cell (~550 m across) | Sub-cell hash-salted row key (16-way), promoted from the hot-cell list | Online; Bigtable auto-splits, no downtime |
| Reprocessing replay | Stampede spikes Spanner write QPS | Rate-limit saturation metrics | Pipeline slows; ingest unaffected (separate Spanner instance) | Hard rate-limit reprocessing to 10% of steady capacity; priority queues | Rate limiter restores control within minutes |
| Tile CDN | Purge storm (mass takedown) | CDN purge-rate spike; global CDN ops visibility | Temporary p99 degradation on cold tiles | Purge by prefix instead of per-key; stage large batches over 4h | Complete within the 24h takedown SLA |
| Attestation service | Outage — cert validation fails | Health checks | Vehicles can't commit new uploads, but keep buffering | 24h WAL absorbs the outage; in-progress sessions fail open with a time-bounded token | Restore service; cached certs validate on-vehicle |
| Vehicle identity | Cert leak / clone | Upload anomaly detection (same cert in two locations) | Up to one vehicle's worth of fake data | Auto-revoke cert; force re-attestation; reject post-revoke captures | Forensics on suspect captures; purge if needed |

Paging philosophy. SRE pager carries SLO-breach alerts:

  • P0 — blur SLO breach (any unblurred-served-to-public event; any ACL misconfig); serve-path availability < 99.99%.
  • P1 — ingest SLO breach (region-wide); pipeline freshness > 24h; FN-rate > threshold.
  • P2 — per-vehicle anomaly; hot-cell split delay.
  • Runbooks are keyed to each alert; most start with a rollback step (kubectl rollout undo or a config-service revert).

9 Evolution Path #

v1 — "Single region, batch, prove it works" (first 6 months)

  • One GCP region (us-central1). All vehicles upload here regardless of location (OK, fleet starts in SF).
  • Upload service on Cloud Run; GCS regional bucket; Spanner single-region.
  • Processing: Cloud Scheduler → Cloud Run batch jobs, runs every 15 min over new captures.
  • Blur: CPU inference (not GPU) because volume is ~100K/day total.
  • Serving: no CDN; direct origin at small scale. Pano viewer is internal-only demo.
  • Scope: 10 vehicles, one city.
  • Success criteria: e2e ingest → blur → viewable in <1h P95; durability check passes.

v2 — "Multi-region ingest, streaming, CDN serving" (months 6-18)

  • Regional ingest in NA-East/West, EU, JP. Spanner multi-region. Dual-region GCS for raw.
  • Dataflow streaming pipeline replaces batch: Pub/Sub → PTransforms → exactly-once writes.
  • Blur migrates to GPU fleet (T4) as volume crosses ~5M/day.
  • Tile origin + Google Cloud CDN; first public launch for Street View end-users in tier-1 metros.
  • Dedupe: LSH-based, Bigtable-backed.
  • SLI/SLO + auto-rollback for blur model.
  • Scope: 1000 vehicles, 20 cities.
  • Success criteria: 99.99% serve availability; FN-rate < 0.1%; ingest P95 < 6h.

v3 — "Planet-scale, priority lanes, continuous reprocessing" (months 18+)

  • 10 regional ingest endpoints; edge PoPs for upload termination where latency matters.
  • Fresh-imagery priority lane: construction zones, disaster response, new road openings have their own Pub/Sub subscription with dedicated pipeline capacity.
  • On-demand re-blur scheduler: new model deploys, top-10% traffic re-blurred eagerly, tail lazy.
  • Spanner hierarchical sharding by S2 L10; hot-cell sub-sharding automatic.
  • Tiered storage fully online: 30d hot → 90d warm → cold archive with object lifecycle.
  • BigQuery federated export to ML training feature store.
  • Device attestation upgraded to TPM 2.0 with per-capture signed manifest in protocol.
  • Takedown workflow fully automated with 24h CDN-purge SLA.
  • Cost optimization: preemptible GPU burst for blur; committed-use discounts for sustained.
  • Scope: 10K vehicles, global.
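
The tiered-storage step maps directly onto GCS object lifecycle rules; a sketch using google-cloud-storage's lifecycle helpers, where the 30/120-day thresholds encode our 30d-hot + 90d-warm policy:

```python
from google.cloud import storage

def apply_tiering(bucket_name: str) -> None:
    """Encode 30d hot -> 90d warm -> archive as bucket lifecycle rules;
    GCS then demotes objects server-side with no pipeline involvement."""
    bucket = storage.Client().get_bucket(bucket_name)
    bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)
    bucket.add_lifecycle_set_storage_class_rule("ARCHIVE", age=120)  # 30 + 90
    bucket.patch()
```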

v4 candidates (speculative). On-device blur with verifiable attestation (cuts ingest bandwidth 3-5×); federated learning for detector improvement from declined-for-upload examples; differential privacy for statistical aggregate release.


10 Out-of-1-Hour Notes #

10.1 SLAM / photogrammetry

  • Full depth-map reconstruction per pano uses neighboring captures (temporal stereo) + IMU priors.
  • Output: per-pixel depth + per-pano pose refinement (GPS is only accurate to ~3m; SLAM refines to sub-meter relative pose between neighboring panos).
  • Used downstream for Map Generation (building facade extraction, road geometry).
  • Compute cost: ~10× blur cost; typically run at lower priority on cold data (weeks after capture) rather than real-time.
  • Real systems: COLMAP, ORB-SLAM3, Google's proprietary photogrammetry stack.

10.2 Privacy regulations by jurisdiction

  • EU (GDPR): blur mandatory before publish. Right to erasure: user can request pano removed from a specific address within 30 days.
  • Germany: the 2010 rollout set the precedent: residents may opt visible houses out (the full "Verpixelungsrecht"); parts of Germany remain uncovered. Any future German expansion requires a house-level opt-out UI honored before publish.
  • California (CCPA): similar to GDPR for personal information. Faces = personal info.
  • Japan: looser on plates, stricter on residential detail.
  • India (DPDP 2023): emerging; face blur + explicit consent for sensitive locations (temples, defense).
  • Architecture implication: per-region takedown_policy config; blur + takedown are pluggable per-country; separate worker queues for jurisdictions with different rules to guarantee policy isolation.
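
To make the architecture implication concrete, a sketch of a per-jurisdiction policy table; every field name and value here is an illustrative assumption, not legal guidance:

```python
# Illustrative per-jurisdiction policy table (values are assumptions).
TAKEDOWN_POLICY = {
    "DE":    dict(house_optout=True,  pre_publish_hold=True,  erasure_sla_days=30),
    "EU":    dict(house_optout=False, pre_publish_hold=False, erasure_sla_days=30),
    "US-CA": dict(house_optout=False, pre_publish_hold=False, erasure_sla_days=45),
    "JP":    dict(house_optout=False, pre_publish_hold=False, erasure_sla_days=30),
    "IN":    dict(house_optout=False, pre_publish_hold=True,  erasure_sla_days=30),
}

def takedown_queue(jurisdiction: str) -> str:
    """One worker queue per jurisdiction guarantees policy isolation:
    a DE rule change can never bleed into the JP pipeline."""
    return f"takedown-workers-{jurisdiction.lower()}"
```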

10.3 Takedown workflow (right to be forgotten on panos)

  • User-facing web form → captures (lat, lng, radius, reason).
  • Operator reviews; if approved:
    • S2 → list of affected captures.
    • Mark captures takedown_status = active.
    • Purge CDN by row-key prefix (tile rows).
    • Overwrite blurred blob with takedown-placeholder (keeping raw in cold for legal record but making it inaccessible via serve path).
    • Propagate to downstream ML exports — BigQuery has a takedown_status column; training jobs MUST filter.
  • Auditable: every takedown logged with ticket ID + operator ID + affected capture_ids; 7-year retention.
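
The "S2 → list of affected captures" step above is a region covering; a sketch with the open-source s2sphere library (the level bounds and cell budget are tuning assumptions):

```python
import math
import s2sphere  # open-source S2 port: pip install s2sphere

EARTH_RADIUS_M = 6371000.0

def affected_cells(lat: float, lng: float, radius_m: float):
    """Cover the takedown disc with S2 cells; each cell ID then becomes a
    range scan over the geo-index to find affected capture_ids."""
    axis = s2sphere.LatLng.from_degrees(lat, lng).to_point()
    angle = s2sphere.Angle.from_degrees(math.degrees(radius_m / EARTH_RADIUS_M))
    cap = s2sphere.Cap.from_axis_angle(axis, angle)
    coverer = s2sphere.RegionCoverer()
    coverer.min_level, coverer.max_level, coverer.max_cells = 12, 16, 64
    return coverer.get_covering(cap)
```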

10.4 Vehicle fleet security

  • Device attestation: TPM 2.0 generates boot quote → attestation service verifies measured boot + firmware signature. Fails → cert not issued → vehicle can't upload.
  • Per-capture signing: vehicle signs manifest hash with TPM-resident key; server verifies signature chain up to fleet CA. Detects: cert clone, man-in-the-middle, spoofed GPS.
  • Anti-tamper physical: tamper-evident seals on camera rig; any physical tamper → TPM attestation fails on next boot.
  • Rotation: certs rotate every 90 days; any anomaly forces immediate re-attestation.
  • Supply-chain: firmware signed by cross-signed Google + vendor keys; verified at boot.
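
A sketch of the server-side manifest check, assuming an ECDSA vehicle key and using the pyca/cryptography API; chain validation up to the fleet CA and revocation checks are elided:

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.x509 import load_pem_x509_certificate

def verify_capture(manifest: bytes, signature: bytes, cert_pem: bytes) -> bool:
    """Verify the vehicle's TPM-resident key signed this manifest hash.
    Raises cryptography.exceptions.InvalidSignature on any mismatch, which
    the caller treats as a rejected capture plus an anomaly event."""
    cert = load_pem_x509_certificate(cert_pem)
    cert.public_key().verify(signature, manifest, ec.ECDSA(hashes.SHA256()))
    return True
```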

10.5 Cost model

  • Hot storage: GCS Standard ~$0.020/GB/mo × 550 PB = $11M/mo.
  • Warm: GCS Nearline ~$0.010/GB/mo × 1.65 EB = $17M/mo.
  • Cold (10y archive): GCS Archive ~$0.004/GB/mo × 25 EB = $100M/mo at year-10 steady state.
  • Blur compute: ~$9K/day × 365 = $3.3M/yr.
  • Stitch + SLAM compute: ~$30K/day × 365 = $11M/yr.
  • CDN egress: 1.5 TB/s peak, de-rated to an average, × ~$0.08/GB with only ~30% of traffic billable after caching = O($100M/yr) egress. (This is the biggest variable cost; peering agreements help.)
  • Ingest bandwidth: 174 GB/s sustained × O($0.01/GB) ingress (often free on GCP within-region) = negligible vs egress.
  • Spanner: 10K nodes × ~$0.90/node-hr × 8,760 hr/yr ≈ $80M/yr (this is a concern; we'd push some workloads to Bigtable).
  • Rule of thumb: Street-View-scale imagery is an O($1B-2B/yr) infrastructure line item at 10K-vehicle fleet scale, before vehicle capex. This is what justifies aggressive dedupe + lifecycle tiering + deterministic re-blur (so model upgrades don't 10× the blur cost).
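
Annualizing the line items above confirms the rule of thumb; egress is taken at a mid-band $200M/yr, which is our assumption:

```python
# Back-of-envelope annualization, all figures in $M/yr.
storage = 12 * (11 + 17 + 100)   # hot + warm + cold(year-10) = 1,536
compute = 3.3 + 11               # blur + stitch/SLAM
egress  = 200                    # O($100M/yr) order; mid-band assumption
spanner = 80
total = storage + compute + egress + spanner
print(f"~${total:,.0f}M/yr")     # ~ $1,830M/yr, inside the $1-2B band
```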

10.6 Observability

SLIs (customer-facing):

  • serve.availability = 2xx / total requests — target 99.99% over 30d.
  • serve.latency_p99 per region — target <300ms.
  • blur.fn_rate (from audit sampling) — target <0.1%.
  • ingest.commit_p95 — target <6h.

SLIs (internal):

  • pipeline.freshness_per_stage_p95 — target <2h blur, <4h tile.
  • takedown.propagation_p95 — target <24h.
  • hot_cell.p99_latency — red-flag when a single cell's p99 spikes.
  • per_vehicle.upload_success_rate — per-vehicle SLI; catches bad camera rigs.
  • per_model.blur_fn_rate — per deployed model version, tracked pre-prod and in prod.

Alerting:

  • Multi-burn-rate error budget alerts (Google SRE style).
  • Blur FN rate has a fast-burn + auto-rollback integration: >3× baseline FN for 5 min → automated model rollback (human-in-loop notification but action is automatic because the privacy risk is too high to wait for paging).
  • Hot-cell alert: any L14 cell exceeding 100× median QPS → automatic sub-cell hash-salt promotion.
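
The burn-rate arithmetic behind the multi-burn-rate alerts, with the standard fast-burn threshold as a worked check:

```python
def burn_rate(observed_error_ratio: float, slo: float) -> float:
    """Burn rate = observed error ratio / error-budget ratio. At a 99.99%
    SLO the budget ratio is 1e-4, so 0.144% errors over the window is a
    14.4x burn: the classic page-now threshold (budget gone in ~2 days)."""
    return observed_error_ratio / (1.0 - slo)

assert round(burn_rate(0.00144, 0.9999), 1) == 14.4
```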

Dashboards:

  • Global ingest map (volume by region, freshness by region).
  • Per-stage pipeline latency heatmap.
  • Cost-per-capture breakdown (storage/compute/egress).
  • Blur audit dashboard with sampled FN examples for human reviewer triage.

Self-verification (before I submit) #

  • SRE pager-carryable? Yes — every failure mode has a detection signal, a mitigation, and a recovery path. Runbooks are keyed to alert names. Blur FN-rate is the only fully-automated rollback; everything else is human-in-loop.
  • Every diagram arrow → API/data flow? Yes. Upload arrow → InitiateUpload/PutPart/Commit (section 4). Pub/Sub fanout → event schemas (section 4). Serving arrows → GET /v1/pano (section 4). Metadata lookup → Spanner row schemas (section 5).
  • L7 vs L6 depth on deep-dives?
    • Deep-dive 7.1 (upload): L7 — covers part-size billing asymmetry, DWPD on vehicle NVMe, jitter on Retry-After; L6 would stop at "use multipart."
    • Deep-dive 7.2 (geo-index): L7 — quantifies S2 vs H3 vs geohash with Hilbert-locality argument, hot-cell sub-shard mitigation; L6 would say "use S2."
    • Deep-dive 7.3 (blur): L7 — deterministic blur seed for re-blur idempotency, ACL belt-and-suspenders invariant, lazy vs eager re-blur economics; L6 would say "run a face detector."
  • Volume reconciliation (mechanically re-checked below):
    • Raw: 300M panos/day × 50 MB = 15 PB/day → 30d hot = 450 PB → 10 regional GCS buckets × ~45 PB each (feasible).
    • Processed: 3.3 PB/day → 30d hot = 100 PB.
    • CDN: 1M views/sec × 30 tiles/session × 50 KB = 1.5 TB/s peak → consistent with Google Global Cache scale.
    • Metadata: 300M rows/day × 365 × 10yr ≈ 1.1T rows → Spanner feasible with L10 sharding.
    • All numbers reconcile.
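
The same reconciliation as a mechanical check; only unit constants are introduced, no new numbers:

```python
# Mechanical re-check of the reconciliation bullets above.
panos_day = 300e6
raw_pb_day = panos_day * 50e6 / 1e15              # bytes -> PB
assert round(raw_pb_day) == 15                    # 15 PB/day raw
assert round(raw_pb_day * 30) == 450              # 30d hot = 450 PB
peak_tb_s = 1e6 * 30 * 50e3 / 1e12                # views/s x tiles x bytes
assert peak_tb_s == 1.5                           # 1.5 TB/s CDN peak
rows = panos_day * 365 * 10
assert round(rows / 1e12, 1) == 1.1               # ~1.1T metadata rows
```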
