{"entries":[{"change":"PROMOTED. The AS-topology GraphSAGE GNN was re-evaluated with a genuine CensoredPlanet signal_value censorship label (97 ASNs: 62 that block measurement probes vs 35 that pass them) and a leakage-audited density+topology feature set (7 signal-derived features dropped). Leave-one-AS-out and leakage-safe leave-one-COUNTRY-out CV with a 5,000-permutation test give AUC 0.7645 / 0.7751, both at p=0.0002. The earlier headline (AUC 0.80, n=6 tier-1 ASNs, p=0.32) was an underpowered CV on a label that had collapsed to a measurement-density flag. passed_promote_floor flipped false -> true.","honest_caveats":["AUC 0.775 is clearly-better-than-chance, not operationally decisive: recall at threshold 0.5 is 0.68 under country-out CV, so the GNN still misses about a third of censoring ASNs.","The genuine label is cross-sectional (whether an ASN censors), not a dated 7-day forecast (when). The /v1/forecast/asn-gnn endpoint name predates this re-evaluation.","Labels remain CensoredPlanet-only and sparse (97 ASNs across 30 countries). 
The live per-ASN score still comes from the older 13-feature artifact; treat per-ASN scores as indicative and the reeval_v2 block in /v1/forecast/asn-gnn/info as the audited claim."],"kind":"forecast","metrics":{"leave_one_as_out_auc":0.7645,"leave_one_as_out_perm_p":0.0002,"leave_one_country_out_auc":0.7751,"leave_one_country_out_perm_p":0.0002,"n_country_folds":30,"n_labeled":97,"n_neg":35,"n_permutations":5000,"n_pos":62,"passed_promote_floor":true,"promote_floor_auc":0.65},"model":"forecast-asn-gnn-v1-reeval","trained_at":"2026-05-22T12:51:42.726607Z","version":"v1 (re-evaluated)"},{"finding_url":"/atlas/findings/prediction-track-record-2026-05","honest_caveats":["forecast_7day operational threshold is 0.05 — metrics_at_0.5/0.25 included for comparability","Other 11 models n_predictions=0 because per-request prod logging is not yet wired","Calibration drift +56pp indicates model massively under-predicts (5% mean vs 61% empirical)"],"kind":"transparency","live_endpoint":"/v1/atlas/prediction-track-record","metrics":{"forecast_7day_brier":0.556,"forecast_7day_calibration_drift_pp":56.45,"forecast_7day_n_predictions":720,"forecast_7day_precision_op":0.689,"forecast_7day_recall_op":0.391,"n_models_tracked":12,"n_models_with_prod_logging":1},"model":"prediction-track-record-v1","notes":"Live 30-day production track record across all forecast models. Joins daily-logged sentinel_forecasts against sentinel_outcomes. Only forecast_7day has prod logging today; other 11 models report training-time metrics + caveat until per-request logging lands. 
Daily cron 04:00 UTC.","trained_at":"2026-05-21T14:45:05Z","version":"v1","change":"prediction-track-record-v1 · v1 · transparency (forecast_7day_brier 0.556, forecast_7day_calibration_drift_pp 56.45)"},{"arxiv":"https://arxiv.org/abs/2203.05794","kind":"topic-modeling","live_endpoint":"/v1/atlas/topics, /v1/atlas/topics/info, /v1/atlas/topics/{cc}, /v1/atlas/incidents/{id}/topic","metrics":{"coherence_floor":0.3,"coherence_npmi":0.7257,"dedupe_rate":0.547,"n_docs":1195,"n_source_rows":2636,"n_topics":8,"passed_promote_floor":true,"reconstruction_err":19.81,"vocab_size":623},"model":"incident-topic-tfidf-nmf","notes":"tf-idf + NMF over 1,195 deduped incident descriptions (55% boilerplate dedupe). 8 themes auto-discovered; coherence 0.73, well above the 0.3 promote floor. BERTopic+sentence-transformers was the first pick but not installable on Vultr venv-ml, fell back per directive. Honest caveat: topic labels are heuristic over top words, not editorial.","paper":"Lee & Seung 1999 — Learning the parts of objects by NMF; Grootendorst 2022 (BERTopic, fallback target)","trained_at":"2026-05-21T13:02:50.926710Z","version":"v1","change":"incident-topic-tfidf-nmf · v1 · topic-modeling (coherence_floor 0.3, coherence_npmi 0.7257)"},{"arxiv":"https://arxiv.org/abs/2106.00170","kind":"calibration","live_endpoint":"/v1/forecast/{cc}/7day (aci_alpha field)","metrics":{"alpha_current":0.21,"empirical_coverage":0.913,"learning_rate":0.01,"n_observations":840,"target_alpha":0.1},"model":"forecast-aci-online-calibration","notes":"Online conformal update — kills calibration drift. State persists across requests in /opt/voidly-ai/ml-deploy/forecast_aci_state.json. 
Daily cron at 03:45 UTC.","paper":"Gibbs & Candès 2021 — Adaptive Conformal Inference Under Distribution Shift","trained_at":"2026-05-21T04:30:38Z","version":"v1","change":"forecast-aci-online-calibration · v1 · calibration (alpha_current 0.21, empirical_coverage 0.913)"},{"honest_caveats":["ROC AUC 0.954 came from a RANDOM stratified split (near-duplicate consecutive country-days inflate it); NO target leakage — features use [date-7,date], label [date+1,date+7].","Honest temporal-holdout AUC ~0.85-0.90. Confidence is in-sample, not calibrated per-forecast.","Live forward accountability: /v1/sentinel/accuracy (rolling precision, currently degraded) + /v1/forecast/onset-skill. Use for cross-country ranking, not within-country day-timing."],"kind":"forecast","live_endpoint":"/v1/forecast/{cc}/7day","metrics":{"f1":0.6667,"optimal_threshold":0.27,"positive_rate":0.092,"precision":0.7255,"recall":0.6167,"roc_auc":0.9541,"samples":2148},"model":"forecast-v1-7day-sane-labels","notes":"Retrained after IODA disruption labels excluded from target_7day. Dual-gate accepted (legacy holdout +18pp, temporal holdout +58pp).","trained_at":"2026-05-21T04:20:38Z","version":"v1.1","change":"forecast-v1-7day-sane-labels · v1.1 · forecast (f1 0.6667, optimal_threshold 0.27)"},{"kind":"forecast","live_endpoint":"/v1/forecast/{cc}/multi-horizon","metrics":{"loco_auc_1d":0.91,"loco_auc_30d":0.84,"loco_auc_7d":0.88,"n_spotlight_countries":20},"model":"forecast-multi-horizon","notes":"Separate 1d/7d/30d models. Per-horizon SHAP + 90% conformal intervals + monotonicity consistency check.","trained_at":"2026-05-21T03:40:00Z","version":"v1","change":"forecast-multi-horizon · v1 · forecast (loco_auc_1d 0.91, loco_auc_30d 0.84)"},{"kind":"classifier","live_endpoint":"/v1/classifier/score/{cc}","metrics":{"loco_mean_f1":0.711,"loco_median_f1":0.87,"n_countries":131,"n_features":16,"n_positive":1116,"n_samples":4237,"stratified_f1":0.729},"model":"classifier-v3.3","notes":"GradientBoosting. 
Regime-similarity-weighted geographic contagion. 16 features (13 base + 3 contagion neighbors). EG recovered from v3.2 regression. Honest caveat: 16 MENA + post-Soviet countries regress 5-29pp due to sparse neighbor-pair overlap (not fixed by regime-cluster finetune v3.4, which was a negative result).","trained_at":"2026-05-21T02:58:00Z","version":"v3.3","change":"classifier-v3.3 · v3.3 · classifier (loco_mean_f1 0.711, loco_median_f1 0.87)"},{"kind":"classifier","live_endpoint":null,"metrics":{"loco_median_f1":0.833,"passed_promote_floor":false,"regression_countries_improved_of_16":1},"model":"classifier-v3.4-regime-cluster-finetune","notes":"NEGATIVE RESULT — archived to /opt/voidly-ai/models/experimental/. Stack head learned coefficients ignored cluster heads (base v3.3 coef +9.80, cluster coefs in [-0.83, +0.64]). Root cause of tail regression isn't model architecture; it's noise-bounded F1 in countries with 5-15 positive samples. Real fix is targeted labeling.","trained_at":"2026-05-21T04:18:46Z","version":"v3.4","change":"classifier-v3.4-regime-cluster-finetune · v3.4 · classifier (loco_median_f1 0.833, regression_countries_improved_of_16 1)"},{"kind":"anomaly","live_endpoint":"/v1/anomaly/dbscan/{cc}","metrics":{"auc":0.6506,"min_samples":3,"n_scored":3215,"passed":true,"promote_floor_auc":0.65,"window_days":45},"model":"anomaly-dbscan-v1","notes":"Per-country rolling 45-day DBSCAN over 12 standardized OONI features. 
Promoted as SECOND-OPINION signal — supervised v3.3 still wins at 0.99, DBSCAN surfaces shape-anomalous days labels never saw.","paper":"Aceto & Pescape 2025 — CenDTect","trained_at":"2026-05-21T04:19:40Z","version":"v1","change":"anomaly-dbscan-v1 · v1 · anomaly (auc 0.6506, min_samples 3)"},{"kind":"anomaly","live_endpoint":"/v1/anomaly/domain-drift/leaderboard","metrics":{"n_clusters_last_week":0,"n_clusters_this_week":2,"n_domains_clustered":27,"top_drift_domain":"tiktok.com","top_drift_score":0.343},"model":"anomaly-domain-drift-hdbscan-v1","notes":"Per-domain HDBSCAN weekly drift. Orthogonal to per-country DBSCAN. Honest caveat: only 27 domains pass the min-10-measurement filter; week-over-week cluster stability needs months more data.","trained_at":"2026-05-21T04:43:38Z","version":"v1","change":"anomaly-domain-drift-hdbscan-v1 · v1 · anomaly (n_clusters_last_week 0, n_clusters_this_week 2)"},{"honest_caveats":["Held-out AUC/F1 of 1.0 + zero-error confusion matrix are INFLATED BY LABEL LEAKAGE (ML_LEAKAGE_AUDIT.md), not out-of-distribution skill.","Three features (country_7d_rate, asn_7d_rate, country_domain_7d_rate) are running rates computed from the per-row label; 'source' near-perfectly partitions it (IODA 0/4358 pos, Voidly-Community 105/105 pos); evaluation used a plain random split.","Honest group-aware estimate ~0.75-0.85. Use as a suspicious-evidence RANKER, not a novel censorship detector. Surfaced live at /v1/measurement/info."],"kind":"classifier","live_endpoint":"POST /v1/measurement/classify","metrics":{"f1":1,"n_test":17094,"precision":1,"recall":1,"roc_auc":1},"model":"measurement-classifier-v1","notes":"XGBoost row-level classifier. AUC=1.0 honest caveat — model reconstructs the labeling rule from signal_value + source patterns rather than discovering novel signal. Per-row interface layer on the same evidence as v3.3 country-day. Top feature: asn_7d_rate (81% gain).","paper":"Niaki et al. 
KDD23 — Massively Parallel Censorship Probing","trained_at":"2026-05-21T04:00:00Z","version":"v1","change":"measurement-classifier-v1 · v1 · classifier (f1 1, n_test 17094)"},{"arxiv":"https://arxiv.org/abs/1812.09970","kind":"attribution","live_endpoint":"/v1/sentinel/attribute?country=X&date=Y","metrics":{"donor_pool":"stable-democracies","method":"synthetic-difference-in-differences"},"model":"sentinel-attribute-sdid","notes":"Causal attribution for shutdown events. Builds counterfactual from stable-democracy donors, measures post-period gap, runs permutation p-value, surfaces nearby political events.","paper":"Arkhangelsky et al. 2018 — Synthetic Difference-in-Differences","trained_at":"2026-05-21T00:00:00Z","version":"v1","change":"sentinel-attribute-sdid · v1 · attribution"},{"kind":"score","live_endpoint":"/v1/atlas/score-v2","metrics":{"base_rate_weight":0.5,"moves_cn":33,"moves_kp":32,"moves_ru":24},"model":"atlas-score-v2-base-rate-weighted","notes":"A-F country grades. 50% base-rate weight (v1 only weighted change). Fixes stable-but-blocked CN/RU/KP scoring B-.","trained_at":"2026-05-21T03:00:00Z","version":"v2","change":"atlas-score-v2-base-rate-weighted · v2 · score (base_rate_weight 0.5, moves_cn 33)"},{"honest_caveats":["Test AUC 0.916 is base-rate-dominated and STRUCTURALLY LEAKED (ML_LEAKAGE_AUDIT.md): the *_present features and the is_censorship label are both downstream of the same anomalous evidence (incidents are minted from it). Tell-tale: f1_at_0.5 = 0.0.","Honest AUC ~0.55-0.65. 
The per-source likelihood-ratio breakdown is useful as 'source-agreement structure', NOT as predictive accuracy.","An honest metric needs a label independent of the evidence panel (human-verified incidents / news)."],"kind":"classifier","live_endpoint":"/v1/classifier/corroborate/info","metrics":{"f1_at_0.5":0,"honest_loco_median_auc":0.8658,"leakage_free_nextday_auc":0.7348,"lift_bootstrap_95ci_low":0.0006,"model_lift_over_trivial_baseline":0.016,"promoted":false,"reported_headline_auc":0.92,"reproduced_temporal_auc":0.9157,"trivial_baseline_auc_censoredplanet_only":0.8997},"model":"corroboration-v1-bayesian-reeval","notes":"LEAKAGE AUDIT (honest negative). The split is forward-temporal (not shuffled), so the leakage is NOT autocorrelation across folds — it is CIRCULAR FEATURES. is_censorship is derived from the incidents table; 343 of 344 censorship/mixed incidents were minted from an anomalous evidence row on the same country-day the features count, and 265/343 are sourced from censoredplanet only. A single raw feature (censoredplanet_present) scores AUC 0.90 — the 4-source Naive Bayes adds only +1.6pp (bootstrap CI nearly touches 0). On a leakage-free next-day target AUC falls to 0.73. F1 at 0.5 is 0.0 (ranker, not classifier). NOT promoted; no model/endpoint change. honest_caveats + leakage_audit block written to corroboration_v1_metrics.json by scripts/audit-bayesian-corroboration.py. 
The auto-incident-watchdog corroboration gate is a conservative near-veto, not independent confirmation.","trained_at":"2026-05-22T13:00:00Z","version":"v1 (leakage-audited)","change":"corroboration-v1-bayesian-reeval · v1 (leakage-audited) · classifier (f1_at_0.5 0, honest_loco_median_auc 0.8658)"}],"generated_at":"2026-05-22T12:59:08.659815Z","schema":"voidly-model-changelog/v1","source":"curated","_voidly_pay":{"claim_did":"https://voidly.ai/pay/claim","for_builders":"https://voidly.ai/pay/for-builders","marketplace":"https://api.voidly.ai/v1/pay/marketplace","live_demo":"https://huggingface.co/spaces/emperor-mew/voidly-pay","universal_proxy":{"url":"https://api.voidly.ai/v1/pay/proxy","example":"https://api.voidly.ai/v1/pay/proxy?u=<https-url>&to=did:voidly:<your-did>&price=0.01","page":"https://voidly.ai/pay/proxy","note":"Paywall any public HTTPS URL with one query param. No SDK install required."},"install":{"typescript":"npm install @voidly/pay","python":"pip install voidly-pay","langchain":"pip install voidly-pay-langchain","vercel_ai":"npm install @voidly/pay-vercel-ai","mcp":"npx @voidly/pay-mcp","cli":"npm install -g @voidly/pay-cli"},"note":"Voidly Pay is the open agent-to-agent payment rail. Claim a DID + 10 starter credits in 60s, or paywall any URL via the universal proxy with one query param."}}