{"attacks":[{"description":"Unperturbed feature vector — the model's home turf.","name":"baseline"},{"description":"anomaly_rate * 0.5: regime blocks 50% fewer users.","name":"half_anomaly"},{"description":"measurement_count * 2: drown signal in extra clean traffic.","name":"noise_x2"},{"description":"probe_block_rate * 0.5: selectively unblock during probe windows.","name":"half_probe_rate"},{"description":"spike_magnitude * 0.3: ramp up censorship instead of step change.","name":"smoothed_spike"},{"description":"neighbor contagion features zeroed: regime acts alone.","name":"isolated_regime"},{"description":"All five perturbations applied at once — worst-case stealth attack.","name":"all_combined"}],"baseline_detection_rate":0.93,"conditional_on_baseline_detected":{"all_combined":{"mean_proba_delta":0.046,"median_proba_delta":0.0102,"n_originally_detected":186,"n_still_detected":179,"retention_rate":0.9624},"half_anomaly":{"mean_proba_delta":0.0034,"median_proba_delta":0,"n_originally_detected":186,"n_still_detected":175,"retention_rate":0.9409},"half_probe_rate":{"mean_proba_delta":0,"median_proba_delta":0,"n_originally_detected":186,"n_still_detected":186,"retention_rate":1},"isolated_regime":{"mean_proba_delta":-0.0101,"median_proba_delta":0,"n_originally_detected":186,"n_still_detected":177,"retention_rate":0.9516},"noise_x2":{"mean_proba_delta":-0.027,"median_proba_delta":0.0013,"n_originally_detected":186,"n_still_detected":168,"retention_rate":0.9032},"smoothed_spike":{"mean_proba_delta":-0.0281,"median_proba_delta":0,"n_originally_detected":186,"n_still_detected":177,"retention_rate":0.9516}},"decision_threshold":0.5,"evaluated_at":"2026-05-21T14:25:16.586571Z","honest_caveats":["Detection rate = predicted class 1 at threshold 0.5. Probability shifts are also reported so callers can re-threshold.","Perturbations are applied to the country-day FEATURE VECTOR. 
They model what a regime can falsify at the measurement-aggregation level; they don't model adversarial probe collusion.","We use the production v3.3 model unchanged. No defensive retraining, no adversarial fine-tuning, no smoothing layer. This is the worst-case headline.","The 'isolated_regime' attack already happens naturally for ~80 countries (those with no positively-correlated neighbors). So that detection drop is partially a measurement of feature coverage, not just adversarial weakness.","F1, AUC, and calibration metrics are NOT in this sidecar; it reports only detection rate on positives. False-positive rate is unaffected because we don't perturb negatives.","Sample size is 200 positives, sampled stratified by country. By the normal approximation to the binomial, the 95% CI on a single-attack detection rate is at most roughly ±7 percentage points (worst case, p = 0.5) and about ±4 percentage points at the observed rates near 0.9."],"interpretation":"On unperturbed positives v3.3 detects 93% (baseline). The most damaging single attack is 'noise_x2', which drops detection to 88%, i.e. evasion succeeds on 5pp more incidents. Combined attack (every perturbation at once) lands at 95%, which is 2pp above baseline rather than a drop, and within sampling noise at n=200. 
A regime willing to apply all five tactics can evade detection on 5% of attempts.","model_trained_at":"2026-05-21T03:01:46.793987+00:00","model_version":"v3.3","most_fragile_attack":"noise_x2","n_original_positives":200,"per_attack_detection_rate":{"all_combined":0.95,"half_anomaly":0.9,"half_probe_rate":0.93,"isolated_regime":0.905,"noise_x2":0.88,"smoothed_spike":0.885},"per_attack_full":{"all_combined":{"detection_rate":0.95,"mean_proba":0.9242,"median_proba":0.9817,"n":200,"n_detected":190},"baseline":{"detection_rate":0.93,"mean_proba":0.8511,"median_proba":0.9502,"n":200,"n_detected":186},"half_anomaly":{"detection_rate":0.9,"mean_proba":0.8634,"median_proba":0.9695,"n":200,"n_detected":180},"half_probe_rate":{"detection_rate":0.93,"mean_proba":0.8511,"median_proba":0.9502,"n":200,"n_detected":186},"isolated_regime":{"detection_rate":0.905,"mean_proba":0.8475,"median_proba":0.9705,"n":200,"n_detected":181},"noise_x2":{"detection_rate":0.88,"mean_proba":0.8413,"median_proba":0.9696,"n":200,"n_detected":176},"smoothed_spike":{"detection_rate":0.885,"mean_proba":0.8218,"median_proba":0.9486,"n":200,"n_detected":177}},"reproducibility":{"evaluation_script":"scripts/evaluate-classifier-robustness.py","inputs_file":"/opt/voidly-ai/ml-deploy/classifier_v3.3_adversarial_inputs.json","model_file":"/opt/voidly-ai/models/censorship_classifier_v3_promoted.pkl","perturbation_script":"scripts/build-adversarial-perturbations.py"},"schema":"voidly-classifier-robustness/v1"}