Pong RAM Policy Experiments

Rounds: 49 · Episodes: 1773 · Failure records: 37 · Latest round: 49

All rounds

Round | Time | Policy | Seed | Episodes | Mean | Min | Max | Steps | Change | Suggestion
49 | 2026-05-10T00:05:23.369804+00:00 | ram_intercept_v3_28_chain2_score00_initial_up | 0 | 80 | 19.025 | 18.0 | 20.0 | 466333 | Round 66 accepted a marker-level chain2 score 0-0 initial DOWN->UP guard after an 80-seed counterfactual (mean 19.025, min 18, max 20); seeds 2 and 15 are expected to improve from 18 to 20 with no regressions. | Compare with the previous round; change only one heuristic at a time.
48 | 2026-05-09T21:16:24.260567+00:00 | ram_intercept_v3_27_high_down_score160_bounce_noop | 0 | 80 | 18.975 | 18.0 | 20.0 | 466965 | Round 55: solidify score 16-0 high-down y165 bounce DOWN->NOOP after v3_26 score-18 floor trace and full 80-seed counterfactual guard; v3_26 mean 18.70 min 18 max 20 to probe mean 18.98 min 18 max 20 | Compare with the previous round; change only one heuristic at a time.
47 | 2026-05-09T20:56:51.415833+00:00 | ram_intercept_v3_26_low_up_score10_final_noop | 0 | 80 | 18.7 | 18.0 | 20.0 | 471695 | Round 54: solidify score 1-0 low-up final DOWN->NOOP after v3_25 floor trace and full 80-seed counterfactual guard; v3_25 mean 18.2875 min 17 max 19 to probe mean 18.70 min 18 max 20 | Compare with the previous round; change only one heuristic at a time.
46 | 2026-05-09T20:33:24.386269+00:00 | ram_intercept_v3_25_y86_score140_serve_initial_noop | 0 | 80 | 18.2875 | 17.0 | 19.0 | 713441 | Round 53: solidify score 14-0 y86 serve-initial DOWN->NOOP after setup diagnostic and full 80-seed counterfactual guard; v3_24 mean 18.0125 min 17 max 19 to probe mean 18.29 min 17 max 19 | Compare with the previous round; change only one heuristic at a time.
45 | 2026-05-09T20:11:08.785023+00:00 | ram_intercept_v3_24_score00_mid_down_y154_noop | 0 | 80 | 18.0125 | 17.0 | 19.0 | 712891 | Round 51: solidify score 0-0 mid-down y154 final UP->NOOP after mixed seed 2/15 trace and full 80-seed counterfactual guard; v3_23 mean 17.9625 min 16 max 19 to probe mean 18.01 min 17 max 19 | Compare with the previous round; change only one heuristic at a time.
44 | 2026-05-09T19:53:06.444230+00:00 | ram_intercept_v3_23_high_down_score120_bounce_noop | 0 | 80 | 17.9625 | 16.0 | 19.0 | 713027 | Round 50: solidify score 12-0 high-down bounce x144-148 y50-56 dy9 DOWN->NOOP after score-16 floor trace and full 80-seed counterfactual guard; v3_22 mean 17.6875 min 16 max 19 to probe mean 17.96 min 16 max 19 | Compare with the previous round; change only one heuristic at a time.
43 | 2026-05-09T19:33:24.229013+00:00 | ram_intercept_v3_22_y86_score100_serve_initial_noop | 0 | 80 | 17.6875 | 16.0 | 19.0 | 720397 | Round 49: solidify score 10-0 y86 serve-initial DOWN->NOOP after setup diagnostic and full 80-seed counterfactual guard; v3_21 mean 17.4125 min 15 max 19 to probe mean 17.69 min 16 max 19 | Compare with the previous round; change only one heuristic at a time.
42 | 2026-05-09T19:11:10.174793+00:00 | ram_intercept_v3_21_high_down_score80_bounce_noop | 0 | 80 | 17.4125 | 15.0 | 19.0 | 717207 | Round 47: solidify score 8-0 high-down bounce x144-148 y50-56 dy9 DOWN->NOOP after full 80-seed counterfactual guard; v3_20 mean 17.1375 min 14 max 19 to probe mean 17.41 min 15 max 19 | Compare with the previous round; change only one heuristic at a time.
41 | 2026-05-09T18:51:51.602778+00:00 | ram_intercept_v3_20_y86_score60_serve_initial_noop | 0 | 80 | 17.1375 | 14.0 | 19.0 | 727217 | Round 45: solidify score 6-0 y86 serve-initial DOWN->NOOP after full 80-seed counterfactual guard; v3_19 mean 16.8625 min 13 max 19 to probe mean 17.14 min 14 max 19 | Compare with the previous round; change only one heuristic at a time.
40 | 2026-05-09T17:46:03.441851+00:00 | ram_intercept_v3_19_video_check_seed1 | 1 | 1 | 19.0 | 19.0 | 19.0 | 5262 | video check for the currently retained v3_19 policy on seed 1; avoids the known seed 0/9 inactive-tail loop | No score variance; change one heuristic at a time, then test on more seeds.
39 | 2026-05-09T16:42:02.651397+00:00 | ram_intercept_v3_19_video_check | 0 | 1 | 17.0 | 17.0 | 17.0 | 27000 | video check for the currently retained v3_19 policy; one recorded episode for viewing in report.html | No score variance; change one heuristic at a time, then test on more seeds.
38 | 2026-05-09T15:18:29.657105+00:00 | ram_intercept_v3_19_y86_after_score22_bounce_up | 0 | 80 | 16.8625 | 13.0 | 19.0 | 721387 | Round 39: solidify stateful y86 x166 bounce DOWN->UP only after the score 2-2 family marker; v3_18 mean 14.9125 min 13 max 19 to probe mean 16.86 min 13 max 19 | Compare with the previous round; change only one heuristic at a time.
37 | 2026-05-09T14:20:12.603639+00:00 | ram_intercept_v3_18_high_down_score40_bounce_noop | 0 | 80 | 14.9125 | 13.0 | 19.0 | 763117 | Round 37: solidify score 4-0 high-down bounce x144-148 y50-56 dy9 DOWN->NOOP after history diagnostic; v3_17 mean 14.6375 min 12 max 19 to probe mean 14.91 min 13 max 19 | Compare with the previous round; change only one heuristic at a time.
36 | 2026-05-09T13:37:32.626697+00:00 | ram_intercept_v3_17_y86_score82_early_setup_down | 0 | 80 | 14.6375 | 12.0 | 19.0 | 769757 | Round 34: solidify score 8-2 y86 offset8 UP->DOWN correction only during steps 2050..2110 after full guard; v3_16 mean 14.3125 min 12 max 19 to probe mean 14.64 min 12 max 19 | Compare with the previous round; change only one heuristic at a time.
35 | 2026-05-09T12:05:35.206021+00:00 | ram_intercept_v3_16_y86_score20_serve_initial_up | 0 | 80 | 14.3125 | 12.0 | 19.0 | 778545 | Round 27: solidify score 2-0 y86 serve-initial DOWN->UP correction after full guard; v3_15 mean 14.1625 min 11 max 19 to probe mean 14.31 min 12 max 19 | Compare with the previous round; change only one heuristic at a time.
34 | 2026-05-09T11:19:26.190173+00:00 | ram_intercept_v3_15_y86_score62_early_setup_down | 0 | 80 | 14.1625 | 11.0 | 19.0 | 773925 | Round 24: solidify step-gated y86 6-2 setup UP->DOWN correction after full guard; v3_14 mean 13.8375 min 11 max 19 to probe mean 14.16 min 11 max 19 | Compare with the previous round; change only one heuristic at a time.
33 | 2026-05-09T10:51:55.791188+00:00 | ram_intercept_v3_14_y86_score42_setup_down | 0 | 80 | 13.8375 | 11.0 | 19.0 | 782713 | Round 22: solidify one-shot y86 4-2 setup UP->DOWN correction after 80-seed counterfactual; v3_13 mean 13.5125 min 10 max 19 to probe mean 13.84 min 11 max 19 | Compare with the previous round; change only one heuristic at a time.
32 | 2026-05-09T10:35:06.749627+00:00 | ram_intercept_v3_13_y86_score22_setup_down | 0 | 80 | 13.5125 | 10.0 | 19.0 | 791501 | Round 21: solidify one-shot y86 2-2 setup UP->DOWN correction after 80-seed score-gated counterfactual; v3_12 mean 13.1875 min 9 max 19 to probe mean 13.51 min 10 max 19 | Compare with the previous round; change only one heuristic at a time.
31 | 2026-05-09T09:48:19.656719+00:00 | ram_intercept_v3_12_low_up_y111_down | 0 | 80 | 13.1875 | 9.0 | 19.0 | 800289 | Round 19: solidify low-up y111 pre-contact UP->DOWN correction after 80-seed counterfactual guard; v3_11 mean 12.75 min 9 max 19 to probe mean 13.19 min 9 max 19 | Compare with the previous round; change only one heuristic at a time.
30 | 2026-05-09T09:24:48.512264+00:00 | ram_intercept_v3_11_low_mid_score00_noop | 0 | 80 | 12.75 | 9.0 | 19.0 | 792374 | Round 18: solidify seed 44/66 low-mid score 0-0 DOWN->NOOP correction after 80-seed counterfactual guard | Compare with the previous round; change only one heuristic at a time.
29 | 2026-05-09T09:02:56.491867+00:00 | ram_intercept_v3_10_low_up_score01_noop_seed60_79 | 60 | 20 | 13.8 | -1.0 | 19.0 | 241136 | expanded validation of the retained v3_10 policy over seeds 60..79, after seeds 0..59 gave mean about 11.8, min -1, max 19; look for new collapses before further tuning | Compare with the previous round; change only one heuristic at a time.
28 | 2026-05-09T09:00:07.834203+00:00 | ram_intercept_v3_10_low_up_score01_noop_seed40_59 | 40 | 20 | 12.1 | -1.0 | 18.0 | 247890 | expanded validation of the retained v3_10 policy over seeds 40..59, after seeds 0..39 gave mean 11.65, min -1, max 19; look for new collapses before further tuning | Compare with the previous round; change only one heuristic at a time.
27 | 2026-05-09T08:56:35.179654+00:00 | ram_intercept_v3_10_low_up_score01_noop | 0 | 40 | 11.65 | -1.0 | 19.0 | 372732 | expanded-set low-up score 0-1 correction: keep v3_9 and add a one-shot override at score 0-1 with dy=-8, x=176..178, y=172..176, paddle=132..138, target=135..150 that changes baseline DOWN to NOOP; the 40-seed counterfactual improved min from -21 to -1, with only seed 33 changed | Compare with the previous round; change only one heuristic at a time.
26 | 2026-05-09T08:44:11.896065+00:00 | ram_intercept_v3_9_late_low_loop_noop_limited_seed20_39 | 20 | 20 | 9.4 | -21.0 | 19.0 | 171431 | expanded validation of the retained v3_9 policy over seeds 20..39; check for overfitting or new collapses after seeds 0..19 gave mean 12.9, min 9, max 19 | Compare with the previous round; change only one heuristic at a time.
25 | 2026-05-09T08:42:11.067290+00:00 | ram_intercept_v3_9_late_low_loop_noop_limited | 0 | 20 | 12.9 | 9.0 | 19.0 | 176564 | late low-loop correction capped to match the validated probe: keep v3_8 and apply the dy=3..4, x=169..177, y=49..57, paddle=80..105, target=60..75 UP-to-NOOP override at most 64 times per episode; revalidated after the uncapped formal run underperformed the probe | Compare with the previous round; change only one heuristic at a time.
24 | 2026-05-09T08:40:31.349485+00:00 | ram_intercept_v3_9_late_low_loop_noop | 0 | 20 | 11.2 | 0.0 | 19.0 | 176564 | late low-loop correction: keep v3_8 and change baseline UP to NOOP at dy=3..4, x=169..177, y=49..57, paddle=80..105, target=60..75; the 20-seed counterfactual probe improved mean from 9.4 to 12.9 and min from -1 to 9, while max went from 20 to 19 | Compare with the previous round; change only one heuristic at a time.
23 | 2026-05-09T08:32:14.021723+00:00 | ram_intercept_v3_8_high_down_setup_noop | 0 | 20 | 9.4 | -1.0 | 20.0 | 219266 | high-down setup correction: keep v3_7 and change baseline NOOP to DOWN at dy=8, x=121..153, y=103..167, paddle=155..195, target=165..185; the 20-seed counterfactual probe improved mean from 4.6 to 9.4 with min -1 unchanged | Compare with the previous round; change only one heuristic at a time.
22 | 2026-05-09T08:21:07.033577+00:00 | ram_intercept_v3_7_mid_up_precontact_noop | 0 | 20 | 4.6 | -1.0 | 20.0 | 202825 | mid-up y164 pre-contact phase correction: keep v3_6 and change only baseline UP to NOOP at dy=-8, x=187..191, y=188..196, paddle=190..205; the 20-seed counterfactual probe improved mean from 1.5 to 4.6 with min -1 unchanged | Compare with the previous round; change only one heuristic at a time.
21 | 2026-05-09T08:08:29.390957+00:00 | ram_intercept_v3_6_mid_down_precontact_noop | 0 | 20 | 1.5 | -1.0 | 9.0 | 206374 | mid-down pre-contact phase correction: keep v3_5 and change only baseline UP to NOOP at dy=4, x=186..194, y=68..82, paddle=65..90; the 20-seed counterfactual probe improved mean from 1.2 to 1.5 with min -1 unchanged | Compare with the previous round; change only one heuristic at a time.
20 | 2026-05-09T07:55:51.258328+00:00 | ram_intercept_v3_5_low_serve_late_noop | 0 | 20 | 1.2 | -1.0 | 9.0 | 235425 | low-serve late NOOP correction: keep the v3_4 opening low-dy8 fix and change only the late low-serve target=50 NOOP at x=189..201, y=55..67, paddle=35..65 to UP; the 20-seed counterfactual probe improved mean from 0.2 to 1.2 and min from -21 to -1 | Compare with the previous round; change only one heuristic at a time.
19 | 2026-05-09T07:41:30.825519+00:00 | ram_intercept_v3_4_opening_low_dy8 | 0 | 10 | 0.9 | -1.0 | 9.0 | 138373 | opening low dy=8 one-step correction: at score 0-0, with x=184..188, y=130..145 and the paddle recovering upward, execute UP instead of the baseline DOWN once; the counterfactual probe improved the 10-seed mean from 0.5 to 0.9 | Compare with the previous round; change only one heuristic at a time.
18 | 2026-05-09T06:46:18.399391+00:00 | ram_intercept_v3_4_late_down_keep_down_probe | 0 | 10 | 0.5 | -1.0 | 9.0 | 137933 | temporary probe: if a late downward ball is at x=188..204, y=70..140 with dy>0 and the previous action was DOWN, keep DOWN instead of NOOP/UP; validates the action-rhythm hypothesis against the v3_3 baseline (mean 0.5, min -1, max 9) | Compare with the previous round; change only one heuristic at a time.
17 | 2026-05-09T06:40:34.163055+00:00 | ram_intercept_v3_4_late_low_down_y86_probe | 0 | 10 | 0.5 | -1.0 | 9.0 | 138299 | temporary probe: for repeated x~202 y~86 dy=4 losses, add a +8 target bias only at x=196..204, y=78..96, dy=3..5; validates against the v3_3 baseline (mean 0.5, min -1, max 9) | Compare with the previous round; change only one heuristic at a time.
16 | 2026-05-09T06:21:02.806717+00:00 | ram_intercept_v3_3_score_state_regression | 0 | 10 | 0.5 | -1.0 | 9.0 | 138299 | state-extraction update only: PongState now exposes ram[14] self_score and ram[13] opponent_score, discovered from reward-event probes; verify that policy behavior is unchanged over seeds 0-9 | Compare with the previous round; change only one heuristic at a time.
15 | 2026-05-09T06:19:02.799542+00:00 | ram_intercept_v3_4_late_lock_5seed | 0 | 5 | -5.8 | -21.0 | -1.0 | 70301 | hypothesis test: lock the last predicted intercept once the ball passes SELF_PADDLE_X_RAM instead of chasing the current y at the right edge; quick run over seeds 0-4, no video, max_steps=30000 | Annotate failure modes with RGB/OBS first, then change the heuristic.
14 | 2026-05-09T06:16:49.146780+00:00 | ram_intercept_v3_3_repro_baseline | 0 | 10 | 0.5 | -1.0 | 9.0 | 138299 | reproduce the baseline before the next change: current ram_intercept_v3_3 over seeds 0-9, no video, max_steps=30000 | Annotate failure modes with RGB/OBS first, then change the heuristic.
13 | 2026-05-09T05:57:01.222578+00:00 | ram_intercept_v3_3_10seed_30k_check | 0 | 10 | 0.5 | -1.0 | 9.0 | 138299 | long no-video robustness check for ram_intercept_v3_3 over seeds 0-9 with max_steps=30000; distinguishes true near-tie games from 10000-step truncation | Annotate failure modes with RGB/OBS first, then change the heuristic.
12 | 2026-05-09T05:55:36.672979+00:00 | ram_intercept_v3_3_10seed_check | 0 | 10 | 0.5 | -2.0 | 9.0 | 84144 | no-video robustness check for ram_intercept_v3_3 over seeds 0-9; scores expected around a mean of +0.5 with no -20 collapse | Annotate failure modes with RGB/OBS first, then change the heuristic.
11 | 2026-05-09T05:51:13.476925+00:00 | ram_intercept_v3_3 | 0 | 5 | 1.4 | -1.0 | 9.0 | 37646 | low-serve pattern rule: after detecting a reset serve near x=126..138, y=116..128 moving up-right, aim at y=50 for 24 policy steps; no-video sweep improved the seeds 0-4 mean from -1.4 to +1.4 | Annotate failure modes with RGB/OBS first, then change the heuristic.
10 | 2026-05-09T05:39:55.253489+00:00 | ram_intercept_v3_2 | 0 | 5 | -1.4 | -20.0 | 9.0 | 34843 | inactive serve-ready center changed from 121 to 100; no-video sweep over seeds 0-4 improved mean from -1.8 to -1.4 and kept the target/intercept constants from v3_1 | Annotate failure modes with RGB/OBS first, then change the heuristic.
9 | 2026-05-09T05:26:56.672212+00:00 | ram_intercept_v3_1 | 0 | 5 | -1.8 | -20.0 | 9.0 | 23217 | target clamp 46..196 added on top of ram_intercept_v3; no-video sweep over seeds 0-4 improved mean from -2.0 to -1.8 | Annotate failure modes with RGB/OBS first, then change the heuristic.
8 | 2026-05-09T05:22:05.593598+00:00 | ram_intercept_v3 | 0 | 5 | -2.0 | -20.0 | 8.0 | 23252 | tuned intercept policy: runtime reflection bounds fixed; MAX_PLAY_Y_RAM=210 and PADDLE_DEADBAND=3 selected from no-video sweeps over seeds 0-4 | Annotate failure modes with RGB/OBS first, then change the heuristic.
7 | 2026-05-09T05:12:41.196821+00:00 | ram_intercept_v2 | 0 | 5 | -14.4 | -19.0 | 0.0 | 20904 | stateful velocity policy: estimate dx/dy from consecutive RAM states and predict the right-side intercept with wall reflection; RGB overlay calibrated from a visual probe | Compare with the previous round; change only one heuristic at a time.
6 | 2026-05-09T05:11:03.604459+00:00 | minimal_ram_xgate_v1_calibrated_overlay_diag | 0 | 1 | -21.0 | -21.0 | -21.0 | 2145 | Diagnostic sample: RGB overlay calibrated with visual_x=ram[49]-48.5 and visual_y=ram[54]-12.5, for analyzing failure modes in the next round | Still losing every episode; review the RGB failure clips and recheck the ball_y / paddle_y mapping.
5 | 2026-05-09T05:08:25.248804+00:00 | minimal_ram_xgate_v1_overlay_diag | 0 | 1 | -21.0 | -21.0 | -21.0 | 2145 | Diagnostic sample: RGB video overlaid with the RAM-inferred ball, paddle, action, reward, and score, for checking state detection and policy failure modes | Still losing every episode; review the RGB failure clips and recheck the ball_y / paddle_y mapping.
4 | 2026-05-09T03:00:51.076953+00:00 | minimal_ram_xgate_v1 | 0 | 5 | -20.4 | -21.0 | -20.0 | 7486 | x-gated target: center the paddle unless the ball is active and in the right half | Still losing every episode; review the RGB failure clips and recheck the ball_y / paddle_y mapping.
3 | 2026-05-09T02:46:35.736446+00:00 | minimal_ram_v0 | 0 | 1 | -20.0 | -20.0 | -20.0 | 1343 | dual RGB and OBS video recording | Still losing every episode; review the RGB failure clips and recheck the ball_y / paddle_y mapping.
2 | 2026-05-09T02:38:15.472083+00:00 | minimal_ram_v0 | 0 | 3 | -20.0 | -20.0 | -20.0 | 3606 | minimal RAM policy with per-episode GIF recording | Still losing every episode; review the RGB failure clips and recheck the ball_y / paddle_y mapping.
1 | 2026-05-09T02:33:35.856289+00:00 | minimal_ram_v0 | 0 | 3 | -20.0 | -20.0 | -20.0 | 3606 | minimal RAM policy using ram[54] ball_y and ram[51] paddle_y | Still losing every episode; review the RGB failure clips and recheck the ball_y / paddle_y mapping.
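The ram_intercept policies in this table read the ball and paddle straight from Atari RAM, estimate dx/dy from consecutive states, reflect the predicted path off the walls, clamp the target, and move the paddle with a deadband. The sketch below is a minimal reconstruction under stated assumptions, not the policy's actual code. The indices ram[54], ram[51], ram[14], ram[13] and the constants MAX_PLAY_Y_RAM=210, PADDLE_DEADBAND=3 and the 46..196 clamp come from the notes above; treating ram[49] as ball x is inferred from the overlay calibration; MIN_PLAY_Y_RAM, the SELF_PADDLE_X_RAM value, and the UP/DOWN sign convention are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# RAM indices from the round notes; BALL_X_RAM = 49 is inferred from the
# overlay calibration visual_x = ram[49] - 48.5 (assumption).
BALL_X_RAM, BALL_Y_RAM, PADDLE_Y_RAM = 49, 54, 51
SELF_SCORE_RAM, OPP_SCORE_RAM = 14, 13

MAX_PLAY_Y_RAM = 210               # reported for ram_intercept_v3
PADDLE_DEADBAND = 3                # reported for ram_intercept_v3
TARGET_MIN, TARGET_MAX = 46, 196   # clamp added in ram_intercept_v3_1
MIN_PLAY_Y_RAM = 38                # hypothetical: lower wall bound not reported
SELF_PADDLE_X_RAM = 188            # hypothetical: real value not reported


@dataclass(frozen=True)
class PongState:
    ball_x: int
    ball_y: int
    paddle_y: int
    self_score: int
    opponent_score: int

    @classmethod
    def from_ram(cls, ram: Sequence[int]) -> "PongState":
        return cls(ram[BALL_X_RAM], ram[BALL_Y_RAM], ram[PADDLE_Y_RAM],
                   ram[SELF_SCORE_RAM], ram[OPP_SCORE_RAM])


def predict_intercept_y(prev: PongState, cur: PongState) -> Optional[float]:
    """Extrapolate the ball to the paddle column, folding in wall bounces."""
    dx = cur.ball_x - prev.ball_x
    dy = cur.ball_y - prev.ball_y
    if dx <= 0:  # ball is not moving toward the right-side paddle
        return None
    frames = (SELF_PADDLE_X_RAM - cur.ball_x) / dx
    raw_y = cur.ball_y + dy * frames
    span = MAX_PLAY_Y_RAM - MIN_PLAY_Y_RAM
    # Bouncing between two walls is a triangle wave with period 2 * span.
    phase = (raw_y - MIN_PLAY_Y_RAM) % (2 * span)
    if phase > span:
        phase = 2 * span - phase
    return MIN_PLAY_Y_RAM + phase


def choose_action(paddle_y: int, target_y: float) -> str:
    """Clamp the target, then move only when outside the deadband."""
    target = min(max(target_y, TARGET_MIN), TARGET_MAX)
    if target < paddle_y - PADDLE_DEADBAND:
        return "UP"    # assumes RAM y grows downward on screen
    if target > paddle_y + PADDLE_DEADBAND:
        return "DOWN"
    return "NOOP"
```

Action names are plain strings here; mapping them to the environment's discrete action ids is left out because the report does not list them.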

Round details

Round 49: ram_intercept_v3_28_chain2_score00_initial_up

2026-05-10T00:05:23.369804+00:00
Mean score 19.025
Change
Round 66 accepted a marker-level chain2 score 0-0 initial DOWN->UP guard after an 80-seed counterfactual (mean 19.025, min 18, max 20); seeds 2 and 15 are expected to improve from 18 to 20 with no regressions.
Sample
seed 0 / 80 episodes / 466333 steps
Score
18.0 / 19.025 / 20.0 (min / mean / max)
Suggestion
Compare with the previous round; change only one heuristic at a time.
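This round rests on a per-seed counterfactual: the same seeds are run with and without the change, and the change is kept only if some seeds improve and none regress. A minimal sketch of that comparison, assuming per-seed scores are available as dicts; the function and field names are hypothetical, and the acceptance rule encodes the "no regressions" wording of the notes rather than any documented implementation.

```python
from typing import Dict, List, NamedTuple


class Comparison(NamedTuple):
    improved: List[int]
    regressed: List[int]
    mean_delta: float
    accepted: bool


def compare_per_seed(baseline: Dict[int, float],
                     candidate: Dict[int, float]) -> Comparison:
    """Compare two policies evaluated on exactly the same seeds."""
    if baseline.keys() != candidate.keys():
        raise ValueError("baseline and candidate must cover the same seeds")
    improved = sorted(s for s in baseline if candidate[s] > baseline[s])
    regressed = sorted(s for s in baseline if candidate[s] < baseline[s])
    mean_delta = sum(candidate[s] - baseline[s] for s in baseline) / len(baseline)
    # Guard (assumed): keep the change only if it helps somewhere and hurts nowhere.
    accepted = bool(improved) and not regressed
    return Comparison(improved, regressed, mean_delta, accepted)
```

Fed the round-48 and round-49 per-episode scores, this should report seeds 2 and 15 as improved (18 to 20 each), no regressions, and a mean delta of 4/80 = 0.05, which matches the move from 18.975 to 19.025.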

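Failure records

Time | Failure mode | Evidence | Hypothesis | Next step | Video
No failure-mode records for this round yet.

Per-episode records

# | Seed | Score | Steps | Done
1 | 0 | 20.0 | 5023 | True
2 | 1 | 19.0 | 5262 | True
3 | 2 | 20.0 | 5024 | True
4 | 3 | 18.0 | 5458 | True
5 | 4 | 18.0 | 5454 | True
6 | 5 | 19.0 | 7573 | True
7 | 6 | 18.0 | 5458 | True
8 | 7 | 19.0 | 6904 | True
9 | 8 | 19.0 | 7573 | True
10 | 9 | 20.0 | 5019 | True
11 | 10 | 19.0 | 5261 | True
12 | 11 | 19.0 | 5257 | True
13 | 12 | 19.0 | 6905 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 19.0 | 6905 | True
16 | 15 | 20.0 | 5024 | True
17 | 16 | 19.0 | 5261 | True
18 | 17 | 19.0 | 6904 | True
19 | 18 | 18.0 | 5458 | True
20 | 19 | 19.0 | 5256 | True
21 | 20 | 19.0 | 7573 | True
22 | 21 | 19.0 | 5261 | True
23 | 22 | 19.0 | 5258 | True
24 | 23 | 19.0 | 7569 | True
25 | 24 | 18.0 | 5460 | True
26 | 25 | 20.0 | 5025 | True
27 | 26 | 19.0 | 5258 | True
28 | 27 | 19.0 | 7569 | True
29 | 28 | 19.0 | 5263 | True
30 | 29 | 19.0 | 5258 | True
31 | 30 | 19.0 | 5262 | True
32 | 31 | 19.0 | 5544 | True
33 | 32 | 19.0 | 7568 | True
34 | 33 | 18.0 | 5455 | True
35 | 34 | 19.0 | 5263 | True
36 | 35 | 19.0 | 5261 | True
37 | 36 | 19.0 | 5262 | True
38 | 37 | 19.0 | 6904 | True
39 | 38 | 19.0 | 5263 | True
40 | 39 | 19.0 | 5261 | True
41 | 40 | 18.0 | 5459 | True
42 | 41 | 20.0 | 5026 | True
43 | 42 | 18.0 | 5459 | True
44 | 43 | 20.0 | 5025 | True
45 | 44 | 18.0 | 5455 | True
46 | 45 | 19.0 | 6905 | True
47 | 46 | 19.0 | 5256 | True
48 | 47 | 19.0 | 7572 | True
49 | 48 | 20.0 | 5026 | True
50 | 49 | 18.0 | 5458 | True
51 | 50 | 20.0 | 5022 | True
52 | 51 | 19.0 | 5257 | True
53 | 52 | 19.0 | 5258 | True
54 | 53 | 19.0 | 5257 | True
55 | 54 | 19.0 | 6904 | True
56 | 55 | 19.0 | 5260 | True
57 | 56 | 19.0 | 5258 | True
58 | 57 | 19.0 | 6904 | True
59 | 58 | 19.0 | 7573 | True
60 | 59 | 19.0 | 5259 | True
61 | 60 | 19.0 | 5260 | True
62 | 61 | 19.0 | 7573 | True
63 | 62 | 19.0 | 5544 | True
64 | 63 | 20.0 | 5023 | True
65 | 64 | 20.0 | 5025 | True
66 | 65 | 19.0 | 5545 | True
67 | 66 | 18.0 | 5455 | True
68 | 67 | 19.0 | 7573 | True
69 | 68 | 19.0 | 5544 | True
70 | 69 | 19.0 | 5543 | True
71 | 70 | 19.0 | 5545 | True
72 | 71 | 19.0 | 6905 | True
73 | 72 | 20.0 | 5020 | True
74 | 73 | 19.0 | 5543 | True
75 | 74 | 19.0 | 6901 | True
76 | 75 | 20.0 | 5020 | True
77 | 76 | 19.0 | 5263 | True
78 | 77 | 19.0 | 7568 | True
79 | 78 | 19.0 | 5262 | True
80 | 79 | 19.0 | 7568 | True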

Round 48: ram_intercept_v3_27_high_down_score160_bounce_noop

2026-05-09T21:16:24.260567+00:00
Mean score 18.975
Change
Round 55: solidify score 16-0 high-down y165 bounce DOWN->NOOP after v3_26 score-18 floor trace and full 80-seed counterfactual guard; v3_26 mean 18.70 min 18 max 20 to probe mean 18.98 min 18 max 20
Sample
seed 0 / 80 episodes / 466965 steps
Score
18.0 / 18.975 / 20.0 (min / mean / max)
Suggestion
Compare with the previous round; change only one heuristic at a time.
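The mean shift in this round can be checked against the per-episode tables of rounds 47 and 48. With s_k the score on seed k, 22 seeds (5, 7, 8, 12, 14, 17, 20, 23, 27, 32, 37, 45, 47, 54, 57, 58, 61, 67, 71, 74, 77, 79) move from 18.0 to 19.0 and every other seed is unchanged, so:

```latex
\Delta\bar{s}
  = \frac{1}{80}\sum_{k=0}^{79}\bigl(s_k^{\text{v3\_27}} - s_k^{\text{v3\_26}}\bigr)
  = \frac{22 \times (19 - 18)}{80}
  = 0.275,
\qquad
18.7 + 0.275 = 18.975
```

Each of those 22 episodes is also exactly 215 steps shorter in round 48, so the round total drops by 22 × 215 = 4730 steps (471695 to 466965).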

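Failure records

Time | Failure mode | Evidence | Hypothesis | Next step | Video
No failure-mode records for this round yet.

Per-episode records

# | Seed | Score | Steps | Done
1 | 0 | 20.0 | 5023 | True
2 | 1 | 19.0 | 5262 | True
3 | 2 | 18.0 | 5340 | True
4 | 3 | 18.0 | 5458 | True
5 | 4 | 18.0 | 5454 | True
6 | 5 | 19.0 | 7573 | True
7 | 6 | 18.0 | 5458 | True
8 | 7 | 19.0 | 6904 | True
9 | 8 | 19.0 | 7573 | True
10 | 9 | 20.0 | 5019 | True
11 | 10 | 19.0 | 5261 | True
12 | 11 | 19.0 | 5257 | True
13 | 12 | 19.0 | 6905 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 19.0 | 6905 | True
16 | 15 | 18.0 | 5340 | True
17 | 16 | 19.0 | 5261 | True
18 | 17 | 19.0 | 6904 | True
19 | 18 | 18.0 | 5458 | True
20 | 19 | 19.0 | 5256 | True
21 | 20 | 19.0 | 7573 | True
22 | 21 | 19.0 | 5261 | True
23 | 22 | 19.0 | 5258 | True
24 | 23 | 19.0 | 7569 | True
25 | 24 | 18.0 | 5460 | True
26 | 25 | 20.0 | 5025 | True
27 | 26 | 19.0 | 5258 | True
28 | 27 | 19.0 | 7569 | True
29 | 28 | 19.0 | 5263 | True
30 | 29 | 19.0 | 5258 | True
31 | 30 | 19.0 | 5262 | True
32 | 31 | 19.0 | 5544 | True
33 | 32 | 19.0 | 7568 | True
34 | 33 | 18.0 | 5455 | True
35 | 34 | 19.0 | 5263 | True
36 | 35 | 19.0 | 5261 | True
37 | 36 | 19.0 | 5262 | True
38 | 37 | 19.0 | 6904 | True
39 | 38 | 19.0 | 5263 | True
40 | 39 | 19.0 | 5261 | True
41 | 40 | 18.0 | 5459 | True
42 | 41 | 20.0 | 5026 | True
43 | 42 | 18.0 | 5459 | True
44 | 43 | 20.0 | 5025 | True
45 | 44 | 18.0 | 5455 | True
46 | 45 | 19.0 | 6905 | True
47 | 46 | 19.0 | 5256 | True
48 | 47 | 19.0 | 7572 | True
49 | 48 | 20.0 | 5026 | True
50 | 49 | 18.0 | 5458 | True
51 | 50 | 20.0 | 5022 | True
52 | 51 | 19.0 | 5257 | True
53 | 52 | 19.0 | 5258 | True
54 | 53 | 19.0 | 5257 | True
55 | 54 | 19.0 | 6904 | True
56 | 55 | 19.0 | 5260 | True
57 | 56 | 19.0 | 5258 | True
58 | 57 | 19.0 | 6904 | True
59 | 58 | 19.0 | 7573 | True
60 | 59 | 19.0 | 5259 | True
61 | 60 | 19.0 | 5260 | True
62 | 61 | 19.0 | 7573 | True
63 | 62 | 19.0 | 5544 | True
64 | 63 | 20.0 | 5023 | True
65 | 64 | 20.0 | 5025 | True
66 | 65 | 19.0 | 5545 | True
67 | 66 | 18.0 | 5455 | True
68 | 67 | 19.0 | 7573 | True
69 | 68 | 19.0 | 5544 | True
70 | 69 | 19.0 | 5543 | True
71 | 70 | 19.0 | 5545 | True
72 | 71 | 19.0 | 6905 | True
73 | 72 | 20.0 | 5020 | True
74 | 73 | 19.0 | 5543 | True
75 | 74 | 19.0 | 6901 | True
76 | 75 | 20.0 | 5020 | True
77 | 76 | 19.0 | 5263 | True
78 | 77 | 19.0 | 7568 | True
79 | 78 | 19.0 | 5262 | True
80 | 79 | 19.0 | 7568 | True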

Round 47: ram_intercept_v3_26_low_up_score10_final_noop

2026-05-09T20:56:51.415833+00:00
Mean score 18.7
Change
Round 54: solidify score 1-0 low-up final DOWN->NOOP after v3_25 floor trace and full 80-seed counterfactual guard; v3_25 mean 18.2875 min 17 max 19 to probe mean 18.70 min 18 max 20
Sample
seed 0 / 80 episodes / 471695 steps
Score
18.0 / 18.7 / 20.0 (min / mean / max)
Suggestion
Compare with the previous round; change only one heuristic at a time.
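Many seeds in these tables end with an identical (score, steps) pair; in round 47, seeds 5, 8, 20, 58, 61 and 67 all score 18.0 in exactly 7788 steps. Identical step counts suggest, though they do not prove, that those episodes follow the same trajectory, which is the kind of "family" the floor traces look for. A small sketch for listing such groups, assuming the per-episode rows are available as (seed, score, steps) tuples; the helper name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, float, int]  # (seed, score, steps) from a per-episode table


def group_by_signature(rows: Iterable[Row]) -> Dict[Tuple[float, int], List[int]]:
    """Map each (score, steps) pair shared by two or more seeds to those seeds."""
    groups: Dict[Tuple[float, int], List[int]] = defaultdict(list)
    for seed, score, steps in rows:
        groups[(score, steps)].append(seed)
    return {sig: sorted(seeds) for sig, seeds in groups.items() if len(seeds) > 1}
```

Singleton signatures are dropped, so the result lists only candidate families worth tracing.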

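Failure records

Time | Failure mode | Evidence | Hypothesis | Next step | Video
No failure-mode records for this round yet.

Per-episode records

# | Seed | Score | Steps | Done
1 | 0 | 20.0 | 5023 | True
2 | 1 | 19.0 | 5262 | True
3 | 2 | 18.0 | 5340 | True
4 | 3 | 18.0 | 5458 | True
5 | 4 | 18.0 | 5454 | True
6 | 5 | 18.0 | 7788 | True
7 | 6 | 18.0 | 5458 | True
8 | 7 | 18.0 | 7119 | True
9 | 8 | 18.0 | 7788 | True
10 | 9 | 20.0 | 5019 | True
11 | 10 | 19.0 | 5261 | True
12 | 11 | 19.0 | 5257 | True
13 | 12 | 18.0 | 7120 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 18.0 | 7120 | True
16 | 15 | 18.0 | 5340 | True
17 | 16 | 19.0 | 5261 | True
18 | 17 | 18.0 | 7119 | True
19 | 18 | 18.0 | 5458 | True
20 | 19 | 19.0 | 5256 | True
21 | 20 | 18.0 | 7788 | True
22 | 21 | 19.0 | 5261 | True
23 | 22 | 19.0 | 5258 | True
24 | 23 | 18.0 | 7784 | True
25 | 24 | 18.0 | 5460 | True
26 | 25 | 20.0 | 5025 | True
27 | 26 | 19.0 | 5258 | True
28 | 27 | 18.0 | 7784 | True
29 | 28 | 19.0 | 5263 | True
30 | 29 | 19.0 | 5258 | True
31 | 30 | 19.0 | 5262 | True
32 | 31 | 19.0 | 5544 | True
33 | 32 | 18.0 | 7783 | True
34 | 33 | 18.0 | 5455 | True
35 | 34 | 19.0 | 5263 | True
36 | 35 | 19.0 | 5261 | True
37 | 36 | 19.0 | 5262 | True
38 | 37 | 18.0 | 7119 | True
39 | 38 | 19.0 | 5263 | True
40 | 39 | 19.0 | 5261 | True
41 | 40 | 18.0 | 5459 | True
42 | 41 | 20.0 | 5026 | True
43 | 42 | 18.0 | 5459 | True
44 | 43 | 20.0 | 5025 | True
45 | 44 | 18.0 | 5455 | True
46 | 45 | 18.0 | 7120 | True
47 | 46 | 19.0 | 5256 | True
48 | 47 | 18.0 | 7787 | True
49 | 48 | 20.0 | 5026 | True
50 | 49 | 18.0 | 5458 | True
51 | 50 | 20.0 | 5022 | True
52 | 51 | 19.0 | 5257 | True
53 | 52 | 19.0 | 5258 | True
54 | 53 | 19.0 | 5257 | True
55 | 54 | 18.0 | 7119 | True
56 | 55 | 19.0 | 5260 | True
57 | 56 | 19.0 | 5258 | True
58 | 57 | 18.0 | 7119 | True
59 | 58 | 18.0 | 7788 | True
60 | 59 | 19.0 | 5259 | True
61 | 60 | 19.0 | 5260 | True
62 | 61 | 18.0 | 7788 | True
63 | 62 | 19.0 | 5544 | True
64 | 63 | 20.0 | 5023 | True
65 | 64 | 20.0 | 5025 | True
66 | 65 | 19.0 | 5545 | True
67 | 66 | 18.0 | 5455 | True
68 | 67 | 18.0 | 7788 | True
69 | 68 | 19.0 | 5544 | True
70 | 69 | 19.0 | 5543 | True
71 | 70 | 19.0 | 5545 | True
72 | 71 | 18.0 | 7120 | True
73 | 72 | 20.0 | 5020 | True
74 | 73 | 19.0 | 5543 | True
75 | 74 | 18.0 | 7116 | True
76 | 75 | 20.0 | 5020 | True
77 | 76 | 19.0 | 5263 | True
78 | 77 | 18.0 | 7783 | True
79 | 78 | 19.0 | 5262 | True
80 | 79 | 18.0 | 7783 | True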

Round 46: ram_intercept_v3_25_y86_score140_serve_initial_noop

2026-05-09T20:33:24.386269+00:00
Mean score 18.2875
Change
Round 53: solidify score 14-0 y86 serve-initial DOWN->NOOP after setup diagnostic and full 80-seed counterfactual guard; v3_24 mean 18.0125 min 17 max 19 to probe mean 18.29 min 17 max 19
Sample
seed 0 / 80 episodes / 713441 steps
Score
17.0 / 18.2875 / 19.0 (min / mean / max)
Suggestion
Compare with the previous round; change only one heuristic at a time.
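Every 17.0 episode in this round ran exactly 27000 steps (seeds 0, 9, 25, 41, 43, 48, 50, 63, 64, 72 and 75), the same length as the round-39 video check, while the other episodes here finish in roughly 5200 to 7800 steps. Those rows likely hit a step limit rather than finishing normally, in line with the inactive-tail loop noted for seeds 0 and 9. A sketch that separates them before averaging; the 27000 cap is read off the tables and is an assumption, not a documented setting, and the helper name is hypothetical.

```python
from typing import Iterable, List, Tuple

Row = Tuple[int, float, int]  # (seed, score, steps) from a per-episode table
STEP_CAP = 27000  # assumed from the tables; not a documented max_steps value


def split_capped(rows: Iterable[Row], cap: int = STEP_CAP) -> Tuple[List[int], float]:
    """Return the seeds that reached the cap and the mean score of the rest."""
    capped: List[int] = []
    finished: List[float] = []
    for seed, score, steps in rows:
        if steps >= cap:
            capped.append(seed)
        else:
            finished.append(score)
    mean = sum(finished) / len(finished) if finished else float("nan")
    return sorted(capped), mean
```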

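Failure records

Time | Failure mode | Evidence | Hypothesis | Next step | Video
No failure-mode records for this round yet.

Per-episode records

# | Seed | Score | Steps | Done
1 | 0 | 17.0 | 27000 | True
2 | 1 | 19.0 | 5262 | True
3 | 2 | 18.0 | 5340 | True
4 | 3 | 18.0 | 5458 | True
5 | 4 | 18.0 | 5454 | True
6 | 5 | 18.0 | 7788 | True
7 | 6 | 18.0 | 5458 | True
8 | 7 | 18.0 | 7119 | True
9 | 8 | 18.0 | 7788 | True
10 | 9 | 17.0 | 27000 | True
11 | 10 | 19.0 | 5261 | True
12 | 11 | 19.0 | 5257 | True
13 | 12 | 18.0 | 7120 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 18.0 | 7120 | True
16 | 15 | 18.0 | 5340 | True
17 | 16 | 19.0 | 5261 | True
18 | 17 | 18.0 | 7119 | True
19 | 18 | 18.0 | 5458 | True
20 | 19 | 19.0 | 5256 | True
21 | 20 | 18.0 | 7788 | True
22 | 21 | 19.0 | 5261 | True
23 | 22 | 19.0 | 5258 | True
24 | 23 | 18.0 | 7784 | True
25 | 24 | 18.0 | 5460 | True
26 | 25 | 17.0 | 27000 | True
27 | 26 | 19.0 | 5258 | True
28 | 27 | 18.0 | 7784 | True
29 | 28 | 19.0 | 5263 | True
30 | 29 | 19.0 | 5258 | True
31 | 30 | 19.0 | 5262 | True
32 | 31 | 19.0 | 5544 | True
33 | 32 | 18.0 | 7783 | True
34 | 33 | 18.0 | 5455 | True
35 | 34 | 19.0 | 5263 | True
36 | 35 | 19.0 | 5261 | True
37 | 36 | 19.0 | 5262 | True
38 | 37 | 18.0 | 7119 | True
39 | 38 | 19.0 | 5263 | True
40 | 39 | 19.0 | 5261 | True
41 | 40 | 18.0 | 5459 | True
42 | 41 | 17.0 | 27000 | True
43 | 42 | 18.0 | 5459 | True
44 | 43 | 17.0 | 27000 | True
45 | 44 | 18.0 | 5455 | True
46 | 45 | 18.0 | 7120 | True
47 | 46 | 19.0 | 5256 | True
48 | 47 | 18.0 | 7787 | True
49 | 48 | 17.0 | 27000 | True
50 | 49 | 18.0 | 5458 | True
51 | 50 | 17.0 | 27000 | True
52 | 51 | 19.0 | 5257 | True
53 | 52 | 19.0 | 5258 | True
54 | 53 | 19.0 | 5257 | True
55 | 54 | 18.0 | 7119 | True
56 | 55 | 19.0 | 5260 | True
57 | 56 | 19.0 | 5258 | True
58 | 57 | 18.0 | 7119 | True
59 | 58 | 18.0 | 7788 | True
60 | 59 | 19.0 | 5259 | True
61 | 60 | 19.0 | 5260 | True
62 | 61 | 18.0 | 7788 | True
63 | 62 | 19.0 | 5544 | True
64 | 63 | 17.0 | 27000 | True
65 | 64 | 17.0 | 27000 | True
66 | 65 | 19.0 | 5545 | True
67 | 66 | 18.0 | 5455 | True
68 | 67 | 18.0 | 7788 | True
69 | 68 | 19.0 | 5544 | True
70 | 69 | 19.0 | 5543 | True
71 | 70 | 19.0 | 5545 | True
72 | 71 | 18.0 | 7120 | True
73 | 72 | 17.0 | 27000 | True
74 | 73 | 19.0 | 5543 | True
75 | 74 | 18.0 | 7116 | True
76 | 75 | 17.0 | 27000 | True
77 | 76 | 19.0 | 5263 | True
78 | 77 | 18.0 | 7783 | True
79 | 78 | 19.0 | 5262 | True
80 | 79 | 18.0 | 7783 | True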

Round 45: ram_intercept_v3_24_score00_mid_down_y154_noop

2026-05-09T20:11:08.785023+00:00
Mean score 18.0125
Change
Round 51: solidify score 0-0 mid-down y154 final UP->NOOP after mixed seed 2/15 trace and full 80-seed counterfactual guard; v3_23 mean 17.9625 min 16 max 19 to probe mean 18.01 min 17 max 19
Sample
seed 0 / 80 episodes / 712891 steps
Score
17.0 / 18.0125 / 19.0 (min / mean / max)
Suggestion
Compare with the previous round; change only one heuristic at a time.

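Failure records

Time | Failure mode | Evidence | Hypothesis | Next step | Video
No failure-mode records for this round yet.

Per-episode records

# | Seed | Score | Steps | Done
1 | 0 | 17.0 | 27000 | True
2 | 1 | 19.0 | 5262 | True
3 | 2 | 18.0 | 5340 | True
4 | 3 | 18.0 | 5458 | True
5 | 4 | 18.0 | 5454 | True
6 | 5 | 17.0 | 7763 | True
7 | 6 | 18.0 | 5458 | True
8 | 7 | 17.0 | 7094 | True
9 | 8 | 17.0 | 7763 | True
10 | 9 | 17.0 | 27000 | True
11 | 10 | 19.0 | 5261 | True
12 | 11 | 19.0 | 5257 | True
13 | 12 | 17.0 | 7095 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 17.0 | 7095 | True
16 | 15 | 18.0 | 5340 | True
17 | 16 | 19.0 | 5261 | True
18 | 17 | 17.0 | 7094 | True
19 | 18 | 18.0 | 5458 | True
20 | 19 | 19.0 | 5256 | True
21 | 20 | 17.0 | 7763 | True
22 | 21 | 19.0 | 5261 | True
23 | 22 | 19.0 | 5258 | True
24 | 23 | 17.0 | 7759 | True
25 | 24 | 18.0 | 5460 | True
26 | 25 | 17.0 | 27000 | True
27 | 26 | 19.0 | 5258 | True
28 | 27 | 17.0 | 7759 | True
29 | 28 | 19.0 | 5263 | True
30 | 29 | 19.0 | 5258 | True
31 | 30 | 19.0 | 5262 | True
32 | 31 | 19.0 | 5544 | True
33 | 32 | 17.0 | 7758 | True
34 | 33 | 18.0 | 5455 | True
35 | 34 | 19.0 | 5263 | True
36 | 35 | 19.0 | 5261 | True
37 | 36 | 19.0 | 5262 | True
38 | 37 | 17.0 | 7094 | True
39 | 38 | 19.0 | 5263 | True
40 | 39 | 19.0 | 5261 | True
41 | 40 | 18.0 | 5459 | True
42 | 41 | 17.0 | 27000 | True
43 | 42 | 18.0 | 5459 | True
44 | 43 | 17.0 | 27000 | True
45 | 44 | 18.0 | 5455 | True
46 | 45 | 17.0 | 7095 | True
47 | 46 | 19.0 | 5256 | True
48 | 47 | 17.0 | 7762 | True
49 | 48 | 17.0 | 27000 | True
50 | 49 | 18.0 | 5458 | True
51 | 50 | 17.0 | 27000 | True
52 | 51 | 19.0 | 5257 | True
53 | 52 | 19.0 | 5258 | True
54 | 53 | 19.0 | 5257 | True
55 | 54 | 17.0 | 7094 | True
56 | 55 | 19.0 | 5260 | True
57 | 56 | 19.0 | 5258 | True
58 | 57 | 17.0 | 7094 | True
59 | 58 | 17.0 | 7763 | True
60 | 59 | 19.0 | 5259 | True
61 | 60 | 19.0 | 5260 | True
62 | 61 | 17.0 | 7763 | True
63 | 62 | 19.0 | 5544 | True
64 | 63 | 17.0 | 27000 | True
65 | 64 | 17.0 | 27000 | True
66 | 65 | 19.0 | 5545 | True
67 | 66 | 18.0 | 5455 | True
68 | 67 | 17.0 | 7763 | True
69 | 68 | 19.0 | 5544 | True
70 | 69 | 19.0 | 5543 | True
71 | 70 | 19.0 | 5545 | True
72 | 71 | 17.0 | 7095 | True
73 | 72 | 17.0 | 27000 | True
74 | 73 | 19.0 | 5543 | True
75 | 74 | 17.0 | 7091 | True
76 | 75 | 17.0 | 27000 | True
77 | 76 | 19.0 | 5263 | True
78 | 77 | 17.0 | 7758 | True
79 | 78 | 19.0 | 5262 | True
80 | 79 | 17.0 | 7758 | True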

Round 44: ram_intercept_v3_23_high_down_score120_bounce_noop

2026-05-09T19:53:06.444230+00:00
Mean score 17.9625
Change
Round 50: solidify score 12-0 high-down bounce x144-148 y50-56 dy9 DOWN->NOOP after score-16 floor trace and full 80-seed counterfactual guard; v3_22 mean 17.6875 min 16 max 19 to probe mean 17.96 min 16 max 19
Sample
seed 0 / 80 episodes / 713027 steps
Score
16.0 / 17.9625 / 19.0 (min / mean / max)
Suggestion
Compare with the previous round; change only one heuristic at a time.

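Failure records

Time | Failure mode | Evidence | Hypothesis | Next step | Video
No failure-mode records for this round yet.

Per-episode records

# | Seed | Score | Steps | Done
1 | 0 | 17.0 | 27000 | True
2 | 1 | 19.0 | 5262 | True
3 | 2 | 16.0 | 5408 | True
4 | 3 | 18.0 | 5458 | True
5 | 4 | 18.0 | 5454 | True
6 | 5 | 17.0 | 7763 | True
7 | 6 | 18.0 | 5458 | True
8 | 7 | 17.0 | 7094 | True
9 | 8 | 17.0 | 7763 | True
10 | 9 | 17.0 | 27000 | True
11 | 10 | 19.0 | 5261 | True
12 | 11 | 19.0 | 5257 | True
13 | 12 | 17.0 | 7095 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 17.0 | 7095 | True
16 | 15 | 16.0 | 5408 | True
17 | 16 | 19.0 | 5261 | True
18 | 17 | 17.0 | 7094 | True
19 | 18 | 18.0 | 5458 | True
20 | 19 | 19.0 | 5256 | True
21 | 20 | 17.0 | 7763 | True
22 | 21 | 19.0 | 5261 | True
23 | 22 | 19.0 | 5258 | True
24 | 23 | 17.0 | 7759 | True
25 | 24 | 18.0 | 5460 | True
26 | 25 | 17.0 | 27000 | True
27 | 26 | 19.0 | 5258 | True
28 | 27 | 17.0 | 7759 | True
29 | 28 | 19.0 | 5263 | True
30 | 29 | 19.0 | 5258 | True
31 | 30 | 19.0 | 5262 | True
32 | 31 | 19.0 | 5544 | True
33 | 32 | 17.0 | 7758 | True
34 | 33 | 18.0 | 5455 | True
35 | 34 | 19.0 | 5263 | True
36 | 35 | 19.0 | 5261 | True
37 | 36 | 19.0 | 5262 | True
38 | 37 | 17.0 | 7094 | True
39 | 38 | 19.0 | 5263 | True
40 | 39 | 19.0 | 5261 | True
41 | 40 | 18.0 | 5459 | True
42 | 41 | 17.0 | 27000 | True
43 | 42 | 18.0 | 5459 | True
44 | 43 | 17.0 | 27000 | True
45 | 44 | 18.0 | 5455 | True
46 | 45 | 17.0 | 7095 | True
47 | 46 | 19.0 | 5256 | True
48 | 47 | 17.0 | 7762 | True
49 | 48 | 17.0 | 27000 | True
50 | 49 | 18.0 | 5458 | True
51 | 50 | 17.0 | 27000 | True
52 | 51 | 19.0 | 5257 | True
53 | 52 | 19.0 | 5258 | True
54 | 53 | 19.0 | 5257 | True
55 | 54 | 17.0 | 7094 | True
56 | 55 | 19.0 | 5260 | True
57 | 56 | 19.0 | 5258 | True
58 | 57 | 17.0 | 7094 | True
59 | 58 | 17.0 | 7763 | True
60 | 59 | 19.0 | 5259 | True
61 | 60 | 19.0 | 5260 | True
62 | 61 | 17.0 | 7763 | True
63 | 62 | 19.0 | 5544 | True
64 | 63 | 17.0 | 27000 | True
65 | 64 | 17.0 | 27000 | True
66 | 65 | 19.0 | 5545 | True
67 | 66 | 18.0 | 5455 | True
68 | 67 | 17.0 | 7763 | True
69 | 68 | 19.0 | 5544 | True
70 | 69 | 19.0 | 5543 | True
71 | 70 | 19.0 | 5545 | True
72 | 71 | 17.0 | 7095 | True
73 | 72 | 17.0 | 27000 | True
74 | 73 | 19.0 | 5543 | True
75 | 74 | 17.0 | 7091 | True
76 | 75 | 17.0 | 27000 | True
77 | 76 | 19.0 | 5263 | True
78 | 77 | 17.0 | 7758 | True
79 | 78 | 19.0 | 5262 | True
80 | 79 | 17.0 | 7758 | True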

Round 43: ram_intercept_v3_22_y86_score100_serve_initial_noop

2026-05-09T19:33:24.229013+00:00
Mean score 17.6875
Change
Round 49: solidify score 10-0 y86 serve-initial DOWN->NOOP after setup diagnostic and full 80-seed counterfactual guard; v3_21 mean 17.4125 min 15 max 19 to probe mean 17.69 min 16 max 19
Sample
seed 0 / 80 episodes / 720397 steps
Score
16.0 / 17.6875 / 19.0 (min / mean / max)
Suggestion
Compare with the previous round; change only one heuristic at a time.

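Failure records

Time | Failure mode | Evidence | Hypothesis | Next step | Video
No failure-mode records for this round yet.

Per-episode records

# | Seed | Score | Steps | Done
1 | 0 | 17.0 | 27000 | True
2 | 1 | 19.0 | 5262 | True
3 | 2 | 16.0 | 5408 | True
4 | 3 | 18.0 | 5458 | True
5 | 4 | 18.0 | 5454 | True
6 | 5 | 16.0 | 8098 | True
7 | 6 | 18.0 | 5458 | True
8 | 7 | 16.0 | 7429 | True
9 | 8 | 16.0 | 8098 | True
10 | 9 | 17.0 | 27000 | True
11 | 10 | 19.0 | 5261 | True
12 | 11 | 19.0 | 5257 | True
13 | 12 | 16.0 | 7430 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 16.0 | 7430 | True
16 | 15 | 16.0 | 5408 | True
17 | 16 | 19.0 | 5261 | True
18 | 17 | 16.0 | 7429 | True
19 | 18 | 18.0 | 5458 | True
20 | 19 | 19.0 | 5256 | True
21 | 20 | 16.0 | 8098 | True
22 | 21 | 19.0 | 5261 | True
23 | 22 | 19.0 | 5258 | True
24 | 23 | 16.0 | 8094 | True
25 | 24 | 18.0 | 5460 | True
26 | 25 | 17.0 | 27000 | True
27 | 26 | 19.0 | 5258 | True
28 | 27 | 16.0 | 8094 | True
29 | 28 | 19.0 | 5263 | True
30 | 29 | 19.0 | 5258 | True
31 | 30 | 19.0 | 5262 | True
32 | 31 | 19.0 | 5544 | True
33 | 32 | 16.0 | 8093 | True
34 | 33 | 18.0 | 5455 | True
35 | 34 | 19.0 | 5263 | True
36 | 35 | 19.0 | 5261 | True
37 | 36 | 19.0 | 5262 | True
38 | 37 | 16.0 | 7429 | True
39 | 38 | 19.0 | 5263 | True
40 | 39 | 19.0 | 5261 | True
41 | 40 | 18.0 | 5459 | True
42 | 41 | 17.0 | 27000 | True
43 | 42 | 18.0 | 5459 | True
44 | 43 | 17.0 | 27000 | True
45 | 44 | 18.0 | 5455 | True
46 | 45 | 16.0 | 7430 | True
47 | 46 | 19.0 | 5256 | True
48 | 47 | 16.0 | 8097 | True
49 | 48 | 17.0 | 27000 | True
50 | 49 | 18.0 | 5458 | True
51 | 50 | 17.0 | 27000 | True
52 | 51 | 19.0 | 5257 | True
53 | 52 | 19.0 | 5258 | True
54 | 53 | 19.0 | 5257 | True
55 | 54 | 16.0 | 7429 | True
56 | 55 | 19.0 | 5260 | True
57 | 56 | 19.0 | 5258 | True
58 | 57 | 16.0 | 7429 | True
59 | 58 | 16.0 | 8098 | True
60 | 59 | 19.0 | 5259 | True
61 | 60 | 19.0 | 5260 | True
62 | 61 | 16.0 | 8098 | True
63 | 62 | 19.0 | 5544 | True
64 | 63 | 17.0 | 27000 | True
65 | 64 | 17.0 | 27000 | True
66 | 65 | 19.0 | 5545 | True
67 | 66 | 18.0 | 5455 | True
68 | 67 | 16.0 | 8098 | True
69 | 68 | 19.0 | 5544 | True
70 | 69 | 19.0 | 5543 | True
71 | 70 | 19.0 | 5545 | True
72 | 71 | 16.0 | 7430 | True
73 | 72 | 17.0 | 27000 | True
74 | 73 | 19.0 | 5543 | True
75 | 74 | 16.0 | 7426 | True
76 | 75 | 17.0 | 27000 | True
77 | 76 | 19.0 | 5263 | True
78 | 77 | 16.0 | 8093 | True
79 | 78 | 19.0 | 5262 | True
80 | 79 | 16.0 | 8093 | True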

Round 42: ram_intercept_v3_21_high_down_score80_bounce_noop

2026-05-09T19:11:10.174793+00:00
Mean score 17.4125
Change
Round 47: solidify score 8-0 high-down bounce x144-148 y50-56 dy9 DOWN->NOOP after full 80-seed counterfactual guard; v3_20 mean 17.1375 min 14 max 19 to probe mean 17.41 min 15 max 19
Sample
seed 0 / 80 episodes / 717207 steps
Score
15.0 / 17.4125 / 19.0 (min / mean / max)
Suggestion
Compare with the previous round; change only one heuristic at a time.

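Failure records

Time | Failure mode | Evidence | Hypothesis | Next step | Video
No failure-mode records for this round yet.

Per-episode records

# | Seed | Score | Steps | Done
1 | 0 | 17.0 | 27000 | True
2 | 1 | 19.0 | 5262 | True
3 | 2 | 16.0 | 5408 | True
4 | 3 | 18.0 | 5458 | True
5 | 4 | 18.0 | 5454 | True
6 | 5 | 15.0 | 7953 | True
7 | 6 | 18.0 | 5458 | True
8 | 7 | 15.0 | 7284 | True
9 | 8 | 15.0 | 7953 | True
10 | 9 | 17.0 | 27000 | True
11 | 10 | 19.0 | 5261 | True
12 | 11 | 19.0 | 5257 | True
13 | 12 | 15.0 | 7285 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 15.0 | 7285 | True
16 | 15 | 16.0 | 5408 | True
17 | 16 | 19.0 | 5261 | True
18 | 17 | 15.0 | 7284 | True
19 | 18 | 18.0 | 5458 | True
20 | 19 | 19.0 | 5256 | True
21 | 20 | 15.0 | 7953 | True
22 | 21 | 19.0 | 5261 | True
23 | 22 | 19.0 | 5258 | True
24 | 23 | 15.0 | 7949 | True
25 | 24 | 18.0 | 5460 | True
26 | 25 | 17.0 | 27000 | True
27 | 26 | 19.0 | 5258 | True
28 | 27 | 15.0 | 7949 | True
29 | 28 | 19.0 | 5263 | True
30 | 29 | 19.0 | 5258 | True
31 | 30 | 19.0 | 5262 | True
32 | 31 | 19.0 | 5544 | True
33 | 32 | 15.0 | 7948 | True
34 | 33 | 18.0 | 5455 | True
35 | 34 | 19.0 | 5263 | True
36 | 35 | 19.0 | 5261 | True
37 | 36 | 19.0 | 5262 | True
38 | 37 | 15.0 | 7284 | True
39 | 38 | 19.0 | 5263 | True
40 | 39 | 19.0 | 5261 | True
41 | 40 | 18.0 | 5459 | True
42 | 41 | 17.0 | 27000 | True
43 | 42 | 18.0 | 5459 | True
44 | 43 | 17.0 | 27000 | True
45 | 44 | 18.0 | 5455 | True
46 | 45 | 15.0 | 7285 | True
47 | 46 | 19.0 | 5256 | True
48 | 47 | 15.0 | 7952 | True
49 | 48 | 17.0 | 27000 | True
50 | 49 | 18.0 | 5458 | True
51 | 50 | 17.0 | 27000 | True
52 | 51 | 19.0 | 5257 | True
53 | 52 | 19.0 | 5258 | True
54 | 53 | 19.0 | 5257 | True
55 | 54 | 15.0 | 7284 | True
56 | 55 | 19.0 | 5260 | True
57 | 56 | 19.0 | 5258 | True
58 | 57 | 15.0 | 7284 | True
59 | 58 | 15.0 | 7953 | True
60 | 59 | 19.0 | 5259 | True
61 | 60 | 19.0 | 5260 | True
62 | 61 | 15.0 | 7953 | True
63 | 62 | 19.0 | 5544 | True
64 | 63 | 17.0 | 27000 | True
65 | 64 | 17.0 | 27000 | True
66 | 65 | 19.0 | 5545 | True
67 | 66 | 18.0 | 5455 | True
68 | 67 | 15.0 | 7953 | True
69 | 68 | 19.0 | 5544 | True
70 | 69 | 19.0 | 5543 | True
71 | 70 | 19.0 | 5545 | True
72 | 71 | 15.0 | 7285 | True
73 | 72 | 17.0 | 27000 | True
74 | 73 | 19.0 | 5543 | True
75 | 74 | 15.0 | 7281 | True
76 | 75 | 17.0 | 27000 | True
77 | 76 | 19.0 | 5263 | True
78 | 77 | 15.0 | 7948 | True
79 | 78 | 19.0 | 5262 | True
80 | 79 | 15.0 | 7948 | True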

Round 41: ram_intercept_v3_20_y86_score60_serve_initial_noop

2026-05-09T18:51:51.602778+00:00
Mean score 17.1375
Change
Round 45: solidify score 6-0 y86 serve-initial DOWN->NOOP after full 80-seed counterfactual guard; v3_19 mean 16.8625 min 13 max 19 to probe mean 17.14 min 14 max 19
Sample
seed 0 / 80 episodes / 727217 steps
Score
14.0 / 17.1375 / 19.0 (min / mean / max)
Suggestion
Compare with the previous round; change only one heuristic at a time.

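Failure records

Time | Failure mode | Evidence | Hypothesis | Next step | Video
No failure-mode records for this round yet.

Per-episode records

# | Seed | Score | Steps | Done
1 | 0 | 17.0 | 27000 | True
2 | 1 | 19.0 | 5262 | True
3 | 2 | 16.0 | 5408 | True
4 | 3 | 18.0 | 5458 | True
5 | 4 | 18.0 | 5454 | True
6 | 5 | 14.0 | 8408 | True
7 | 6 | 18.0 | 5458 | True
8 | 7 | 14.0 | 7739 | True
9 | 8 | 14.0 | 8408 | True
10 | 9 | 17.0 | 27000 | True
11 | 10 | 19.0 | 5261 | True
12 | 11 | 19.0 | 5257 | True
13 | 12 | 14.0 | 7740 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 14.0 | 7740 | True
16 | 15 | 16.0 | 5408 | True
17 | 16 | 19.0 | 5261 | True
18 | 17 | 14.0 | 7739 | True
19 | 18 | 18.0 | 5458 | True
20 | 19 | 19.0 | 5256 | True
21 | 20 | 14.0 | 8408 | True
22 | 21 | 19.0 | 5261 | True
23 | 22 | 19.0 | 5258 | True
24 | 23 | 14.0 | 8404 | True
25 | 24 | 18.0 | 5460 | True
26 | 25 | 17.0 | 27000 | True
27 | 26 | 19.0 | 5258 | True
28 | 27 | 14.0 | 8404 | True
29 | 28 | 19.0 | 5263 | True
30 | 29 | 19.0 | 5258 | True
31 | 30 | 19.0 | 5262 | True
32 | 31 | 19.0 | 5544 | True
33 | 32 | 14.0 | 8403 | True
34 | 33 | 18.0 | 5455 | True
35 | 34 | 19.0 | 5263 | True
36 | 35 | 19.0 | 5261 | True
37 | 36 | 19.0 | 5262 | True
38 | 37 | 14.0 | 7739 | True
39 | 38 | 19.0 | 5263 | True
40 | 39 | 19.0 | 5261 | True
41 | 40 | 18.0 | 5459 | True
42 | 41 | 17.0 | 27000 | True
43 | 42 | 18.0 | 5459 | True
44 | 43 | 17.0 | 27000 | True
45 | 44 | 18.0 | 5455 | True
46 | 45 | 14.0 | 7740 | True
47 | 46 | 19.0 | 5256 | True
48 | 47 | 14.0 | 8407 | True
49 | 48 | 17.0 | 27000 | True
50 | 49 | 18.0 | 5458 | True
51 | 50 | 17.0 | 27000 | True
52 | 51 | 19.0 | 5257 | True
53 | 52 | 19.0 | 5258 | True
54 | 53 | 19.0 | 5257 | True
55 | 54 | 14.0 | 7739 | True
56 | 55 | 19.0 | 5260 | True
57 | 56 | 19.0 | 5258 | True
58 | 57 | 14.0 | 7739 | True
59 | 58 | 14.0 | 8408 | True
60 | 59 | 19.0 | 5259 | True
61 | 60 | 19.0 | 5260 | True
62 | 61 | 14.0 | 8408 | True
63 | 62 | 19.0 | 5544 | True
64 | 63 | 17.0 | 27000 | True
65 | 64 | 17.0 | 27000 | True
66 | 65 | 19.0 | 5545 | True
67 | 66 | 18.0 | 5455 | True
68 | 67 | 14.0 | 8408 | True
69 | 68 | 19.0 | 5544 | True
70 | 69 | 19.0 | 5543 | True
71 | 70 | 19.0 | 5545 | True
72 | 71 | 14.0 | 7740 | True
73 | 72 | 17.0 | 27000 | True
74 | 73 | 19.0 | 5543 | True
75 | 74 | 14.0 | 7736 | True
76 | 75 | 17.0 | 27000 | True
77 | 76 | 19.0 | 5263 | True
78 | 77 | 14.0 | 8403 | True
79 | 78 | 19.0 | 5262 | True
80 | 79 | 14.0 | 8403 | True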

Round 40: ram_intercept_v3_19_video_check_seed1

2026-05-09T17:46:03.441851+00:00
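Mean score 19.0
Change
video check for the currently retained v3_19 policy on seed 1; avoids the known seed 0/9 inactive-tail loop
Sample
seed 1 / 1 episode / 5262 steps
Score
19.0 / 19.0 / 19.0 (min / mean / max)
Suggestion
No score variance; change one heuristic at a time, then test on more seeds.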

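Failure records

Time | Failure mode | Evidence | Hypothesis | Next step | Video
No failure-mode records for this round yet.

Per-episode records

# | Seed | Score | Steps | Done | RGB video | OBS video
1 | 1 | 19.0 | 5262 | True | rgb episode gif | obs episode gif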

Round 39: ram_intercept_v3_19_video_check

2026-05-09T16:42:02.651397+00:00
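Mean score 17.0
Change
video check for the currently retained v3_19 policy; one recorded episode for viewing in report.html
Sample
seed 0 / 1 episode / 27000 steps
Score
17.0 / 17.0 / 17.0 (min / mean / max)
Suggestion
No score variance; change one heuristic at a time, then test on more seeds.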

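Failure records

Time | Failure mode | Evidence | Hypothesis | Next step | Video
No failure-mode records for this round yet.

Per-episode records

# | Seed | Score | Steps | Done | RGB video | OBS video
1 | 0 | 17.0 | 27000 | True | rgb episode gif | obs episode gif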

Round 38: ram_intercept_v3_19_y86_after_score22_bounce_up

2026-05-09T15:18:29.657105+00:00
Mean score 16.8625
Change
Round 39: solidify stateful y86 x166 bounce DOWN->UP only after the score 2-2 family marker; v3_18 mean 14.9125 min 13 max 19 to probe mean 16.86 min 13 max 19
Sample
seed 0 / 80 episodes / 721387 steps
Score
13.0 / 16.8625 / 19.0 (min / mean / max)
Suggestion
Compare with the previous round; change only one heuristic at a time.

Failure records

Time: 2026-05-09T15:22:33+00:00
Failure mode: None / y86_after_score22_bounce_family
Evidence: Round 39 diagnostics/counterfactual_y86_bounce_x166_80seed_round39.jsonl showed that the ungated x166 y86 bounce DOWN->UP has a split effect: 26 score-13 seeds improve to 19, but 22 score-13 seeds collapse to 6. Applying it at score 2-2 only (diagnostics/counterfactual_y86_bounce_score22_x166_80seed_round39.jsonl) was inert. Excluding score 6-0 (diagnostics/counterfactual_y86_bounce_not_score60_x166_80seed_round39.jsonl) still regressed the hidden family at score 8-1 to 7. A stateful gate that first observes the score-2-2 x166 bounce family and only then allows later x166 DOWN->UP interventions improved 26 seeds with no regressions (diagnostics/counterfactual_y86_bounce_after_score22_x166_80seed_round39.jsonl): mean 14.91 to 16.86, min 13, max 19. The formal v3_19 eval scored mean 16.8625, min 13, max 19.
Hypothesis: The y86 x166 bounce action is causal, but two visually similar score chains require opposite treatment. The safe discriminator is not a single score exclusion but an episode-level family marker, set when the score-2-2 x166 bounce is seen, before any later x166 DOWN->UP correction is applied.
Next step: Retained v3_19; trace the remaining score-13 seeds and determine whether the floor is now purely high-down/low-up, purely y86 score 6-0/8-1, or a mixed residual family.
Video: –
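The hypothesis above describes an episode-level latch: a score-2-2 x166 bounce arms the gate, and only later x166 bounces get the DOWN->UP correction. A minimal sketch, assuming a hypothetical boolean at_x166_bounce computed elsewhere from the y86/x166 geometry and a (self, opponent) score tuple as read from ram[14] and ram[13]. The record does not say whether the arming frame itself is corrected; the sketch leaves it untouched, consistent with the score-2-2-only probe being inert.

```python
from dataclasses import dataclass
from typing import Tuple

Score = Tuple[int, int]  # (self_score, opponent_score)


@dataclass
class FamilyMarkerGate:
    """Episode-level latch: arm on the marker bounce, correct later bounces."""
    marker_score: Score = (2, 2)
    armed: bool = False

    def reset(self) -> None:
        """Call at the start of every episode."""
        self.armed = False

    def apply(self, score: Score, at_x166_bounce: bool, baseline: str) -> str:
        if not at_x166_bounce:
            return baseline
        if not self.armed:
            if score == self.marker_score:
                self.armed = True  # marker seen; this frame keeps the baseline
            return baseline
        # Armed: only the DOWN->UP correction is applied, other actions pass through.
        return "UP" if baseline == "DOWN" else baseline
```

Because the latch is per episode, the same x166 geometry at score 6-0 or 8-1 stays untouched unless the score-2-2 marker was seen first.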

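Per-episode records

# | seed | score | steps | done
1 | 0 | 17.0 | 27000 | True
2 | 1 | 19.0 | 5262 | True
3 | 2 | 16.0 | 5408 | True
4 | 3 | 18.0 | 5458 | True
5 | 4 | 18.0 | 5454 | True
6 | 5 | 13.0 | 8143 | True
7 | 6 | 18.0 | 5458 | True
8 | 7 | 13.0 | 7474 | True
9 | 8 | 13.0 | 8143 | True
10 | 9 | 17.0 | 27000 | True
11 | 10 | 19.0 | 5261 | True
12 | 11 | 19.0 | 5257 | True
13 | 12 | 13.0 | 7475 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 13.0 | 7475 | True
16 | 15 | 16.0 | 5408 | True
17 | 16 | 19.0 | 5261 | True
18 | 17 | 13.0 | 7474 | True
19 | 18 | 18.0 | 5458 | True
20 | 19 | 19.0 | 5256 | True
21 | 20 | 13.0 | 8143 | True
22 | 21 | 19.0 | 5261 | True
23 | 22 | 19.0 | 5258 | True
24 | 23 | 13.0 | 8139 | True
25 | 24 | 18.0 | 5460 | True
26 | 25 | 17.0 | 27000 | True
27 | 26 | 19.0 | 5258 | True
28 | 27 | 13.0 | 8139 | True
29 | 28 | 19.0 | 5263 | True
30 | 29 | 19.0 | 5258 | True
31 | 30 | 19.0 | 5262 | True
32 | 31 | 19.0 | 5544 | True
33 | 32 | 13.0 | 8138 | True
34 | 33 | 18.0 | 5455 | True
35 | 34 | 19.0 | 5263 | True
36 | 35 | 19.0 | 5261 | True
37 | 36 | 19.0 | 5262 | True
38 | 37 | 13.0 | 7474 | True
39 | 38 | 19.0 | 5263 | True
40 | 39 | 19.0 | 5261 | True
41 | 40 | 18.0 | 5459 | True
42 | 41 | 17.0 | 27000 | True
43 | 42 | 18.0 | 5459 | True
44 | 43 | 17.0 | 27000 | True
45 | 44 | 18.0 | 5455 | True
46 | 45 | 13.0 | 7475 | True
47 | 46 | 19.0 | 5256 | True
48 | 47 | 13.0 | 8142 | True
49 | 48 | 17.0 | 27000 | True
50 | 49 | 18.0 | 5458 | True
51 | 50 | 17.0 | 27000 | True
52 | 51 | 19.0 | 5257 | True
53 | 52 | 19.0 | 5258 | True
54 | 53 | 19.0 | 5257 | True
55 | 54 | 13.0 | 7474 | True
56 | 55 | 19.0 | 5260 | True
57 | 56 | 19.0 | 5258 | True
58 | 57 | 13.0 | 7474 | True
59 | 58 | 13.0 | 8143 | True
60 | 59 | 19.0 | 5259 | True
61 | 60 | 19.0 | 5260 | True
62 | 61 | 13.0 | 8143 | True
63 | 62 | 19.0 | 5544 | True
64 | 63 | 17.0 | 27000 | True
65 | 64 | 17.0 | 27000 | True
66 | 65 | 19.0 | 5545 | True
67 | 66 | 18.0 | 5455 | True
68 | 67 | 13.0 | 8143 | True
69 | 68 | 19.0 | 5544 | True
70 | 69 | 19.0 | 5543 | True
71 | 70 | 19.0 | 5545 | True
72 | 71 | 13.0 | 7475 | True
73 | 72 | 17.0 | 27000 | True
74 | 73 | 19.0 | 5543 | True
75 | 74 | 13.0 | 7471 | True
76 | 75 | 17.0 | 27000 | True
77 | 76 | 19.0 | 5263 | True
78 | 77 | 13.0 | 8138 | True
79 | 78 | 19.0 | 5262 | True
80 | 79 | 13.0 | 8138 | True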

Round 37: ram_intercept_v3_18_high_down_score40_bounce_noop

2026-05-09T14:20:12.603639+00:00
Mean score 14.9125
Change
Round 37: solidify score 4-0 high-down bounce x144-148 y50-56 dy9 DOWN->NOOP after history diagnostic; v3_17 mean 14.6375 min 12 max 19 to probe mean 14.91 min 13 max 19
Sample
start seed 0 / 80 episodes / 763117 steps
Scores
13.0 / 14.9125 / 19.0 (low / mean / high)
Suggestion
Compare against the previous round; change only one heuristic at a time.

Failure records

Time: 2026-05-09T14:20:12+00:00
Failure mode: high_down_score40_bounce_family
Evidence: Round 37: diagnostics/high_down_history_x150_v3_17_round37.json showed that the high-down floor diverges one frame before x150: losses use DOWN at the dy=9 bounce frame, while wins enter x150 with a stable NOOP phase. Ungated DOWN->NOOP (diagnostics/counterfactual_high_down_y165_bounce_x146_80seed_round37.jsonl) improved 22 seeds but regressed four hidden score 0-0 seeds to 11; score-4-0 gating (diagnostics/counterfactual_high_down_y165_bounce_score40_80seed_round37.jsonl) improved 22 seeds with no regressions, raising the mean from 14.64 to 14.91 and the min from 12 to 13. The formal v3_18 evaluation scored mean 14.9125 min 13 max 19.
Hypothesis: The high-down y165 floor has a safe causal correction at score 4-0 on the dy=9 bounce frame (x=144..148, y=50..56, paddle=143..150, target=154..162): replacing baseline DOWN with NOOP preserves the next-frame paddle phase. The same geometry at score 0-0 belongs to a hidden, different family and must remain untouched.
Next step: Retained v3_18; trace the remaining score-13 seeds and re-split the new floor before testing any further high-down or later-y86 correction.
Video: None
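A score- and geometry-gated one-frame override like the v3_18 rule can be sketched as a pure function. The ranges are copied from the hypothesis above; the `Features` fields and action ids are hypothetical stand-ins for whatever the RAM decoder actually exposes.

```python
from dataclasses import dataclass

NOOP, UP, DOWN = 0, 1, 2  # hypothetical action ids


@dataclass
class Features:
    """Hypothetical RAM-derived features for one frame."""
    score: tuple
    ball_x: int
    ball_y: int
    dy: int
    paddle: int
    target: int


def in_range(value, lo, hi):
    """Inclusive range check, matching the report's x=144..148 notation."""
    return lo <= value <= hi


def score40_bounce_override(f, baseline_action):
    """At score 4-0 on the dy=9 bounce frame, replace baseline DOWN with NOOP."""
    if (
        f.score == (4, 0)
        and f.dy == 9
        and in_range(f.ball_x, 144, 148)
        and in_range(f.ball_y, 50, 56)
        and in_range(f.paddle, 143, 150)
        and in_range(f.target, 154, 162)
        and baseline_action == DOWN
    ):
        return NOOP
    return baseline_action
```

The score test is what leaves the identical score 0-0 geometry untouched, which is the case that regressed four seeds in the ungated probe.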

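Per-episode records

# | seed | score | steps | done
1 | 0 | 17.0 | 27000 | True
2 | 1 | 13.0 | 6867 | True
3 | 2 | 16.0 | 5408 | True
4 | 3 | 18.0 | 5458 | True
5 | 4 | 18.0 | 5454 | True
6 | 5 | 13.0 | 8143 | True
7 | 6 | 18.0 | 5458 | True
8 | 7 | 13.0 | 7474 | True
9 | 8 | 13.0 | 8143 | True
10 | 9 | 17.0 | 27000 | True
11 | 10 | 13.0 | 6866 | True
12 | 11 | 13.0 | 6862 | True
13 | 12 | 13.0 | 7475 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 13.0 | 7475 | True
16 | 15 | 16.0 | 5408 | True
17 | 16 | 13.0 | 6866 | True
18 | 17 | 13.0 | 7474 | True
19 | 18 | 18.0 | 5458 | True
20 | 19 | 13.0 | 6861 | True
21 | 20 | 13.0 | 8143 | True
22 | 21 | 13.0 | 6866 | True
23 | 22 | 13.0 | 6863 | True
24 | 23 | 13.0 | 8139 | True
25 | 24 | 18.0 | 5460 | True
26 | 25 | 17.0 | 27000 | True
27 | 26 | 13.0 | 6863 | True
28 | 27 | 13.0 | 8139 | True
29 | 28 | 13.0 | 6868 | True
30 | 29 | 13.0 | 6863 | True
31 | 30 | 13.0 | 6867 | True
32 | 31 | 19.0 | 5544 | True
33 | 32 | 13.0 | 8138 | True
34 | 33 | 18.0 | 5455 | True
35 | 34 | 13.0 | 6868 | True
36 | 35 | 13.0 | 6866 | True
37 | 36 | 13.0 | 6867 | True
38 | 37 | 13.0 | 7474 | True
39 | 38 | 13.0 | 6868 | True
40 | 39 | 13.0 | 6866 | True
41 | 40 | 18.0 | 5459 | True
42 | 41 | 17.0 | 27000 | True
43 | 42 | 18.0 | 5459 | True
44 | 43 | 17.0 | 27000 | True
45 | 44 | 18.0 | 5455 | True
46 | 45 | 13.0 | 7475 | True
47 | 46 | 13.0 | 6861 | True
48 | 47 | 13.0 | 8142 | True
49 | 48 | 17.0 | 27000 | True
50 | 49 | 18.0 | 5458 | True
51 | 50 | 17.0 | 27000 | True
52 | 51 | 13.0 | 6862 | True
53 | 52 | 13.0 | 6863 | True
54 | 53 | 13.0 | 6862 | True
55 | 54 | 13.0 | 7474 | True
56 | 55 | 13.0 | 6865 | True
57 | 56 | 13.0 | 6863 | True
58 | 57 | 13.0 | 7474 | True
59 | 58 | 13.0 | 8143 | True
60 | 59 | 13.0 | 6864 | True
61 | 60 | 13.0 | 6865 | True
62 | 61 | 13.0 | 8143 | True
63 | 62 | 19.0 | 5544 | True
64 | 63 | 17.0 | 27000 | True
65 | 64 | 17.0 | 27000 | True
66 | 65 | 19.0 | 5545 | True
67 | 66 | 18.0 | 5455 | True
68 | 67 | 13.0 | 8143 | True
69 | 68 | 19.0 | 5544 | True
70 | 69 | 19.0 | 5543 | True
71 | 70 | 19.0 | 5545 | True
72 | 71 | 13.0 | 7475 | True
73 | 72 | 17.0 | 27000 | True
74 | 73 | 19.0 | 5543 | True
75 | 74 | 13.0 | 7471 | True
76 | 75 | 17.0 | 27000 | True
77 | 76 | 13.0 | 6868 | True
78 | 77 | 13.0 | 8138 | True
79 | 78 | 13.0 | 6867 | True
80 | 79 | 13.0 | 8138 | True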

Round 36: ram_intercept_v3_17_y86_score82_early_setup_down

2026-05-09T13:37:32.626697+00:00
Mean score 14.6375
Change
Round 34: solidify score 8-2 y86 offset8 UP->DOWN correction only during steps 2050..2110 after full guard; v3_16 mean 14.3125 min 12 max 19 to probe mean 14.64 min 12 max 19
Sample
start seed 0 / 80 episodes / 769757 steps
Scores
12.0 / 14.6375 / 19.0 (low / mean / high)
Suggestion
Compare against the previous round; change only one heuristic at a time.

Failure records

Time: 2026-05-09T13:47:54+00:00
Failure mode: y86_score82_early_step_family
Evidence: Round 34: diagnostics/counterfactual_y86_score82_early_step_80seed_round34.jsonl showed that the score 8-2 y86 offset8 UP->DOWN correction, applied only during steps 2050..2110, raises the seed 0..79 mean from 14.31 to 14.64 (min 12, max 19) with 26 interventions; the formal v3_17 evaluation scored mean 14.6375 min 12 max 19.
Hypothesis: The score 8-2 y86 setup correction is safe only in the early episode phase around steps 2050..2110; later score 8-2 triggers around steps 2332..2336 belong to a hidden phase that collapses under the same action replacement.
Next step: Retained v3_17; trace the remaining score-12 seeds and identify whether the floor is now high-down y165, later y86, or another family.
Video: None

Time: 2026-05-09T13:55:47+00:00
Failure mode: high_down_y165_x150_up_to_noop_rejected
Evidence: Round 36: diagnostics/counterfactual_high_down_y165_x150_80seed_round36.jsonl showed that high-down x150/y61 UP->NOOP fired 12 times at score 4-0 around steps 1454..1459 but left every seed unchanged: mean 14.64, min 12, max 19, improved_count 0, worsened_count 0.
Hypothesis: The x150 win/loss action difference in diagnostics/high_down_outcome_compare_v3_17_round36.json is a correlated phase signature rather than a causal one-frame action fix; replacing UP with NOOP at this anchor has no effect on scores.
Next step: Keep v3_17. Use the high-down outcome comparison to inspect earlier setup/controller history, or build a phase-history diagnostic, instead of adding another local high-down action replacement.
Video: None
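The counterfactual guard behind these records compares each seed's score under the modified policy with the baseline and reports counts such as `improved_count` and `worsened_count`. A minimal sketch, assuming both runs are available as `seed -> score` dicts; the acceptance rule (at least one improvement and zero regressions) is one reading of how probes were kept or rejected here, not a documented criterion.

```python
from statistics import mean


def compare_counterfactual(baseline, variant):
    """Summarize a per-seed counterfactual run against its baseline run."""
    if baseline.keys() != variant.keys():
        raise ValueError("baseline and variant must cover the same seeds")
    improved = sorted(s for s in baseline if variant[s] > baseline[s])
    worsened = sorted(s for s in baseline if variant[s] < baseline[s])
    scores = list(variant.values())
    return {
        "mean": mean(scores),
        "min": min(scores),
        "max": max(scores),
        "improved_count": len(improved),
        "worsened_count": len(worsened),
        "improved": improved,
        "worsened": worsened,
    }


def accept(summary):
    """Assumed rule: keep only probes that help at least one seed and hurt none."""
    return summary["improved_count"] > 0 and summary["worsened_count"] == 0
```

Under this rule the Round 36 x150 probe (improved_count 0, worsened_count 0) is rejected as inert, and the ungated Round 37 probe is rejected for its four regressions.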

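Per-episode records

# | seed | score | steps | done
1 | 0 | 17.0 | 27000 | True
2 | 1 | 13.0 | 6867 | True
3 | 2 | 16.0 | 5408 | True
4 | 3 | 18.0 | 5458 | True
5 | 4 | 18.0 | 5454 | True
6 | 5 | 12.0 | 8718 | True
7 | 6 | 18.0 | 5458 | True
8 | 7 | 12.0 | 7448 | True
9 | 8 | 12.0 | 8718 | True
10 | 9 | 17.0 | 27000 | True
11 | 10 | 13.0 | 6866 | True
12 | 11 | 13.0 | 6862 | True
13 | 12 | 12.0 | 7449 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 12.0 | 7449 | True
16 | 15 | 16.0 | 5408 | True
17 | 16 | 13.0 | 6866 | True
18 | 17 | 12.0 | 7448 | True
19 | 18 | 18.0 | 5458 | True
20 | 19 | 13.0 | 6861 | True
21 | 20 | 12.0 | 8718 | True
22 | 21 | 13.0 | 6866 | True
23 | 22 | 13.0 | 6863 | True
24 | 23 | 12.0 | 8714 | True
25 | 24 | 18.0 | 5460 | True
26 | 25 | 17.0 | 27000 | True
27 | 26 | 13.0 | 6863 | True
28 | 27 | 12.0 | 8714 | True
29 | 28 | 13.0 | 6868 | True
30 | 29 | 13.0 | 6863 | True
31 | 30 | 13.0 | 6867 | True
32 | 31 | 19.0 | 5544 | True
33 | 32 | 12.0 | 8713 | True
34 | 33 | 18.0 | 5455 | True
35 | 34 | 13.0 | 6868 | True
36 | 35 | 13.0 | 6866 | True
37 | 36 | 13.0 | 6867 | True
38 | 37 | 12.0 | 7448 | True
39 | 38 | 13.0 | 6868 | True
40 | 39 | 13.0 | 6866 | True
41 | 40 | 18.0 | 5459 | True
42 | 41 | 17.0 | 27000 | True
43 | 42 | 18.0 | 5459 | True
44 | 43 | 17.0 | 27000 | True
45 | 44 | 18.0 | 5455 | True
46 | 45 | 12.0 | 7449 | True
47 | 46 | 13.0 | 6861 | True
48 | 47 | 12.0 | 8717 | True
49 | 48 | 17.0 | 27000 | True
50 | 49 | 18.0 | 5458 | True
51 | 50 | 17.0 | 27000 | True
52 | 51 | 13.0 | 6862 | True
53 | 52 | 13.0 | 6863 | True
54 | 53 | 13.0 | 6862 | True
55 | 54 | 12.0 | 7448 | True
56 | 55 | 13.0 | 6865 | True
57 | 56 | 13.0 | 6863 | True
58 | 57 | 12.0 | 7448 | True
59 | 58 | 12.0 | 8718 | True
60 | 59 | 13.0 | 6864 | True
61 | 60 | 13.0 | 6865 | True
62 | 61 | 12.0 | 8718 | True
63 | 62 | 19.0 | 5544 | True
64 | 63 | 17.0 | 27000 | True
65 | 64 | 17.0 | 27000 | True
66 | 65 | 19.0 | 5545 | True
67 | 66 | 18.0 | 5455 | True
68 | 67 | 12.0 | 8718 | True
69 | 68 | 19.0 | 5544 | True
70 | 69 | 19.0 | 5543 | True
71 | 70 | 19.0 | 5545 | True
72 | 71 | 12.0 | 7449 | True
73 | 72 | 17.0 | 27000 | True
74 | 73 | 19.0 | 5543 | True
75 | 74 | 12.0 | 7445 | True
76 | 75 | 17.0 | 27000 | True
77 | 76 | 13.0 | 6868 | True
78 | 77 | 12.0 | 8713 | True
79 | 78 | 13.0 | 6867 | True
80 | 79 | 12.0 | 8713 | True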

Round 35: ram_intercept_v3_16_y86_score20_serve_initial_up

2026-05-09T12:05:35.206021+00:00
Mean score 14.3125
Change
Round 27: solidify score 2-0 y86 serve-initial DOWN->UP correction after full guard; v3_15 mean 14.1625 min 11 max 19 to probe mean 14.31 min 12 max 19
Sample
start seed 0 / 80 episodes / 778545 steps
Scores
12.0 / 14.3125 / 19.0 (low / mean / high)
Suggestion
Compare against the previous round; change only one heuristic at a time.

Failure records

Time: 2026-05-09T12:09:34.765255+00:00
Failure mode: y86_score20_serve_initial_family
Evidence: Round 27: diagnostics/counterfactual_y86_score20_serve_initial_80seed_round27.jsonl showed that score 2-0 serve-initial DOWN->UP raises the seed 0..79 mean from 14.16 to 14.31 and the min from 11 to 12; the formal v3_16 evaluation scored mean 14.3125 min 12 max 19.
Hypothesis: The remaining score-2-0 y86 loop is best disrupted at the first active frame after the point reset (x=126, y=126): baseline DOWN puts the paddle into a bad loop phase, while UP avoids the later repeated y86 loss. DOWN->NOOP is unsafe.
Next step: Retained v3_16; trace score-12 seeds and identify whether y86 remains the floor or a new family dominates.
Video: None

Time: 2026-05-09T12:23:53.525867+00:00
Failure mode: high_down_y165_offset4_rejected
Evidence: Round 28: diagnostics/counterfactual_high_down_y165_offset4_80seed_round28.jsonl showed that high-down y165 offset4 UP->NOOP and UP->DOWN both left seed 0..79 scores unchanged at mean 14.31 min 12 max 19 despite interventions.
Hypothesis: The v3_16 high-down y165 floor is not controlled by the local x=190 y=141 paddle=155 UP action; the miss likely depends on earlier setup or a different collision phase.
Next step: Keep v3_16; continue diagnosing the score-12 floor, comparing the y86 and high-down families using earlier setup features.
Video: None

Time: 2026-05-09T12:38:12+00:00
Failure mode: high_down_y165_offset4_h3_rejected
Evidence: Round 29: diagnostics/counterfactual_high_down_y165_offset4_h3_80seed_round29.jsonl showed that high-down y165 offset4 UP->NOOP and UP->DOWN with horizon 3 both left seed 0..79 scores unchanged at mean 14.31 min 12 max 19 despite 108 interventions each.
Hypothesis: The high-down y165 floor is not controlled by the duration of the local offset4 action; extending the same replacement to three frames is still inert, so the relevant control point is earlier, or the failure is determined by a different phase feature.
Next step: Keep v3_16; compare earlier high-down offsets or high-down success/failure setups before adding any policy code.
Video: None

Time: 2026-05-09T12:55:02+00:00
Failure mode: high_down_y165_x166_rejected
Evidence: Round 30: diagnostics/counterfactual_high_down_y165_x166_80seed_round30.jsonl showed that x166 y93 DOWN->NOOP drops the seed 0..79 mean from 14.31 to 8.90 with min -20, while DOWN->UP drops the mean to 4.45 with min -20, over 97 interventions.
Hypothesis: Although high-down wins often have NOOP at a nearby x166 anchor and losses have DOWN, directly replacing the loss-family DOWN action is unsafe; the observed win/loss difference is a correlated phase signature, not a safe local action edit.
Next step: Keep v3_16; inspect earlier setup/score phase or use a safer discriminator before any high-down policy correction.
Video: None

Time: 2026-05-09T13:04:18+00:00
Failure mode: high_down_y165_initial_x166_rejected
Evidence: Round 31: diagnostics/counterfactual_high_down_y165_initial_x166_80seed_round31.jsonl showed that the narrower score 0-0 x166/y97/pad135 DOWN->UP trigger fires 15 times but lowers the seed 0..79 mean from 14.31 to 14.24, with regressions on seeds 33, 44, and 66 and no improvements.
Hypothesis: The x166 improvement seen inside the broad unsafe probe was not reproducible in isolation; the apparent +1 cases were downstream artifacts of the broader intervention pattern, not a safe one-shot first-point correction.
Next step: Keep v3_16; avoid x166 high-down action replacements and focus on state/phase diagnostics or the remaining y86 floor family.
Video: None

Time: 2026-05-09T13:25:41+00:00
Failure mode: y86_final_down_rejected
Evidence: Round 32: diagnostics/counterfactual_y86_final_down_80seed_round32.jsonl showed that y86 final x202/y86 DOWN->NOOP leaves seed 0..79 scores unchanged at mean 14.31 min 12 max 19 despite 262 interventions, while DOWN->UP drops the mean to 5.84 and the min to -11 over 132 interventions.
Hypothesis: The repeated y86 score-12 floor is not caused by the final contact-frame DOWN action; stopping it is inert and reversing it is harmful. The loop must be determined by earlier phase/setup or by target-model behavior.
Next step: Keep v3_16; use the floor-family summary and y86 tail offsets to inspect earlier phase or state-model explanations instead of final-frame action replacement.
Video: None

Time: 2026-05-09T13:37:16+00:00
Failure mode: y86_score82_setup_rejected
Evidence: Round 33: diagnostics/counterfactual_y86_score82_80seed_round33.jsonl showed that ungated score 8-2 y86 offset8 UP->DOWN improves 26 early y86 floor seeds from 12 to 13 but collapses 10 later-phase seeds to -13, reducing the seed 0..79 mean from 14.31 to 11.51 and the min from 12 to -13.
Hypothesis: The score 8-2 y86 setup correction has the same hidden phase split seen at score 6-2: geometry and score alone are unsafe because later step phases require the baseline action.
Next step: Keep v3_16 until an early step gate is validated; test score 8-2 only around the safe 2086..2093 intervention band.
Video: None
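The step gate that later made the score 8-2 correction safe (steps 2050..2110 in v3_17) reduces to a membership test on the episode step. Sketch only: `at_setup_anchor` is a hypothetical placeholder for the y86 offset8 geometry test, and the action ids are assumed.

```python
NOOP, UP, DOWN = 0, 1, 2  # hypothetical action ids

# Early band from the v3_17 record; the harmful late triggers near steps
# 2332..2336 fall outside it. range() excludes its stop value, hence 2111.
EARLY_WINDOW = range(2050, 2111)


def score82_early_setup_override(step, score, at_setup_anchor, baseline_action):
    """UP -> DOWN for the score 8-2 y86 offset8 setup, early phase only."""
    if (
        score == (8, 2)
        and at_setup_anchor
        and step in EARLY_WINDOW
        and baseline_action == UP
    ):
        return DOWN
    return baseline_action
```

Score and geometry alone cannot separate the two phases, so the step index acts as the extra discriminator.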

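Per-episode records

# | seed | score | steps | done
1 | 0 | 17.0 | 27000 | True
2 | 1 | 12.0 | 7205 | True
3 | 2 | 16.0 | 5408 | True
4 | 3 | 18.0 | 5458 | True
5 | 4 | 18.0 | 5454 | True
6 | 5 | 12.0 | 8718 | True
7 | 6 | 18.0 | 5458 | True
8 | 7 | 12.0 | 7448 | True
9 | 8 | 12.0 | 8718 | True
10 | 9 | 17.0 | 27000 | True
11 | 10 | 12.0 | 7204 | True
12 | 11 | 12.0 | 7200 | True
13 | 12 | 12.0 | 7449 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 12.0 | 7449 | True
16 | 15 | 16.0 | 5408 | True
17 | 16 | 12.0 | 7204 | True
18 | 17 | 12.0 | 7448 | True
19 | 18 | 18.0 | 5458 | True
20 | 19 | 12.0 | 7199 | True
21 | 20 | 12.0 | 8718 | True
22 | 21 | 12.0 | 7204 | True
23 | 22 | 12.0 | 7201 | True
24 | 23 | 12.0 | 8714 | True
25 | 24 | 18.0 | 5460 | True
26 | 25 | 17.0 | 27000 | True
27 | 26 | 12.0 | 7201 | True
28 | 27 | 12.0 | 8714 | True
29 | 28 | 12.0 | 7206 | True
30 | 29 | 12.0 | 7201 | True
31 | 30 | 12.0 | 7205 | True
32 | 31 | 19.0 | 5544 | True
33 | 32 | 12.0 | 8713 | True
34 | 33 | 18.0 | 5455 | True
35 | 34 | 12.0 | 7206 | True
36 | 35 | 12.0 | 7204 | True
37 | 36 | 12.0 | 7205 | True
38 | 37 | 12.0 | 7448 | True
39 | 38 | 12.0 | 7206 | True
40 | 39 | 12.0 | 7204 | True
41 | 40 | 18.0 | 5459 | True
42 | 41 | 17.0 | 27000 | True
43 | 42 | 18.0 | 5459 | True
44 | 43 | 17.0 | 27000 | True
45 | 44 | 18.0 | 5455 | True
46 | 45 | 12.0 | 7449 | True
47 | 46 | 12.0 | 7199 | True
48 | 47 | 12.0 | 8717 | True
49 | 48 | 17.0 | 27000 | True
50 | 49 | 18.0 | 5458 | True
51 | 50 | 17.0 | 27000 | True
52 | 51 | 12.0 | 7200 | True
53 | 52 | 12.0 | 7201 | True
54 | 53 | 12.0 | 7200 | True
55 | 54 | 12.0 | 7448 | True
56 | 55 | 12.0 | 7203 | True
57 | 56 | 12.0 | 7201 | True
58 | 57 | 12.0 | 7448 | True
59 | 58 | 12.0 | 8718 | True
60 | 59 | 12.0 | 7202 | True
61 | 60 | 12.0 | 7203 | True
62 | 61 | 12.0 | 8718 | True
63 | 62 | 19.0 | 5544 | True
64 | 63 | 17.0 | 27000 | True
65 | 64 | 17.0 | 27000 | True
66 | 65 | 19.0 | 5545 | True
67 | 66 | 18.0 | 5455 | True
68 | 67 | 12.0 | 8718 | True
69 | 68 | 19.0 | 5544 | True
70 | 69 | 19.0 | 5543 | True
71 | 70 | 19.0 | 5545 | True
72 | 71 | 12.0 | 7449 | True
73 | 72 | 17.0 | 27000 | True
74 | 73 | 19.0 | 5543 | True
75 | 74 | 12.0 | 7445 | True
76 | 75 | 17.0 | 27000 | True
77 | 76 | 12.0 | 7206 | True
78 | 77 | 12.0 | 8713 | True
79 | 78 | 12.0 | 7205 | True
80 | 79 | 12.0 | 8713 | True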

Round 34: ram_intercept_v3_15_y86_score62_early_setup_down

2026-05-09T11:19:26.190173+00:00
Mean score 14.1625
Change
Round 24: solidify step-gated y86 6-2 setup UP->DOWN correction after full guard; v3_14 mean 13.8375 min 11 max 19 to probe mean 14.16 min 11 max 19
Sample
start seed 0 / 80 episodes / 773925 steps
Scores
11.0 / 14.1625 / 19.0 (low / mean / high)
Suggestion
Compare against the previous round; change only one heuristic at a time.

Failure records

Time: 2026-05-09T11:23:31.228809+00:00
Failure mode: y86_score62_early_step_family
Evidence: Round 24: diagnostics/counterfactual_y86_score62_early_step_80seed_round23.jsonl showed that step-gated score 6-2 y86 offset8 UP->DOWN raises the seed 0..79 mean from 13.84 to 14.16 with min 11 max 19; the formal v3_15 evaluation scored mean 14.1625 min 11 max 19.
Hypothesis: The score 6-2 y86 setup correction is safe only in the early episode phase around steps 1500..1700; later score 6-2 triggers around steps 2385..2390 belong to hidden families that collapse under the same action replacement.
Next step: Retained v3_15; trace the remaining score-11 seeds and look for floor-improving families rather than blindly extending the y86 score gates.
Video: None

Time: 2026-05-09T11:37:47.292206+00:00
Failure mode: y86_score20_setup_rejected
Evidence: Round 25: diagnostics/counterfactual_y86_score20_80seed_round25.jsonl showed that score 2-0 y86 offset8 UP->NOOP had no score effect, and UP->DOWN dropped the mean from 14.16 to 9.66 with min -19 over seeds 0..79.
Hypothesis: The remaining score-11 seeds include early score 2-0 y86 losses, but the same offset8 action replacement is either inert or catastrophic; this floor needs a different discriminator or an earlier model, not direct UP replacement.
Next step: Keep v3_15; inspect successful versus catastrophic y86 families for earlier trajectory or paddle-phase differences beyond score and step.
Video: None

Time: 2026-05-09T11:52:11.625142+00:00
Failure mode: y86_score20_approach_x82_rejected
Evidence: Round 26: diagnostics/counterfactual_y86_score20_approach_x82_80seed_round26.jsonl showed that score 2-0 x82 approach UP->NOOP had no effect, and UP->DOWN dropped the touched seeds from 11 to 9, reducing the seed 0..79 mean from 14.16 to 13.86 and the min from 11 to 9.
Hypothesis: The remaining score-2-0 y86 loop is not fixed by changing the near-target x82 approach action; direct earlier paddle-direction changes still worsen the loop.
Next step: Keep v3_15; investigate the even earlier serve/return setup or non-action model features instead of direct y86 approach replacement.
Video: None

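Per-episode records

# | seed | score | steps | done
1 | 0 | 17.0 | 27000 | True
2 | 1 | 12.0 | 7205 | True
3 | 2 | 16.0 | 5408 | True
4 | 3 | 18.0 | 5458 | True
5 | 4 | 18.0 | 5454 | True
6 | 5 | 11.0 | 8333 | True
7 | 6 | 18.0 | 5458 | True
8 | 7 | 12.0 | 7448 | True
9 | 8 | 11.0 | 8333 | True
10 | 9 | 17.0 | 27000 | True
11 | 10 | 12.0 | 7204 | True
12 | 11 | 12.0 | 7200 | True
13 | 12 | 12.0 | 7449 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 12.0 | 7449 | True
16 | 15 | 16.0 | 5408 | True
17 | 16 | 12.0 | 7204 | True
18 | 17 | 12.0 | 7448 | True
19 | 18 | 18.0 | 5458 | True
20 | 19 | 12.0 | 7199 | True
21 | 20 | 11.0 | 8333 | True
22 | 21 | 12.0 | 7204 | True
23 | 22 | 12.0 | 7201 | True
24 | 23 | 11.0 | 8329 | True
25 | 24 | 18.0 | 5460 | True
26 | 25 | 17.0 | 27000 | True
27 | 26 | 12.0 | 7201 | True
28 | 27 | 11.0 | 8329 | True
29 | 28 | 12.0 | 7206 | True
30 | 29 | 12.0 | 7201 | True
31 | 30 | 12.0 | 7205 | True
32 | 31 | 19.0 | 5544 | True
33 | 32 | 11.0 | 8328 | True
34 | 33 | 18.0 | 5455 | True
35 | 34 | 12.0 | 7206 | True
36 | 35 | 12.0 | 7204 | True
37 | 36 | 12.0 | 7205 | True
38 | 37 | 12.0 | 7448 | True
39 | 38 | 12.0 | 7206 | True
40 | 39 | 12.0 | 7204 | True
41 | 40 | 18.0 | 5459 | True
42 | 41 | 17.0 | 27000 | True
43 | 42 | 18.0 | 5459 | True
44 | 43 | 17.0 | 27000 | True
45 | 44 | 18.0 | 5455 | True
46 | 45 | 12.0 | 7449 | True
47 | 46 | 12.0 | 7199 | True
48 | 47 | 11.0 | 8332 | True
49 | 48 | 17.0 | 27000 | True
50 | 49 | 18.0 | 5458 | True
51 | 50 | 17.0 | 27000 | True
52 | 51 | 12.0 | 7200 | True
53 | 52 | 12.0 | 7201 | True
54 | 53 | 12.0 | 7200 | True
55 | 54 | 12.0 | 7448 | True
56 | 55 | 12.0 | 7203 | True
57 | 56 | 12.0 | 7201 | True
58 | 57 | 12.0 | 7448 | True
59 | 58 | 11.0 | 8333 | True
60 | 59 | 12.0 | 7202 | True
61 | 60 | 12.0 | 7203 | True
62 | 61 | 11.0 | 8333 | True
63 | 62 | 19.0 | 5544 | True
64 | 63 | 17.0 | 27000 | True
65 | 64 | 17.0 | 27000 | True
66 | 65 | 19.0 | 5545 | True
67 | 66 | 18.0 | 5455 | True
68 | 67 | 11.0 | 8333 | True
69 | 68 | 19.0 | 5544 | True
70 | 69 | 19.0 | 5543 | True
71 | 70 | 19.0 | 5545 | True
72 | 71 | 12.0 | 7449 | True
73 | 72 | 17.0 | 27000 | True
74 | 73 | 19.0 | 5543 | True
75 | 74 | 12.0 | 7445 | True
76 | 75 | 17.0 | 27000 | True
77 | 76 | 12.0 | 7206 | True
78 | 77 | 11.0 | 8328 | True
79 | 78 | 12.0 | 7205 | True
80 | 79 | 11.0 | 8328 | True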

Round 33: ram_intercept_v3_14_y86_score42_setup_down

2026-05-09T10:51:55.791188+00:00
Mean score 13.8375
Change
Round 22: solidify one-shot y86 4-2 setup UP->DOWN correction after 80-seed counterfactual; v3_13 mean 13.5125 min 10 max 19 to probe mean 13.84 min 11 max 19
Sample
start seed 0 / 80 episodes / 782713 steps
Scores
11.0 / 13.8375 / 19.0 (low / mean / high)
Suggestion
Compare against the previous round; change only one heuristic at a time.

Failure records

Time: 2026-05-09T10:56:35.664758+00:00
Failure mode: y86_score42_setup_family
Evidence: Round 22: diagnostics/counterfactual_y86_score42_80seed_round22.jsonl showed that once_y86_setup_offset8_score42_up_to_down_h1 raises the seed 0..79 mean from 13.51 to 13.84 and the min from 10 to 11 with 26 interventions; the formal v3_14 evaluation scored mean 13.8375 min 11 max 19.
Hypothesis: After the v3_13 2-2 fix, the same y86 offset8 setup failure recurs at score 4-2; a one-shot UP->DOWN at x=172..176 y=56..60 dy=4 paddle=84..90 target=74..78 breaks the next repeated loss without regressions.
Next step: Retained v3_14; trace score-11 seeds to see whether the y86 score chain continues or another family becomes the floor.
Video: None

Time: 2026-05-09T11:09:09.985099+00:00
Failure mode: y86_score62_setup_rejected
Evidence: Round 23: diagnostics/counterfactual_y86_score62_80seed_round23.jsonl showed that once_y86_setup_offset8_score62_up_to_down_h1 improved some y86 floor seeds from 11 to 12 but regressed many hidden seeds to -15; the full seed 0..79 mean dropped from 13.84 to 10.26 and the min from 11 to -15.
Hypothesis: The y86 score chain cannot be safely extended by simply adding score 6-2; this gate overlaps hidden seed families where the same geometry is harmful.
Next step: Keep v3_14; do not add a score 6-2 policy correction. Trace the score-11 floor and look for safer discriminators or a different floor family.
Video: None
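Probe names such as `once_y86_setup_offset8_score42_up_to_down_h1`, together with the horizon-3 probes in Round 29, suggest an override that fires at most once per episode and then holds the replacement action for `horizon` frames. A sketch under that reading; the class name, trigger predicate, and action ids are hypothetical.

```python
NOOP, UP, DOWN = 0, 1, 2  # hypothetical action ids


class OneShotOverride:
    """Fire at most once per episode, then hold `to_action` for `horizon` frames."""

    def __init__(self, trigger, from_action, to_action, horizon=1):
        self.trigger = trigger  # callable: features -> bool (assumed interface)
        self.from_action = from_action
        self.to_action = to_action
        self.horizon = horizon
        self.reset()

    def reset(self):
        # Call at the start of every episode.
        self.fired = False
        self.remaining = 0

    def apply(self, features, baseline_action):
        if (
            not self.fired
            and baseline_action == self.from_action
            and self.trigger(features)
        ):
            self.fired = True
            self.remaining = self.horizon
        if self.remaining > 0:
            # Hold the replacement regardless of what the baseline wants.
            self.remaining -= 1
            return self.to_action
        return baseline_action
```

With `horizon=1` this is the one-frame `_h1` correction; raising it to 3 reproduces the "extend the same replacement for three frames" probe shape.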

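Per-episode records

# | seed | score | steps | done
1 | 0 | 17.0 | 27000 | True
2 | 1 | 11.0 | 7543 | True
3 | 2 | 16.0 | 5408 | True
4 | 3 | 18.0 | 5458 | True
5 | 4 | 18.0 | 5454 | True
6 | 5 | 11.0 | 8333 | True
7 | 6 | 18.0 | 5458 | True
8 | 7 | 12.0 | 7448 | True
9 | 8 | 11.0 | 8333 | True
10 | 9 | 17.0 | 27000 | True
11 | 10 | 11.0 | 7542 | True
12 | 11 | 11.0 | 7538 | True
13 | 12 | 12.0 | 7449 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 12.0 | 7449 | True
16 | 15 | 16.0 | 5408 | True
17 | 16 | 11.0 | 7542 | True
18 | 17 | 12.0 | 7448 | True
19 | 18 | 18.0 | 5458 | True
20 | 19 | 11.0 | 7537 | True
21 | 20 | 11.0 | 8333 | True
22 | 21 | 11.0 | 7542 | True
23 | 22 | 11.0 | 7539 | True
24 | 23 | 11.0 | 8329 | True
25 | 24 | 18.0 | 5460 | True
26 | 25 | 17.0 | 27000 | True
27 | 26 | 11.0 | 7539 | True
28 | 27 | 11.0 | 8329 | True
29 | 28 | 11.0 | 7544 | True
30 | 29 | 11.0 | 7539 | True
31 | 30 | 11.0 | 7543 | True
32 | 31 | 19.0 | 5544 | True
33 | 32 | 11.0 | 8328 | True
34 | 33 | 18.0 | 5455 | True
35 | 34 | 11.0 | 7544 | True
36 | 35 | 11.0 | 7542 | True
37 | 36 | 11.0 | 7543 | True
38 | 37 | 12.0 | 7448 | True
39 | 38 | 11.0 | 7544 | True
40 | 39 | 11.0 | 7542 | True
41 | 40 | 18.0 | 5459 | True
42 | 41 | 17.0 | 27000 | True
43 | 42 | 18.0 | 5459 | True
44 | 43 | 17.0 | 27000 | True
45 | 44 | 18.0 | 5455 | True
46 | 45 | 12.0 | 7449 | True
47 | 46 | 11.0 | 7537 | True
48 | 47 | 11.0 | 8332 | True
49 | 48 | 17.0 | 27000 | True
50 | 49 | 18.0 | 5458 | True
51 | 50 | 17.0 | 27000 | True
52 | 51 | 11.0 | 7538 | True
53 | 52 | 11.0 | 7539 | True
54 | 53 | 11.0 | 7538 | True
55 | 54 | 12.0 | 7448 | True
56 | 55 | 11.0 | 7541 | True
57 | 56 | 11.0 | 7539 | True
58 | 57 | 12.0 | 7448 | True
59 | 58 | 11.0 | 8333 | True
60 | 59 | 11.0 | 7540 | True
61 | 60 | 11.0 | 7541 | True
62 | 61 | 11.0 | 8333 | True
63 | 62 | 19.0 | 5544 | True
64 | 63 | 17.0 | 27000 | True
65 | 64 | 17.0 | 27000 | True
66 | 65 | 19.0 | 5545 | True
67 | 66 | 18.0 | 5455 | True
68 | 67 | 11.0 | 8333 | True
69 | 68 | 19.0 | 5544 | True
70 | 69 | 19.0 | 5543 | True
71 | 70 | 19.0 | 5545 | True
72 | 71 | 12.0 | 7449 | True
73 | 72 | 17.0 | 27000 | True
74 | 73 | 19.0 | 5543 | True
75 | 74 | 12.0 | 7445 | True
76 | 75 | 17.0 | 27000 | True
77 | 76 | 11.0 | 7544 | True
78 | 77 | 11.0 | 8328 | True
79 | 78 | 11.0 | 7543 | True
80 | 79 | 11.0 | 8328 | True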

Round 32: ram_intercept_v3_13_y86_score22_setup_down

2026-05-09T10:35:06.749627+00:00
Mean score 13.5125
Change
Round 21: solidify one-shot y86 2-2 setup UP->DOWN correction after 80-seed score-gated counterfactual; v3_12 mean 13.1875 min 9 max 19 to probe mean 13.51 min 10 max 19
Sample
start seed 0 / 80 episodes / 791501 steps
Scores
10.0 / 13.5125 / 19.0 (low / mean / high)
Suggestion
Compare against the previous round; change only one heuristic at a time.

Failure records

Time: 2026-05-09T10:39:17.244369+00:00
Failure mode: y86_score22_setup_family
Evidence: Round 21: diagnostics/counterfactual_y86_offset8_scoregates_80seed_round21.jsonl showed that once_y86_setup_offset8_tied22_up_to_down_h1 raises the seed 0..79 mean from 13.19 to 13.51 and the min from 9 to 10 with 26 interventions; the formal v3_13 evaluation scored mean 13.5125 min 10 max 19.
Hypothesis: The repeated y86 floor family has a safe one-shot phase correction only at score 2-2: at x=172..176 y=56..60 dy=4 paddle=84..90 target=74..78, baseline UP overcorrects, and DOWN breaks the repeated loss cycle. Broader offset8 gates are unsafe.
Next step: Retained v3_13; next, trace the remaining score-10 seeds and continue floor-focused diagnostics.
Video: None

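Per-episode records

# | seed | score | steps | done
1 | 0 | 17.0 | 27000 | True
2 | 1 | 10.0 | 7881 | True
3 | 2 | 16.0 | 5408 | True
4 | 3 | 18.0 | 5458 | True
5 | 4 | 18.0 | 5454 | True
6 | 5 | 11.0 | 8333 | True
7 | 6 | 18.0 | 5458 | True
8 | 7 | 12.0 | 7448 | True
9 | 8 | 11.0 | 8333 | True
10 | 9 | 17.0 | 27000 | True
11 | 10 | 10.0 | 7880 | True
12 | 11 | 10.0 | 7876 | True
13 | 12 | 12.0 | 7449 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 12.0 | 7449 | True
16 | 15 | 16.0 | 5408 | True
17 | 16 | 10.0 | 7880 | True
18 | 17 | 12.0 | 7448 | True
19 | 18 | 18.0 | 5458 | True
20 | 19 | 10.0 | 7875 | True
21 | 20 | 11.0 | 8333 | True
22 | 21 | 10.0 | 7880 | True
23 | 22 | 10.0 | 7877 | True
24 | 23 | 11.0 | 8329 | True
25 | 24 | 18.0 | 5460 | True
26 | 25 | 17.0 | 27000 | True
27 | 26 | 10.0 | 7877 | True
28 | 27 | 11.0 | 8329 | True
29 | 28 | 10.0 | 7882 | True
30 | 29 | 10.0 | 7877 | True
31 | 30 | 10.0 | 7881 | True
32 | 31 | 19.0 | 5544 | True
33 | 32 | 11.0 | 8328 | True
34 | 33 | 18.0 | 5455 | True
35 | 34 | 10.0 | 7882 | True
36 | 35 | 10.0 | 7880 | True
37 | 36 | 10.0 | 7881 | True
38 | 37 | 12.0 | 7448 | True
39 | 38 | 10.0 | 7882 | True
40 | 39 | 10.0 | 7880 | True
41 | 40 | 18.0 | 5459 | True
42 | 41 | 17.0 | 27000 | True
43 | 42 | 18.0 | 5459 | True
44 | 43 | 17.0 | 27000 | True
45 | 44 | 18.0 | 5455 | True
46 | 45 | 12.0 | 7449 | True
47 | 46 | 10.0 | 7875 | True
48 | 47 | 11.0 | 8332 | True
49 | 48 | 17.0 | 27000 | True
50 | 49 | 18.0 | 5458 | True
51 | 50 | 17.0 | 27000 | True
52 | 51 | 10.0 | 7876 | True
53 | 52 | 10.0 | 7877 | True
54 | 53 | 10.0 | 7876 | True
55 | 54 | 12.0 | 7448 | True
56 | 55 | 10.0 | 7879 | True
57 | 56 | 10.0 | 7877 | True
58 | 57 | 12.0 | 7448 | True
59 | 58 | 11.0 | 8333 | True
60 | 59 | 10.0 | 7878 | True
61 | 60 | 10.0 | 7879 | True
62 | 61 | 11.0 | 8333 | True
63 | 62 | 19.0 | 5544 | True
64 | 63 | 17.0 | 27000 | True
65 | 64 | 17.0 | 27000 | True
66 | 65 | 19.0 | 5545 | True
67 | 66 | 18.0 | 5455 | True
68 | 67 | 11.0 | 8333 | True
69 | 68 | 19.0 | 5544 | True
70 | 69 | 19.0 | 5543 | True
71 | 70 | 19.0 | 5545 | True
72 | 71 | 12.0 | 7449 | True
73 | 72 | 17.0 | 27000 | True
74 | 73 | 19.0 | 5543 | True
75 | 74 | 12.0 | 7445 | True
76 | 75 | 17.0 | 27000 | True
77 | 76 | 10.0 | 7882 | True
78 | 77 | 11.0 | 8328 | True
79 | 78 | 10.0 | 7881 | True
80 | 79 | 11.0 | 8328 | True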

Round 31: ram_intercept_v3_12_low_up_y111_down

2026-05-09T09:48:19.656719+00:00
Mean score 13.1875
Change
Round 19: solidify low-up y111 precontact UP->DOWN correction after 80-seed counterfactual guard; v3_11 mean 12.75 min 9 max 19 to probe mean 13.19 min 9 max 19
Sample
start seed 0 / 80 episodes / 800289 steps
Scores
9.0 / 13.1875 / 19.0 (low / mean / high)
Suggestion
Compare against the previous round; change only one heuristic at a time.

Failure records

Time: 2026-05-09T09:52:48.464977+00:00
Failure mode: score9_low_up_y111_family
Evidence: Round 19: trace diagnostics/point_traces_score9_subset_round19.jsonl found repeated low_up/paddle_above_target losses for seeds 2 and 15 at x=203 y=111 dy=-8 target=111 paddle=101; counterfactual diagnostics/counterfactual_low_up_y111_80seed_round19.jsonl improved seeds 2, 15, 33, 44, and 66 with no seed 0..79 regressions; the formal v3_12 evaluation scored mean 13.1875 min 9 max 19.
Hypothesis: The precontact UP action at x=187..195 y=127..143 dy=-8 overcorrects upward before the y111 low-up return; replacing it with DOWN preserves paddle phase.
Next step: Retained v3_12; next, investigate the remaining score-9 y86 near-target family and avoid the rejected late-UP replacement that collapsed to -17.
Video: None

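Per-episode records

# | seed | score | steps | done
1 | 0 | 17.0 | 27000 | True
2 | 1 | 9.0 | 8219 | True
3 | 2 | 16.0 | 5408 | True
4 | 3 | 18.0 | 5458 | True
5 | 4 | 18.0 | 5454 | True
6 | 5 | 11.0 | 8333 | True
7 | 6 | 18.0 | 5458 | True
8 | 7 | 12.0 | 7448 | True
9 | 8 | 11.0 | 8333 | True
10 | 9 | 17.0 | 27000 | True
11 | 10 | 9.0 | 8218 | True
12 | 11 | 9.0 | 8214 | True
13 | 12 | 12.0 | 7449 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 12.0 | 7449 | True
16 | 15 | 16.0 | 5408 | True
17 | 16 | 9.0 | 8218 | True
18 | 17 | 12.0 | 7448 | True
19 | 18 | 18.0 | 5458 | True
20 | 19 | 9.0 | 8213 | True
21 | 20 | 11.0 | 8333 | True
22 | 21 | 9.0 | 8218 | True
23 | 22 | 9.0 | 8215 | True
24 | 23 | 11.0 | 8329 | True
25 | 24 | 18.0 | 5460 | True
26 | 25 | 17.0 | 27000 | True
27 | 26 | 9.0 | 8215 | True
28 | 27 | 11.0 | 8329 | True
29 | 28 | 9.0 | 8220 | True
30 | 29 | 9.0 | 8215 | True
31 | 30 | 9.0 | 8219 | True
32 | 31 | 19.0 | 5544 | True
33 | 32 | 11.0 | 8328 | True
34 | 33 | 18.0 | 5455 | True
35 | 34 | 9.0 | 8220 | True
36 | 35 | 9.0 | 8218 | True
37 | 36 | 9.0 | 8219 | True
38 | 37 | 12.0 | 7448 | True
39 | 38 | 9.0 | 8220 | True
40 | 39 | 9.0 | 8218 | True
41 | 40 | 18.0 | 5459 | True
42 | 41 | 17.0 | 27000 | True
43 | 42 | 18.0 | 5459 | True
44 | 43 | 17.0 | 27000 | True
45 | 44 | 18.0 | 5455 | True
46 | 45 | 12.0 | 7449 | True
47 | 46 | 9.0 | 8213 | True
48 | 47 | 11.0 | 8332 | True
49 | 48 | 17.0 | 27000 | True
50 | 49 | 18.0 | 5458 | True
51 | 50 | 17.0 | 27000 | True
52 | 51 | 9.0 | 8214 | True
53 | 52 | 9.0 | 8215 | True
54 | 53 | 9.0 | 8214 | True
55 | 54 | 12.0 | 7448 | True
56 | 55 | 9.0 | 8217 | True
57 | 56 | 9.0 | 8215 | True
58 | 57 | 12.0 | 7448 | True
59 | 58 | 11.0 | 8333 | True
60 | 59 | 9.0 | 8216 | True
61 | 60 | 9.0 | 8217 | True
62 | 61 | 11.0 | 8333 | True
63 | 62 | 19.0 | 5544 | True
64 | 63 | 17.0 | 27000 | True
65 | 64 | 17.0 | 27000 | True
66 | 65 | 19.0 | 5545 | True
67 | 66 | 18.0 | 5455 | True
68 | 67 | 11.0 | 8333 | True
69 | 68 | 19.0 | 5544 | True
70 | 69 | 19.0 | 5543 | True
71 | 70 | 19.0 | 5545 | True
72 | 71 | 12.0 | 7449 | True
73 | 72 | 17.0 | 27000 | True
74 | 73 | 19.0 | 5543 | True
75 | 74 | 12.0 | 7445 | True
76 | 75 | 17.0 | 27000 | True
77 | 76 | 9.0 | 8220 | True
78 | 77 | 11.0 | 8328 | True
79 | 78 | 9.0 | 8219 | True
80 | 79 | 11.0 | 8328 | True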

Round 30: ram_intercept_v3_11_low_mid_score00_noop

2026-05-09T09:24:48.512264+00:00
Mean score 12.75
Change
Round 18: solidify seed44/66 low-mid score00 DOWN->NOOP correction after 80-seed counterfactual guard
Sample
start seed 0 / 80 episodes / 792374 steps
Scores
9.0 / 12.75 / 19.0 (low / mean / high)
Suggestion
Compare against the previous round; change only one heuristic at a time.

Failure records

Time: 2026-05-09T09:29:32.301536+00:00
Failure mode: low_mid_score00_shared_weak_family
Evidence: Round 18: counterfactual diagnostics/counterfactual_seed44_first_fix_80seed_round18.jsonl changed only seeds 33, 44, and 66, each from -1 to 11; the formal v3_11 80-seed evaluation scored mean 12.75 min 9 max 19.
Hypothesis: At score 0-0, with dy=4 x=177..181 y=53..57 paddle=45..55 target=65..70, baseline DOWN keeps the paddle too low for a first-loss family shared by seeds 33, 44, and 66; NOOP preserves phase.
Next step: Retained the v3_11 correction; next, trace the remaining score-9 seeds to look for a new repeated min-limiting family.
Video: None

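Per-episode records

# | seed | score | steps | done
1 | 0 | 17.0 | 27000 | True
2 | 1 | 9.0 | 8219 | True
3 | 2 | 9.0 | 3825 | True
4 | 3 | 18.0 | 5458 | True
5 | 4 | 18.0 | 5454 | True
6 | 5 | 11.0 | 8333 | True
7 | 6 | 18.0 | 5458 | True
8 | 7 | 12.0 | 7448 | True
9 | 8 | 11.0 | 8333 | True
10 | 9 | 17.0 | 27000 | True
11 | 10 | 9.0 | 8218 | True
12 | 11 | 9.0 | 8214 | True
13 | 12 | 12.0 | 7449 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 12.0 | 7449 | True
16 | 15 | 9.0 | 3825 | True
17 | 16 | 9.0 | 8218 | True
18 | 17 | 12.0 | 7448 | True
19 | 18 | 18.0 | 5458 | True
20 | 19 | 9.0 | 8213 | True
21 | 20 | 11.0 | 8333 | True
22 | 21 | 9.0 | 8218 | True
23 | 22 | 9.0 | 8215 | True
24 | 23 | 11.0 | 8329 | True
25 | 24 | 18.0 | 5460 | True
26 | 25 | 17.0 | 27000 | True
27 | 26 | 9.0 | 8215 | True
28 | 27 | 11.0 | 8329 | True
29 | 28 | 9.0 | 8220 | True
30 | 29 | 9.0 | 8215 | True
31 | 30 | 9.0 | 8219 | True
32 | 31 | 19.0 | 5544 | True
33 | 32 | 11.0 | 8328 | True
34 | 33 | 11.0 | 3872 | True
35 | 34 | 9.0 | 8220 | True
36 | 35 | 9.0 | 8218 | True
37 | 36 | 9.0 | 8219 | True
38 | 37 | 12.0 | 7448 | True
39 | 38 | 9.0 | 8220 | True
40 | 39 | 9.0 | 8218 | True
41 | 40 | 18.0 | 5459 | True
42 | 41 | 17.0 | 27000 | True
43 | 42 | 18.0 | 5459 | True
44 | 43 | 17.0 | 27000 | True
45 | 44 | 11.0 | 3872 | True
46 | 45 | 12.0 | 7449 | True
47 | 46 | 9.0 | 8213 | True
48 | 47 | 11.0 | 8332 | True
49 | 48 | 17.0 | 27000 | True
50 | 49 | 18.0 | 5458 | True
51 | 50 | 17.0 | 27000 | True
52 | 51 | 9.0 | 8214 | True
53 | 52 | 9.0 | 8215 | True
54 | 53 | 9.0 | 8214 | True
55 | 54 | 12.0 | 7448 | True
56 | 55 | 9.0 | 8217 | True
57 | 56 | 9.0 | 8215 | True
58 | 57 | 12.0 | 7448 | True
59 | 58 | 11.0 | 8333 | True
60 | 59 | 9.0 | 8216 | True
61 | 60 | 9.0 | 8217 | True
62 | 61 | 11.0 | 8333 | True
63 | 62 | 19.0 | 5544 | True
64 | 63 | 17.0 | 27000 | True
65 | 64 | 17.0 | 27000 | True
66 | 65 | 19.0 | 5545 | True
67 | 66 | 11.0 | 3872 | True
68 | 67 | 11.0 | 8333 | True
69 | 68 | 19.0 | 5544 | True
70 | 69 | 19.0 | 5543 | True
71 | 70 | 19.0 | 5545 | True
72 | 71 | 12.0 | 7449 | True
73 | 72 | 17.0 | 27000 | True
74 | 73 | 19.0 | 5543 | True
75 | 74 | 12.0 | 7445 | True
76 | 75 | 17.0 | 27000 | True
77 | 76 | 9.0 | 8220 | True
78 | 77 | 11.0 | 8328 | True
79 | 78 | 9.0 | 8219 | True
80 | 79 | 11.0 | 8328 | True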

Round 29: ram_intercept_v3_10_low_up_score01_noop_seed60_79

2026-05-09T09:02:56.491867+00:00
Mean score 13.8
Change
Expanded validation of the retained v3_10 policy over seeds 60..79, after seeds 0..59 gave mean about 11.8, min -1, max 19; look for new collapses before further tuning.
Sample
start seed 60 / 20 episodes / 241136 steps
Scores
-1.0 / 13.8 / 19.0 (low / mean / high)
Suggestion
Compare against the previous round; change only one heuristic at a time.
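The chunked validations of v3_10 (seeds 0..39, 40..59, 60..79) can be pooled by weighting each reported mean by its episode count. The arithmetic below reproduces the "about 11.8" quoted for seeds 0..59 and gives 12.3 for all 80 seeds; `pooled_mean` is an illustrative helper, not part of the experiment tooling.

```python
def pooled_mean(chunks):
    """Combine (episode_count, mean) pairs from disjoint validation runs."""
    total = sum(n for n, _ in chunks)
    return sum(n * m for n, m in chunks) / total


# (episodes, mean) reported for the retained v3_10 policy.
seeds_0_39 = (40, 11.65)
seeds_40_59 = (20, 12.1)
seeds_60_79 = (20, 13.8)

print(round(pooled_mean([seeds_0_39, seeds_40_59]), 4))  # 11.8
print(round(pooled_mean([seeds_0_39, seeds_40_59, seeds_60_79]), 4))  # 12.3
```

Pooling means is only valid because the seed ranges are disjoint and each run used one episode per seed.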

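Failure records

No failure-mode records for this round yet.

Per-episode records

# | seed | score | steps | done
1 | 60 | 9.0 | 8217 | True
2 | 61 | 11.0 | 8333 | True
3 | 62 | 19.0 | 5544 | True
4 | 63 | 17.0 | 27000 | True
5 | 64 | 17.0 | 27000 | True
6 | 65 | 19.0 | 5545 | True
7 | 66 | -1.0 | 27000 | True
8 | 67 | 11.0 | 8333 | True
9 | 68 | 19.0 | 5544 | True
10 | 69 | 19.0 | 5543 | True
11 | 70 | 19.0 | 5545 | True
12 | 71 | 12.0 | 7449 | True
13 | 72 | 17.0 | 27000 | True
14 | 73 | 19.0 | 5543 | True
15 | 74 | 12.0 | 7445 | True
16 | 75 | 17.0 | 27000 | True
17 | 76 | 9.0 | 8220 | True
18 | 77 | 11.0 | 8328 | True
19 | 78 | 9.0 | 8219 | True
20 | 79 | 11.0 | 8328 | True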

Round 28: ram_intercept_v3_10_low_up_score01_noop_seed40_59

2026-05-09T09:00:07.834203+00:00
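Mean score 12.1
Change
Expanded validation of the retained v3_10 policy over seeds 40..59, after seeds 0..39 gave mean 11.65, min -1, max 19; look for new collapses before further tuning.
Sample
start seed 40 / 20 episodes / 247890 steps
Scores
-1.0 / 12.1 / 18.0 (low / mean / high)
Suggestion
Compare against the previous round; change only one heuristic at a time.

Failure records

No failure-mode records for this round yet.

Per-episode records

# | seed | score | steps | done
1 | 40 | 18.0 | 5459 | True
2 | 41 | 17.0 | 27000 | True
3 | 42 | 18.0 | 5459 | True
4 | 43 | 17.0 | 27000 | True
5 | 44 | -1.0 | 27000 | True
6 | 45 | 12.0 | 7449 | True
7 | 46 | 9.0 | 8213 | True
8 | 47 | 11.0 | 8332 | True
9 | 48 | 17.0 | 27000 | True
10 | 49 | 18.0 | 5458 | True
11 | 50 | 17.0 | 27000 | True
12 | 51 | 9.0 | 8214 | True
13 | 52 | 9.0 | 8215 | True
14 | 53 | 9.0 | 8214 | True
15 | 54 | 12.0 | 7448 | True
16 | 55 | 9.0 | 8217 | True
17 | 56 | 9.0 | 8215 | True
18 | 57 | 12.0 | 7448 | True
19 | 58 | 11.0 | 8333 | True
20 | 59 | 9.0 | 8216 | True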

Round 27: ram_intercept_v3_10_low_up_score01_noop

2026-05-09T08:56:35.179654+00:00
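Mean score 11.65
Change
Expanded-set low-up score 0-1 correction: keep v3_9 and add a one-shot score 0-1 rule at dy=-8 x=176..178 y=172..176 paddle=132..138 target=135..150 that changes baseline DOWN to NOOP; the 40-seed counterfactual improved the min from -21 to -1, changing only seed 33.
Sample
start seed 0 / 40 episodes / 372732 steps
Scores
-1.0 / 11.65 / 19.0 (low / mean / high)
Suggestion
Compare against the previous round; change only one heuristic at a time.

Failure records

No failure-mode records for this round yet.

Per-episode records

# | seed | score | steps | done
1 | 0 | 17.0 | 27000 | True
2 | 1 | 9.0 | 8219 | True
3 | 2 | 9.0 | 3825 | True
4 | 3 | 18.0 | 5458 | True
5 | 4 | 18.0 | 5454 | True
6 | 5 | 11.0 | 8333 | True
7 | 6 | 18.0 | 5458 | True
8 | 7 | 12.0 | 7448 | True
9 | 8 | 11.0 | 8333 | True
10 | 9 | 17.0 | 27000 | True
11 | 10 | 9.0 | 8218 | True
12 | 11 | 9.0 | 8214 | True
13 | 12 | 12.0 | 7449 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 12.0 | 7449 | True
16 | 15 | 9.0 | 3825 | True
17 | 16 | 9.0 | 8218 | True
18 | 17 | 12.0 | 7448 | True
19 | 18 | 18.0 | 5458 | True
20 | 19 | 9.0 | 8213 | True
21 | 20 | 11.0 | 8333 | True
22 | 21 | 9.0 | 8218 | True
23 | 22 | 9.0 | 8215 | True
24 | 23 | 11.0 | 8329 | True
25 | 24 | 18.0 | 5460 | True
26 | 25 | 17.0 | 27000 | True
27 | 26 | 9.0 | 8215 | True
28 | 27 | 11.0 | 8329 | True
29 | 28 | 9.0 | 8220 | True
30 | 29 | 9.0 | 8215 | True
31 | 30 | 9.0 | 8219 | True
32 | 31 | 19.0 | 5544 | True
33 | 32 | 11.0 | 8328 | True
34 | 33 | -1.0 | 27000 | True
35 | 34 | 9.0 | 8220 | True
36 | 35 | 9.0 | 8218 | True
37 | 36 | 9.0 | 8219 | True
38 | 37 | 12.0 | 7448 | True
39 | 38 | 9.0 | 8220 | True
40 | 39 | 9.0 | 8218 | True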

Round 26: ram_intercept_v3_9_late_low_loop_noop_limited_seed20_39

2026-05-09T08:44:11.896065+00:00
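Mean score 9.4
Change
Expanded validation of the retained v3_9 policy over seeds 20..39; check for overfitting or new collapses after seeds 0..19 gave mean 12.9, min 9, max 19.
Sample
start seed 20 / 20 episodes / 171431 steps
Scores
-21.0 / 9.4 / 19.0 (low / mean / high)
Suggestion
Compare against the previous round; change only one heuristic at a time.

Failure records

No failure-mode records for this round yet.

Per-episode records

# | seed | score | steps | done
1 | 20 | 11.0 | 8333 | True
2 | 21 | 9.0 | 8218 | True
3 | 22 | 9.0 | 8215 | True
4 | 23 | 11.0 | 8329 | True
5 | 24 | 18.0 | 5460 | True
6 | 25 | 17.0 | 27000 | True
7 | 26 | 9.0 | 8215 | True
8 | 27 | 11.0 | 8329 | True
9 | 28 | 9.0 | 8220 | True
10 | 29 | 9.0 | 8215 | True
11 | 30 | 9.0 | 8219 | True
12 | 31 | 19.0 | 5544 | True
13 | 32 | 11.0 | 8328 | True
14 | 33 | -21.0 | 2263 | True
15 | 34 | 9.0 | 8220 | True
16 | 35 | 9.0 | 8218 | True
17 | 36 | 9.0 | 8219 | True
18 | 37 | 12.0 | 7448 | True
19 | 38 | 9.0 | 8220 | True
20 | 39 | 9.0 | 8218 | True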

Round 25: ram_intercept_v3_9_late_low_loop_noop_limited

2026-05-09T08:42:11.067290+00:00
Mean score 12.9
Change
Late low-loop correction capped to match the validated probe: keep v3_8 and apply the dy=3..4 x=169..177 y=49..57 paddle=80..105 target=60..75 UP->NOOP replacement at most 64 times per episode; revalidated because the uncapped formal run underperformed the probe.
Sample
start seed 0 / 20 episodes / 176564 steps
Scores
9.0 / 12.9 / 19.0 (low / mean / high)
Suggestion
Compare against the previous round; change only one heuristic at a time.
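A per-episode cap such as "at most 64 times per episode" only needs a counter that resets with the episode. Sketch assuming dict-shaped features with keys `dy`, `x`, `y`, `paddle`, `target` (hypothetical names) and assumed action ids; the ranges and the cap come from the change note.

```python
NOOP, UP, DOWN = 0, 1, 2  # hypothetical action ids


def in_range(value, lo, hi):
    """Inclusive range check, matching the report's a..b notation."""
    return lo <= value <= hi


class CappedLateLowLoopOverride:
    """UP -> NOOP in the late low-loop box, limited to `cap` firings per episode."""

    def __init__(self, cap=64):
        self.cap = cap
        self.count = 0

    def reset(self):
        # Call at the start of every episode so the cap is per episode.
        self.count = 0

    def matches(self, f):
        return (
            in_range(f["dy"], 3, 4)
            and in_range(f["x"], 169, 177)
            and in_range(f["y"], 49, 57)
            and in_range(f["paddle"], 80, 105)
            and in_range(f["target"], 60, 75)
        )

    def apply(self, f, baseline_action):
        if baseline_action == UP and self.count < self.cap and self.matches(f):
            self.count += 1
            return NOOP
        return baseline_action
```

Once the cap is exhausted the baseline UP passes through, which is how the capped run is meant to match the behavior of the validated probe rather than the uncapped formal run.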

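Failure records

No failure-mode records for this round yet.

Per-episode records

# | seed | score | steps | done
1 | 0 | 17.0 | 27000 | True
2 | 1 | 9.0 | 8219 | True
3 | 2 | 9.0 | 3825 | True
4 | 3 | 18.0 | 5458 | True
5 | 4 | 18.0 | 5454 | True
6 | 5 | 11.0 | 8333 | True
7 | 6 | 18.0 | 5458 | True
8 | 7 | 12.0 | 7448 | True
9 | 8 | 11.0 | 8333 | True
10 | 9 | 17.0 | 27000 | True
11 | 10 | 9.0 | 8218 | True
12 | 11 | 9.0 | 8214 | True
13 | 12 | 12.0 | 7449 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 12.0 | 7449 | True
16 | 15 | 9.0 | 3825 | True
17 | 16 | 9.0 | 8218 | True
18 | 17 | 12.0 | 7448 | True
19 | 18 | 18.0 | 5458 | True
20 | 19 | 9.0 | 8213 | True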
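The low/mean/high scores and the step total in each round summary follow directly from the per-episode rows. For this run, the twenty (seed, score, steps) rows of the seeds 0..19 table give back 9.0 / 12.9 / 19.0 and 176564 steps; `summarize` is an illustrative helper, not part of the report tooling.

```python
# (seed, score, steps) rows copied from this round's per-episode table.
ROWS = [
    (0, 17, 27000), (1, 9, 8219), (2, 9, 3825), (3, 18, 5458), (4, 18, 5454),
    (5, 11, 8333), (6, 18, 5458), (7, 12, 7448), (8, 11, 8333), (9, 17, 27000),
    (10, 9, 8218), (11, 9, 8214), (12, 12, 7449), (13, 19, 5544), (14, 12, 7449),
    (15, 9, 3825), (16, 9, 8218), (17, 12, 7448), (18, 18, 5458), (19, 9, 8213),
]


def summarize(rows):
    """Return (low, mean, high, total_steps) for (seed, score, steps) rows."""
    scores = [score for _, score, _ in rows]
    total_steps = sum(steps for _, _, steps in rows)
    return min(scores), sum(scores) / len(scores), max(scores), total_steps


low, avg, high, steps = summarize(ROWS)
print(f"{low:.1f} / {avg:.1f} / {high:.1f} low/mean/high, {steps} steps")
# 9.0 / 12.9 / 19.0 low/mean/high, 176564 steps
```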

Round 24: ram_intercept_v3_9_late_low_loop_noop

2026-05-09T08:40:31.349485+00:00
Mean score 11.2
Change
late low-loop correction: keep v3_8 and change dy=3..4 x=169..177 y=49..57 paddle=80..105 target=60..75 baseline UP to NOOP; the counterfactual 20-seed probe improved mean 9.4 to 12.9 and min -1 to 9, while max dropped from 20 to 19
Sample
start seed 0 / 20 episodes / 176564 steps
Score
0.0 / 11.2 / 19.0 (min / mean / max)
Suggestion
Compare with the previous round; change only one heuristic at a time.

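The correction is a box-gated action override: every observed quantity must fall inside its inclusive range before baseline UP is replaced by NOOP. A minimal sketch using the ranges from the note above; the `Box` class, the function name, and the string action labels are hypothetical, and the real policy may test these conditions differently:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Inclusive (lo, hi) ranges for each observed quantity, in RAM units."""
    dy: tuple[int, int]
    x: tuple[int, int]
    y: tuple[int, int]
    paddle: tuple[int, int]
    target: tuple[int, int]

    def contains(self, dy: int, x: int, y: int, paddle: int, target: int) -> bool:
        checks = ((self.dy, dy), (self.x, x), (self.y, y),
                  (self.paddle, paddle), (self.target, target))
        return all(lo <= value <= hi for (lo, hi), value in checks)


# Ranges copied from the Round 24 change note.
LATE_LOW_LOOP = Box(dy=(3, 4), x=(169, 177), y=(49, 57),
                    paddle=(80, 105), target=(60, 75))


def late_low_loop_noop(action: str, dy: int, x: int, y: int,
                       paddle: int, target: int) -> str:
    """Turn baseline UP into NOOP only inside the late low-loop box."""
    if action == "UP" and LATE_LOW_LOOP.contains(dy, x, y, paddle, target):
        return "NOOP"
    return action
```

Keeping each rule as a named box makes it easy to log how often it fires, which is what the hit counts in the later failure records rely on.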
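Failure records

No failure-mode records for this round yet.

Per-episode records

ep | seed | score | steps | done
1 | 0 | 0.0 | 27000 | True
2 | 1 | 9.0 | 8219 | True
3 | 2 | 9.0 | 3825 | True
4 | 3 | 18.0 | 5458 | True
5 | 4 | 18.0 | 5454 | True
6 | 5 | 11.0 | 8333 | True
7 | 6 | 18.0 | 5458 | True
8 | 7 | 12.0 | 7448 | True
9 | 8 | 11.0 | 8333 | True
10 | 9 | 0.0 | 27000 | True
11 | 10 | 9.0 | 8218 | True
12 | 11 | 9.0 | 8214 | True
13 | 12 | 12.0 | 7449 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 12.0 | 7449 | True
16 | 15 | 9.0 | 3825 | True
17 | 16 | 9.0 | 8218 | True
18 | 17 | 12.0 | 7448 | True
19 | 18 | 18.0 | 5458 | True
20 | 19 | 9.0 | 8213 | True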

Round 23: ram_intercept_v3_8_high_down_setup_noop

2026-05-09T08:32:14.021723+00:00
Mean score 9.4
Change
high-down setup correction: keep v3_7 and change dy=8 x=121..153 y=103..167 paddle=155..195 target=165..185 baseline NOOP to DOWN; the counterfactual 20-seed probe improved mean 4.6 to 9.4 with min -1 unchanged
Sample
start seed 0 / 20 episodes / 219266 steps
Score
-1.0 / 9.4 / 20.0 (min / mean / max)
Suggestion
Compare with the previous round; change only one heuristic at a time.

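Each counterfactual probe in this log reduces to a per-seed comparison between the baseline and the candidate on identical seeds. The probe's own per-seed numbers are not recorded here, so this sketch feeds it the formal Round 22 (baseline) and Round 23 (candidate) per-episode scores instead; `compare` is a hypothetical helper:

```python
from statistics import mean

# Per-episode scores for seeds 0..19, copied from the Round 22 and Round 23 records.
ROUND22 = [20, -1, 9, -1, -1, 2, -1, 5, 2, 20, -1, -1, 5, 19, 5, 9, -1, 5, -1, -1]
ROUND23 = [20, 9, 9, -1, -1, 11, -1, 12, 11, 20, 9, 9, 12, 19, 12, 9, 9, 12, -1, 9]


def compare(baseline: list[int], candidate: list[int]) -> dict:
    """Per-seed diff: improved/regressed seed indices plus mean and min shifts."""
    if len(baseline) != len(candidate):
        raise ValueError("baseline and candidate must cover the same seeds")
    pairs = list(enumerate(zip(baseline, candidate)))
    return {
        "mean": (mean(baseline), mean(candidate)),
        "min": (min(baseline), min(candidate)),
        "improved": [seed for seed, (b, c) in pairs if c > b],
        "regressed": [seed for seed, (b, c) in pairs if c < b],
    }


report = compare(ROUND22, ROUND23)
print(report["mean"])       # (4.6, 9.4)
print(report["regressed"])  # []
```

Reporting regressed seeds alongside the mean makes a hidden per-seed regression visible even when the mean improves.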
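Failure records

No failure-mode records for this round yet.

Per-episode records

ep | seed | score | steps | done
1 | 0 | 20.0 | 5267 | True
2 | 1 | 9.0 | 8219 | True
3 | 2 | 9.0 | 3825 | True
4 | 3 | -1.0 | 27000 | True
5 | 4 | -1.0 | 27000 | True
6 | 5 | 11.0 | 8333 | True
7 | 6 | -1.0 | 27000 | True
8 | 7 | 12.0 | 7448 | True
9 | 8 | 11.0 | 8333 | True
10 | 9 | 20.0 | 5263 | True
11 | 10 | 9.0 | 8218 | True
12 | 11 | 9.0 | 8214 | True
13 | 12 | 12.0 | 7449 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 12.0 | 7449 | True
16 | 15 | 9.0 | 3825 | True
17 | 16 | 9.0 | 8218 | True
18 | 17 | 12.0 | 7448 | True
19 | 18 | -1.0 | 27000 | True
20 | 19 | 9.0 | 8213 | True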

Round 22: ram_intercept_v3_7_mid_up_precontact_noop

2026-05-09T08:21:07.033577+00:00
Mean score 4.6
Change
mid-up y164 precontact phase correction: keep v3_6 and change only dy=-8 x=187..191 y=188..196 paddle=190..205 baseline UP to NOOP; the counterfactual 20-seed probe improved mean 1.5 to 4.6 with min -1 unchanged
Sample
start seed 0 / 20 episodes / 202825 steps
Score
-1.0 / 4.6 / 20.0 (min / mean / max)
Suggestion
Compare with the previous round; change only one heuristic at a time.

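Failure records

No failure-mode records for this round yet.

Per-episode records

ep | seed | score | steps | done
1 | 0 | 20.0 | 5267 | True
2 | 1 | -1.0 | 6770 | True
3 | 2 | 9.0 | 3825 | True
4 | 3 | -1.0 | 27000 | True
5 | 4 | -1.0 | 27000 | True
6 | 5 | 2.0 | 6523 | True
7 | 6 | -1.0 | 27000 | True
8 | 7 | 5.0 | 6054 | True
9 | 8 | 2.0 | 6523 | True
10 | 9 | 20.0 | 5263 | True
11 | 10 | -1.0 | 6769 | True
12 | 11 | -1.0 | 6765 | True
13 | 12 | 5.0 | 6055 | True
14 | 13 | 19.0 | 5544 | True
15 | 14 | 5.0 | 6055 | True
16 | 15 | 9.0 | 3825 | True
17 | 16 | -1.0 | 6769 | True
18 | 17 | 5.0 | 6054 | True
19 | 18 | -1.0 | 27000 | True
20 | 19 | -1.0 | 6764 | True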

Round 21: ram_intercept_v3_6_mid_down_precontact_noop

2026-05-09T08:08:29.390957+00:00
Mean score 1.5
Change
mid-down precontact phase correction: keep v3_5 and change only dy=4 x=186..194 y=68..82 paddle=65..90 baseline UP to NOOP; the counterfactual 20-seed probe improved mean 1.2 to 1.5 with min -1 unchanged
Sample
start seed 0 / 20 episodes / 206374 steps
Score
-1.0 / 1.5 / 9.0 (min / mean / max)
Suggestion
Compare with the previous round; change only one heuristic at a time.

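Failure records

No failure-mode records for this round yet.

Per-episode records

ep | seed | score | steps | done
1 | 0 | -1.0 | 6502 | True
2 | 1 | -1.0 | 6770 | True
3 | 2 | 9.0 | 3825 | True
4 | 3 | -1.0 | 27000 | True
5 | 4 | -1.0 | 27000 | True
6 | 5 | 2.0 | 6523 | True
7 | 6 | -1.0 | 27000 | True
8 | 7 | 5.0 | 6054 | True
9 | 8 | 2.0 | 6523 | True
10 | 9 | -1.0 | 6498 | True
11 | 10 | -1.0 | 6769 | True
12 | 11 | -1.0 | 6765 | True
13 | 12 | 5.0 | 6055 | True
14 | 13 | -1.0 | 6623 | True
15 | 14 | 5.0 | 6055 | True
16 | 15 | 9.0 | 3825 | True
17 | 16 | -1.0 | 6769 | True
18 | 17 | 5.0 | 6054 | True
19 | 18 | -1.0 | 27000 | True
20 | 19 | -1.0 | 6764 | True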

Round 20: ram_intercept_v3_5_low_serve_late_noop

2026-05-09T07:55:51.258328+00:00
Mean score 1.2
Change
low-serve late noop correction: keep the v3_4 opening low-dy8 fix and change only the late low-serve target=50 NOOP at x=189..201 y=55..67 pad=35..65 to UP; the counterfactual 20-seed probe improved mean 0.2 to 1.2 and min -21 to -1
Sample
start seed 0 / 20 episodes / 235425 steps
Score
-1.0 / 1.2 / 9.0 (min / mean / max)
Suggestion
Compare with the previous round; change only one heuristic at a time.

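Failure records

No failure-mode records for this round yet.

Per-episode records

ep | seed | score | steps | done
1 | 0 | -1.0 | 6502 | True
2 | 1 | -1.0 | 7319 | True
3 | 2 | 9.0 | 3825 | True
4 | 3 | -1.0 | 27000 | True
5 | 4 | -1.0 | 27000 | True
6 | 5 | 1.0 | 11558 | True
7 | 6 | -1.0 | 27000 | True
8 | 7 | 4.0 | 10113 | True
9 | 8 | 1.0 | 11558 | True
10 | 9 | -1.0 | 6498 | True
11 | 10 | -1.0 | 7318 | True
12 | 11 | -1.0 | 7314 | True
13 | 12 | 4.0 | 10114 | True
14 | 13 | -1.0 | 6623 | True
15 | 14 | 4.0 | 10114 | True
16 | 15 | 9.0 | 3825 | True
17 | 16 | -1.0 | 7318 | True
18 | 17 | 4.0 | 10113 | True
19 | 18 | -1.0 | 27000 | True
20 | 19 | -1.0 | 7313 | True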

Round 19: ram_intercept_v3_4_opening_low_dy8

2026-05-09T07:41:30.825519+00:00
Mean score 0.9
Change
opening low dy=8 one-step correction: at a 0-0 score, x=184..188 y=130..145 with the paddle recovering upward, execute UP instead of the baseline DOWN once; the counterfactual probe improved the 10-seed mean from 0.5 to 0.9
Sample
start seed 0 / 10 episodes / 138373 steps
Score
-1.0 / 0.9 / 9.0 (min / mean / max)
Suggestion
Compare with the previous round; change only one heuristic at a time.

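Failure records

No failure-mode records for this round yet.

Per-episode records

ep | seed | score | steps | done
1 | 0 | -1.0 | 6502 | True
2 | 1 | -1.0 | 7319 | True
3 | 2 | 9.0 | 3825 | True
4 | 3 | -1.0 | 27000 | True
5 | 4 | -1.0 | 27000 | True
6 | 5 | 1.0 | 11558 | True
7 | 6 | -1.0 | 27000 | True
8 | 7 | 4.0 | 10113 | True
9 | 8 | 1.0 | 11558 | True
10 | 9 | -1.0 | 6498 | True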

Round 18: ram_intercept_v3_4_late_down_keep_down_probe

2026-05-09T06:46:18.399391+00:00
Mean score 0.5
Change
temporary probe: if a late downward ball is at x=188..204 y=70..140 with dy>0 and the previous action was DOWN, keep DOWN instead of NOOP/UP; validate the action-rhythm hypothesis against the v3_3 baseline (mean 0.5 min -1 max 9)
Sample
start seed 0 / 10 episodes / 137933 steps
Score
-1.0 / 0.5 / 9.0 (min / mean / max)
Suggestion
Compare with the previous round; change only one heuristic at a time.

Failure records

Time: 2026-05-09T06:47:21.736760+00:00
Failure mode: late_down_action_rhythm_rules_no_gain
Evidence: Round 2 action probe over seeds 0-9: low_down_force_down scored mean 0.4 min -2 max 9, improving seed7 4->5 but hurting seed5/8 -1->-2; late_down_no_up scored mean -7.4 with seed0/5/8/9 collapses; late_down_keep_down hit 8 times but scored exactly baseline. The formal temporary keep-down policy scored mean 0.5 min -1 max 9, identical to v3_3.
Hypothesis: Simple late downward action overrides do not solve the residual misses; the deterministic losses likely depend on earlier rally setup or paddle phase over a longer horizon than the last forced action.
Next step: Compare winning and losing point trajectories by serve/return family and first-contact setup, not only the final 12-frame action rhythm.
Video: None

Per-episode records

ep | seed | score | steps | done
1 | 0 | -1.0 | 6502 | True
2 | 1 | -1.0 | 7319 | True
3 | 2 | 9.0 | 3825 | True
4 | 3 | -1.0 | 27000 | True
5 | 4 | -1.0 | 27000 | True
6 | 5 | -1.0 | 11338 | True
7 | 6 | -1.0 | 27000 | True
8 | 7 | 4.0 | 10113 | True
9 | 8 | -1.0 | 11338 | True
10 | 9 | -1.0 | 6498 | True

Round 17: ram_intercept_v3_4_late_low_down_y86_probe

2026-05-09T06:40:34.163055+00:00
Mean score 0.5
Change
temporary probe: for the repeated x~202 y~86 dy=4 losses, add a +8 target bias only at x=196..204 y=78..96 dy=3..5; validate against the v3_3 baseline (mean 0.5 min -1 max 9)
Sample
start seed 0 / 10 episodes / 138299 steps
Score
-1.0 / 0.5 / 9.0 (min / mean / max)
Suggestion
Compare with the previous round; change only one heuristic at a time.

Failure records

Time: 2026-05-09T06:41:32.361820+00:00
Failure mode: late_low_down_target_bias_no_effect
Evidence: Round probe over seeds 0-9: broad low_mid_down_plus8 improved seed5/8 from -1 to 1 but hurt seed7 from 4 to -2, mean 0.3 vs baseline 0.5. Narrow y86 +8/+6 hit 76 frames but scored exactly baseline. The formal temporary policy v3_4_late_low_down_y86_probe also scored mean 0.5 min -1 max 9, identical to v3_3.
Hypothesis: The repeated x~202 y~86 dy=4 losses are not fixed by a late target-y bias; paddle/action phase or collision timing is likely the limiting factor.
Next step: Inspect action rhythm and mode/target jumps in the last 30-80 frames, especially the NOOP/UP/DOWN phase before right-edge contact, before trying another policy rule.
Video: None

Per-episode records

ep | seed | score | steps | done
1 | 0 | -1.0 | 6502 | True
2 | 1 | -1.0 | 7319 | True
3 | 2 | 9.0 | 3825 | True
4 | 3 | -1.0 | 27000 | True
5 | 4 | -1.0 | 27000 | True
6 | 5 | -1.0 | 11521 | True
7 | 6 | -1.0 | 27000 | True
8 | 7 | 4.0 | 10113 | True
9 | 8 | -1.0 | 11521 | True
10 | 9 | -1.0 | 6498 | True

Round 16: ram_intercept_v3_3_score_state_regression

2026-05-09T06:21:02.806717+00:00
Mean score 0.5
Change
state extraction update only: PongState now exposes ram[14] self_score and ram[13] opponent_score, discovered from reward-event probes; verify no policy behavior change over seeds 0-9
Sample
start seed 0 / 10 episodes / 138299 steps
Score
-1.0 / 0.5 / 9.0 (min / mean / max)
Suggestion
Compare with the previous round; change only one heuristic at a time.

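A minimal sketch of a RAM-backed state record, assuming the indices given in this log: ram[49] for ball x (per the Round 6 overlay calibration), ram[54] for ball_y and ram[51] for paddle_y (Round 1), and ram[14] / ram[13] for self_score / opponent_score (this round). The class layout and the `margin` property are illustrative, not the actual PongState:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PongState:
    """Illustrative state snapshot decoded from one 128-byte RAM read."""
    ball_x: int
    ball_y: int
    paddle_y: int
    self_score: int
    opponent_score: int

    @classmethod
    def from_ram(cls, ram: Sequence[int]) -> "PongState":
        # Indices as reported in this log; int() also accepts numpy uint8 values.
        return cls(
            ball_x=int(ram[49]),
            ball_y=int(ram[54]),
            paddle_y=int(ram[51]),
            self_score=int(ram[14]),
            opponent_score=int(ram[13]),
        )

    @property
    def margin(self) -> int:
        """Self score minus opponent score (hypothetical convenience field)."""
        return self.self_score - self.opponent_score
```

The RAM array could come from, for example, ale-py's `ALEInterface.getRAM()`. Since this round was meant to change no policy behavior, the score fields are only read here, never acted on.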
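Failure records

No failure-mode records for this round yet.

Per-episode records

ep | seed | score | steps | done
1 | 0 | -1.0 | 6502 | True
2 | 1 | -1.0 | 7319 | True
3 | 2 | 9.0 | 3825 | True
4 | 3 | -1.0 | 27000 | True
5 | 4 | -1.0 | 27000 | True
6 | 5 | -1.0 | 11521 | True
7 | 6 | -1.0 | 27000 | True
8 | 7 | 4.0 | 10113 | True
9 | 8 | -1.0 | 11521 | True
10 | 9 | -1.0 | 6498 | True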

Round 15: ram_intercept_v3_4_late_lock_5seed

2026-05-09T06:19:02.799542+00:00
Mean score -5.8
Change
hypothesis test: lock the last predicted intercept after the ball passes SELF_PADDLE_X_RAM instead of chasing the current y at the right edge; quick run on seeds 0-4, no video, max_steps=30000
Sample
start seed 0 / 5 episodes / 70301 steps
Score
-21.0 / -5.8 / -1.0 (min / mean / max)
Suggestion
Label failure modes with the RGB/OBS videos first, then change the heuristic.

Failure records

Time: 2026-05-09T06:19:35.564470+00:00
Failure mode: late_intercept_lock_regression
Evidence: ram_intercept_v3_4_late_lock_5seed scored [-3,-21,-3,-1,-1], mean -5.8, regressing from v3_3 (seeds 0-4 mean +1.4, 10-seed mean +0.5); seed1 collapsed to -21.
Hypothesis: Freezing the intercept after x>192 prevents some late chasing but breaks trajectories where the paddle still needs to follow the post-bounce/current y near the right edge.
Next step: Revert the late-lock rule; inspect narrower action-level fixes or tooling before changing the policy again.
Video: None

Per-episode records

ep | seed | score | steps | done
1 | 0 | -3.0 | 5884 | True
2 | 1 | -21.0 | 4408 | True
3 | 2 | -3.0 | 6009 | True
4 | 3 | -1.0 | 27000 | True
5 | 4 | -1.0 | 27000 | True

Round 14: ram_intercept_v3_3_repro_baseline

2026-05-09T06:16:49.146780+00:00
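Mean score 0.5
Change
repro baseline before the next change: current ram_intercept_v3_3 over seeds 0-9, no video, max_steps=30000
Sample
start seed 0 / 10 episodes / 138299 steps
Score
-1.0 / 0.5 / 9.0 (min / mean / max)
Suggestion
Label failure modes with the RGB/OBS videos first, then change the heuristic.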

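Failure records

No failure-mode records for this round yet.

Per-episode records

ep | seed | score | steps | done
1 | 0 | -1.0 | 6502 | True
2 | 1 | -1.0 | 7319 | True
3 | 2 | 9.0 | 3825 | True
4 | 3 | -1.0 | 27000 | True
5 | 4 | -1.0 | 27000 | True
6 | 5 | -1.0 | 11521 | True
7 | 6 | -1.0 | 27000 | True
8 | 7 | 4.0 | 10113 | True
9 | 8 | -1.0 | 11521 | True
10 | 9 | -1.0 | 6498 | True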

Round 13: ram_intercept_v3_3_10seed_30k_check

2026-05-09T05:57:01.222578+00:00
Mean score 0.5
Change
no-video long robustness check for ram_intercept_v3_3 over seeds 0-9 with max_steps=30000; distinguishes true near-tie games from 10000-step truncation
Sample
start seed 0 / 10 episodes / 138299 steps
Score
-1.0 / 0.5 / 9.0 (min / mean / max)
Suggestion
Label failure modes with the RGB/OBS videos first, then change the heuristic.

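Rows whose `done` flag is False stopped at the step cap rather than at the end of a game, so their scores are not final. A minimal sketch that separates the two cases, using the Round 12 rows (cap 10000) as data; `Episode` and `split_by_termination` are hypothetical names:

```python
from typing import NamedTuple


class Episode(NamedTuple):
    seed: int
    score: float
    steps: int
    done: bool  # True if the game ended on its own, False if cut off at max_steps


def split_by_termination(episodes: list[Episode]) -> tuple[list[Episode], list[Episode]]:
    """Separate episodes that finished naturally from ones truncated at the cap."""
    finished = [e for e in episodes if e.done]
    truncated = [e for e in episodes if not e.done]
    return finished, truncated


# Round 12 per-episode records (max_steps=10000).
ROUND12 = [
    Episode(0, -1.0, 6502, True), Episode(1, -1.0, 7319, True),
    Episode(2, 9.0, 3825, True), Episode(3, 0.0, 10000, False),
    Episode(4, 0.0, 10000, False), Episode(5, -2.0, 10000, False),
    Episode(6, 0.0, 10000, False), Episode(7, 3.0, 10000, False),
    Episode(8, -2.0, 10000, False), Episode(9, -1.0, 6498, True),
]

finished, truncated = split_by_termination(ROUND12)
print([e.seed for e in truncated])  # [3, 4, 5, 6, 7, 8]
```

Rerunning only the truncated seeds with a larger cap is what turns their provisional scores into final ones, as the Round 13 rows show.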
Failure records

Time: 2026-05-09T06:01:44.246986+00:00
Failure mode: residual_high_intercept_misses
Evidence: v3_3 long 10-seed check: mean +0.5 min -1 max 9; the seed0 trace shows residual losses repeating high-intercept misses around ball_y 164..173 and target 173/193. A fresh target-to-paddle offset sweep over seeds 0-9 at max_steps=30000 showed offset 0 remains best; negative and positive offsets regressed badly.
Hypothesis: The remaining errors are phase-specific intercept misses, not a global visual/RAM coordinate offset.
Next step: To improve beyond the usable baseline, add narrow pattern rules for the two residual high-intercept trajectories and validate on 10+ seeds.
Video: None

Per-episode records

ep | seed | score | steps | done
1 | 0 | -1.0 | 6502 | True
2 | 1 | -1.0 | 7319 | True
3 | 2 | 9.0 | 3825 | True
4 | 3 | -1.0 | 27000 | True
5 | 4 | -1.0 | 27000 | True
6 | 5 | -1.0 | 11521 | True
7 | 6 | -1.0 | 27000 | True
8 | 7 | 4.0 | 10113 | True
9 | 8 | -1.0 | 11521 | True
10 | 9 | -1.0 | 6498 | True

Round 12: ram_intercept_v3_3_10seed_check

2026-05-09T05:55:36.672979+00:00
Mean score 0.5
Change
no-video robustness check for ram_intercept_v3_3 over seeds 0-9; scores expected around mean +0.5 with no -20 collapse
Sample
start seed 0 / 10 episodes / 84144 steps
Score
-2.0 / 0.5 / 9.0 (min / mean / max)
Suggestion
Label failure modes with the RGB/OBS videos first, then change the heuristic.

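Failure records

No failure-mode records for this round yet.

Per-episode records

ep | seed | score | steps | done
1 | 0 | -1.0 | 6502 | True
2 | 1 | -1.0 | 7319 | True
3 | 2 | 9.0 | 3825 | True
4 | 3 | 0.0 | 10000 | False
5 | 4 | 0.0 | 10000 | False
6 | 5 | -2.0 | 10000 | False
7 | 6 | 0.0 | 10000 | False
8 | 7 | 3.0 | 10000 | False
9 | 8 | -2.0 | 10000 | False
10 | 9 | -1.0 | 6498 | True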

Round 11: ram_intercept_v3_3

2026-05-09T05:51:13.476925+00:00
Mean score 1.4
Change
low-serve pattern rule: after detecting a reset serve near x=126..138 y=116..128 moving up-right, aim at y=50 for 24 policy steps; a no-video sweep improved the seeds 0-4 mean from -1.4 to +1.4
Sample
start seed 0 / 5 episodes / 37646 steps
Score
-1.0 / 1.4 / 9.0 (min / mean / max)
Suggestion
Label failure modes with the RGB/OBS videos first, then change the heuristic.

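The serve rule combines a one-shot detector with a short memory that holds the target for a fixed number of policy steps. A minimal sketch using the ranges from the note; it assumes RAM y grows downward, so "up-right" is taken as dx > 0 and dy < 0, and `LowServeMemory` is a hypothetical name:

```python
class LowServeMemory:
    """Hold a fixed target y for a few policy steps after a low serve is seen."""

    def __init__(self, hold_steps: int = 24, target_y: int = 50):
        self.hold_steps = hold_steps
        self.target_y = target_y
        self.remaining = 0  # policy steps left in the current hold

    def reset(self) -> None:
        self.remaining = 0

    @staticmethod
    def is_low_serve(x: int, y: int, dx: int, dy: int) -> bool:
        # Reset-serve window from the note; "up-right" assumed to be dx > 0, dy < 0.
        return 126 <= x <= 138 and 116 <= y <= 128 and dx > 0 and dy < 0

    def target(self, x: int, y: int, dx: int, dy: int, default_target: int) -> int:
        # Start a new hold only when none is active and the serve pattern matches.
        if self.remaining == 0 and self.is_low_serve(x, y, dx, dy):
            self.remaining = self.hold_steps
        if self.remaining > 0:
            self.remaining -= 1
            return self.target_y
        return default_target
```

Outside the hold, the baseline intercept target passes through unchanged.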
Failure records

No failure-mode records for this round yet.

Per-episode records

ep | seed | score | steps | done | videos
1 | 0 | -1.0 | 6502 | True | rgb, obs
2 | 1 | -1.0 | 7319 | True | rgb, obs
3 | 2 | 9.0 | 3825 | True | rgb, obs
4 | 3 | 0.0 | 10000 | False | rgb, obs
5 | 4 | 0.0 | 10000 | False | rgb, obs

Round 10: ram_intercept_v3_2

2026-05-09T05:39:55.253489+00:00
Mean score -1.4
Change
inactive serve-ready center changed from 121 to 100; a no-video sweep over seeds 0-4 improved the mean from -1.8 to -1.4, keeping the target/intercept constants from v3_1
Sample
start seed 0 / 5 episodes / 34843 steps
Score
-20.0 / -1.4 / 9.0 (min / mean / max)
Suggestion
Label failure modes with the RGB/OBS videos first, then change the heuristic.

Failure records

Time: 2026-05-09T05:43:28.523217+00:00
Failure mode: seed0_unfixed_low_serve
Evidence: ram_intercept_v3_2 scored [-20, 9, 4, 0, 0]; the inactive center change carried seeds 3/4 to max_steps, but seed0 still collapses at -20.
Hypothesis: The remaining weakness is a specific low-serve pattern; global center/intercept constants are no longer the best lever.
Next step: Add a targeted serve-pattern detector or policy memory for repeated point-reset low serves, validated against seed0 without regressing seeds 1-4.
Video: None

Per-episode records

ep | seed | score | steps | done | videos
1 | 0 | -20.0 | 856 | True | rgb, obs
2 | 1 | 9.0 | 8782 | True | rgb, obs
3 | 2 | 4.0 | 5205 | True | rgb, obs
4 | 3 | 0.0 | 10000 | False | rgb, obs
5 | 4 | 0.0 | 10000 | False | rgb, obs

Round 9: ram_intercept_v3_1

2026-05-09T05:26:56.672212+00:00
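Mean score -1.8
Change
target clamp 46..196 added on top of ram_intercept_v3; a no-video sweep over seeds 0-4 improved the mean from -2.0 to -1.8
Sample
start seed 0 / 5 episodes / 23217 steps
Score
-20.0 / -1.8 / 9.0 (min / mean / max)
Suggestion
Label failure modes with the RGB/OBS videos first, then change the heuristic.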

Failure records

Time: 2026-05-09T05:29:26.601423+00:00
Failure mode: seed0_repeated_low_serve_failure
Evidence: ram_intercept_v3_1 scored [-20, 9, 4, -1, -1]; seed0 repeatedly loses the same low-serve pattern around ball_y 52, while other seeds are near break-even or positive.
Hypothesis: The policy needs serve-pattern handling or paddle phase control after a point reset; target clamp and reflection tuning are not enough.
Next step: Compare the seed0 low-serve trace with the seed1 successful serve trace, and add reset/serve-specific positioning if the evidence supports it.
Video: None

Per-episode records

ep | seed | score | steps | done | videos
1 | 0 | -20.0 | 856 | True | rgb, obs
2 | 1 | 9.0 | 3872 | True | rgb, obs
3 | 2 | 4.0 | 5265 | True | rgb, obs
4 | 3 | -1.0 | 6614 | True | rgb, obs
5 | 4 | -1.0 | 6610 | True | rgb, obs

Round 8: ram_intercept_v3

2026-05-09T05:22:05.593598+00:00
Mean score -2.0
Change
tuned intercept policy: runtime reflection bounds fixed; MAX_PLAY_Y_RAM=210 and PADDLE_DEADBAND=3 selected from no-video sweeps over seeds 0-4
Sample
start seed 0 / 5 episodes / 23252 steps
Score
-20.0 / -2.0 / 8.0 (min / mean / max)
Suggestion
Label failure modes with the RGB/OBS videos first, then change the heuristic.

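A deadband keeps the paddle still when it is already within a few units of the target, which avoids jittering between UP and DOWN. A minimal sketch with PADDLE_DEADBAND=3 from this round and the 46..196 target clamp that Round 9 adds; it assumes y grows downward (so a larger target means DOWN) and that paddle_y and the target share a reference point, both of which are assumptions:

```python
PADDLE_DEADBAND = 3               # selected by the Round 8 sweep
TARGET_MIN, TARGET_MAX = 46, 196  # target clamp added in Round 9


def clamp(value: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, value))


def paddle_action(paddle_y: float, target_y: float,
                  deadband: float = PADDLE_DEADBAND) -> str:
    """Move toward the clamped target; hold still while inside the deadband.

    Assumes y grows downward, so a target below the paddle means DOWN.
    """
    target_y = clamp(target_y, TARGET_MIN, TARGET_MAX)
    error = target_y - paddle_y
    if error > deadband:
        return "DOWN"
    if error < -deadband:
        return "UP"
    return "NOOP"
```

An error of exactly 3 still yields NOOP, since the comparisons are strict.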
Failure records

Time: 2026-05-09T05:24:35.196652+00:00
Failure mode: seed0_fast_collapse
Evidence: ram_intercept_v3 over seeds 0-4 scored [-20, 8, 4, -1, -1]; the mean improved to -2.0, but seed0 ended at -20 in 856 steps.
Hypothesis: The current intercept constants work for common rallies but fail on one early serve pattern, likely because of target reflection or paddle-position phase during the first return.
Next step: Trace the seed0 first-loss sequence and compare it with the successful seed1 before changing another policy rule.
Video: None

Per-episode records

ep | seed | score | steps | done | videos
1 | 0 | -20.0 | 856 | True | rgb, obs
2 | 1 | 8.0 | 3907 | True | rgb, obs
3 | 2 | 4.0 | 5265 | True | rgb, obs
4 | 3 | -1.0 | 6614 | True | rgb, obs
5 | 4 | -1.0 | 6610 | True | rgb, obs

Round 7: ram_intercept_v2

2026-05-09T05:12:41.196821+00:00
Mean score -14.4
Change
stateful velocity policy: estimate dx/dy from consecutive RAM states and predict the right-side intercept with wall reflection; RGB overlay calibrated from a visual probe
Sample
start seed 0 / 5 episodes / 20904 steps
Score
-19.0 / -14.4 / 0.0 (min / mean / max)
Suggestion
Compare with the previous round; change only one heuristic at a time.

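The velocity policy needs two pieces: a finite-difference velocity from consecutive RAM states and a wall-reflection fold of the straight-line prediction. A minimal sketch; `paddle_x`, `y_min`, and `y_max` are hypothetical parameters (Round 8 later selected MAX_PLAY_Y_RAM=210 as the upper bound, and the lower bound is not given in this log), and real ball motion may deviate from this idealized mirror model:

```python
from typing import Optional


def estimate_velocity(prev: tuple[float, float],
                      curr: tuple[float, float]) -> tuple[float, float]:
    """Finite-difference (dx, dy) per policy step from two (x, y) RAM readings."""
    return curr[0] - prev[0], curr[1] - prev[1]


def reflect(y: float, y_min: float, y_max: float) -> float:
    """Fold an unbounded y into [y_min, y_max] by mirroring at both walls."""
    span = y_max - y_min
    if span <= 0:
        raise ValueError("y_max must be greater than y_min")
    offset = (y - y_min) % (2 * span)  # position within one down-and-back period
    return y_min + (offset if offset <= span else 2 * span - offset)


def predict_intercept(prev: tuple[float, float], curr: tuple[float, float],
                      paddle_x: float, y_min: float, y_max: float) -> Optional[float]:
    """Predicted ball y when it reaches paddle_x, or None if not approaching."""
    dx, dy = estimate_velocity(prev, curr)
    if dx <= 0 or curr[0] > paddle_x:
        return None  # moving away from our paddle, or already past it
    steps = (paddle_x - curr[0]) / dx
    return reflect(curr[1] + dy * steps, y_min, y_max)
```

For example, a ball moving from (100, 50) to (104, 54) toward paddle_x=120 between walls at 0 and 60 would travel to y=70 unbounded and be folded back to 50.0.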
Failure records

Time: 2026-05-09T05:15:00.338451+00:00
Failure mode: late_or_overpredicted_intercept
Evidence: ram_intercept_v2 improved the mean score from -20.4 to -14.4 over 5 seeds, but seeds 0-3 still ended at -17 to -19; seed 4 reached max_steps with score 0.
Hypothesis: Velocity prediction helps but needs better serve/return-state handling and possibly a paddle target offset or speed limits.
Next step: Inspect the calibrated RGB videos for the first missed return, then add one targeted adjustment or trace logging before changing more policy constants.
Video: None

Per-episode records

ep | seed | score | steps | done | videos
1 | 0 | -17.0 | 2418 | True | rgb, obs
2 | 1 | -19.0 | 3715 | True | rgb, obs
3 | 2 | -17.0 | 2419 | True | rgb, obs
4 | 3 | -19.0 | 2352 | True | rgb, obs
5 | 4 | 0.0 | 10000 | False | rgb, obs

Round 6: minimal_ram_xgate_v1_calibrated_overlay_diag

2026-05-09T05:11:03.604459+00:00
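Mean score -21.0
Change
Diagnostic sample: the RGB overlay is now calibrated with visual_x = ram[49] - 48.5 and visual_y = ram[54] - 12.5, for analyzing failure modes in the next round
Sample
start seed 0 / 1 episode / 2145 steps
Score
-21.0 / -21.0 / -21.0 (min / mean / max)
Suggestion
Still losing every episode; review the RGB failure clips and re-check the ball_y / paddle_y mapping.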

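The calibration is a fixed offset from RAM coordinates to RGB-frame pixels. A minimal sketch, assuming ram[49] and ram[54] are passed in and the standard 160-wide by 210-high Atari RGB frame; `ram_to_visual` and `in_frame` are hypothetical names:

```python
FRAME_WIDTH, FRAME_HEIGHT = 160, 210  # Atari RGB frame size in pixels


def ram_to_visual(ram_x: int, ram_y: int) -> tuple[float, float]:
    """Map RAM ball coordinates (ram[49], ram[54]) to RGB-frame pixels."""
    return ram_x - 48.5, ram_y - 12.5  # Round 6 calibration offsets


def in_frame(vx: float, vy: float) -> bool:
    """True if a mapped point lies inside the RGB frame, so it can be drawn."""
    return 0 <= vx < FRAME_WIDTH and 0 <= vy < FRAME_HEIGHT
```

Checking `in_frame` before drawing keeps off-screen RAM values (for example while the ball is inactive between points) from corrupting the overlay.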
Failure records

No failure-mode records for this round yet.

Per-episode records

ep | seed | score | steps | done | videos
1 | 0 | -21.0 | 2145 | True | rgb, obs

Round 5: minimal_ram_xgate_v1_overlay_diag

2026-05-09T05:08:25.248804+00:00
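Mean score -21.0
Change
Diagnostic sample: the RGB video overlays the RAM-inferred ball, paddle, action, reward, and score, for checking state detection and policy failure modes
Sample
start seed 0 / 1 episode / 2145 steps
Score
-21.0 / -21.0 / -21.0 (min / mean / max)
Suggestion
Still losing every episode; review the RGB failure clips and re-check the ball_y / paddle_y mapping.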

Failure records

No failure-mode records for this round yet.

Per-episode records

ep | seed | score | steps | done | videos
1 | 0 | -21.0 | 2145 | True | rgb, obs

Round 4: minimal_ram_xgate_v1

2026-05-09T03:00:51.076953+00:00
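Mean score -20.4
Change
x-gated target: center the paddle unless the ball is active and in the right half
Sample
start seed 0 / 5 episodes / 7486 steps
Score
-21.0 / -20.4 / -20.0 (min / mean / max)
Suggestion
Still losing every episode; review the RGB failure clips and re-check the ball_y / paddle_y mapping.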

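A minimal sketch of the x-gate. The log does not give this round's center value, right-half threshold, or active-ball test, so all three are assumptions here: 121 is the serve-ready center that Round 10 later replaces, the threshold is the smallest RAM x whose calibrated screen x (ram[49] - 48.5, from Round 6) reaches the 80-pixel midline, and `ball_active` is taken as a given flag:

```python
CENTER_Y = 121          # assumption: serve-ready center used before Round 10
RIGHT_HALF_RAM_X = 129  # assumption: smallest ram_x with ram_x - 48.5 >= 80


def xgate_target(ball_active: bool, ball_x: int, ball_y: int) -> int:
    """Track the ball only when it is live and on our (right) side."""
    if ball_active and ball_x >= RIGHT_HALF_RAM_X:
        return ball_y
    return CENTER_Y  # otherwise park the paddle at the center
```

The gate keeps the paddle from chasing the ball while it is on the opponent's side, at the cost of reacting later to fast returns.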
Failure records

No failure-mode records for this round yet.

Per-episode records

ep | seed | score | steps | done | videos
1 | 0 | -21.0 | 2185 | True | rgb, obs
2 | 1 | -20.0 | 1343 | True | rgb, obs
3 | 2 | -21.0 | 2186 | True | rgb, obs
4 | 3 | -20.0 | 858 | True | rgb, obs
5 | 4 | -20.0 | 914 | True | rgb, obs

Round 3: minimal_ram_v0

2026-05-09T02:46:35.736446+00:00
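Mean score -20.0
Change
dual RGB and OBS video recording
Sample
start seed 0 / 1 episode / 1343 steps
Score
-20.0 / -20.0 / -20.0 (min / mean / max)
Suggestion
Still losing every episode; review the RGB failure clips and re-check the ball_y / paddle_y mapping.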

Failure records

No failure-mode records for this round yet.

Per-episode records

ep | seed | score | steps | done | videos
1 | 0 | -20.0 | 1343 | True | rgb, obs

Round 2: minimal_ram_v0

2026-05-09T02:38:15.472083+00:00
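Mean score -20.0
Change
minimal RAM policy with per-episode GIF recording
Sample
start seed 0 / 3 episodes / 3606 steps
Score
-20.0 / -20.0 / -20.0 (min / mean / max)
Suggestion
Still losing every episode; review the RGB failure clips and re-check the ball_y / paddle_y mapping.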

Failure records

No failure-mode records for this round yet.

Per-episode records

ep | seed | score | steps | done | videos
1 | 0 | -20.0 | 1343 | True | rgb
2 | 1 | -20.0 | 919 | True | rgb
3 | 2 | -20.0 | 1344 | True | rgb

Round 1: minimal_ram_v0

2026-05-09T02:33:35.856289+00:00
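Mean score -20.0
Change
minimal RAM policy using ram[54] ball_y and ram[51] paddle_y
Sample
start seed 0 / 3 episodes / 3606 steps
Score
-20.0 / -20.0 / -20.0 (min / mean / max)
Suggestion
Still losing every episode; review the RGB failure clips and re-check the ball_y / paddle_y mapping.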

Failure records

No failure-mode records for this round yet.

Per-episode records

No per-episode video records for this round.