I built a small deterministic gridworld benchmark to isolate a specific decision behavior: selecting between viable policies where standard scalar or constrained formulations select differently.

The example is intentionally minimal (fixed policies, no learning, two domains: energy and safety).

In the benchmark, both policies reach the goal and remain viable, but one preserves safety while the other reduces it from 10 → 4. The PV–PP system selects the former; a scalar baseline selects the latter.
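To make the disagreement concrete, here is a minimal sketch of the two selection rules applied to two fixed outcomes. It is not the code from the repository. The safety values (10 and 4) come from the description above; the energy values, the weights, the names `Outcome`, `scalar_select`, and `preservation_select`, and the "prefer policies that do not reduce safety" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Outcome:
    name: str
    reaches_goal: bool
    energy: float  # final energy level (hypothetical value)
    safety: float  # final safety level (10 and 4 are taken from the post)

START_SAFETY = 10.0

preserving = Outcome("preserving", True, energy=5.0, safety=10.0)
degrading = Outcome("degrading", True, energy=9.0, safety=4.0)

def scalar_score(o, w_energy=1.0, w_safety=0.5):
    # Weighted sum of both domains; the weights are assumed, not from the repo.
    return w_energy * o.energy + w_safety * o.safety

def scalar_select(candidates):
    # Scalar baseline: pick the candidate with the highest weighted sum.
    return max(candidates, key=scalar_score)

def preservation_select(candidates, start_safety=START_SAFETY):
    # Assumed stand-in for the non-scalar rule: among goal-reaching policies,
    # keep those that do not reduce safety, then break ties on energy.
    viable = [o for o in candidates if o.reaches_goal]
    preserved = [o for o in viable if o.safety >= start_safety]
    pool = preserved or viable  # fall back if nothing preserves safety
    return max(pool, key=lambda o: o.energy)

candidates = [preserving, degrading]
print(scalar_select(candidates).name)
print(preservation_select(candidates).name)
```

With these assumed numbers the weighted sum favors the safety-degrading policy because its energy advantage outweighs the safety loss, while the preservation filter removes it before energy is compared.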

Known limitations:
• selection rule is partially benchmark-specific
• no general trade-off mechanism yet

GitHub: https://github.com/AmundsenLance/pvpp-gridworld-safe-benchmark.git

I’m looking for critique on whether this behavior can always be represented as a scalar or lexicographic ordering, or whether it requires a genuinely non-scalar selection rule.
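One way to probe this on the two-policy case is to check whether a lexicographic key, or a scalar with a different weight, already reproduces the safety-preserving choice. The sketch below does that. The safety values are from the description above; the energy values, the weights, and the function names `lexicographic_key` and `weighted_key` are hypothetical and not taken from the repository.

```python
# (name, energy, safety); safety values from the post, energy values hypothetical
policies = [("preserving", 5.0, 10.0), ("degrading", 9.0, 4.0)]
START_SAFETY = 10.0

def lexicographic_key(p):
    # Compare on safety change first (0 is best), then on energy.
    _, energy, safety = p
    return (safety - START_SAFETY, energy)

def weighted_key(p, w_safety):
    # Scalar score with energy weight fixed at 1 and a variable safety weight.
    _, energy, safety = p
    return energy + w_safety * safety

lex_choice = max(policies, key=lexicographic_key)[0]
low_weight_choice = max(policies, key=lambda p: weighted_key(p, 0.5))[0]
high_weight_choice = max(policies, key=lambda p: weighted_key(p, 1.0))[0]

print(lex_choice, low_weight_choice, high_weight_choice)
```

Under these assumed numbers, the lexicographic key and the scalar with safety weight 1.0 both pick the safety-preserving policy, so a single two-policy instance probably cannot separate the rules. A separating test would likely need a case where the selected policy changes when an unrelated candidate is added or removed, since a choice that depends on the menu in that way cannot come from maximizing any fixed score over outcomes.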
