An advertising platform like Bing might be losing money because bidders are not bidding truthfully, i.e. according to their valuation of an item. Since the valuation is private, it's impossible to know for sure. On the other hand, since bidding happens in a multi-agent environment with repeated games (i.e. auctions) - and hence an opportunity to learn - it should be possible to infer valuations under some assumptions on agent/player behavior.
A previous paper by the first author, Denis Nekipelov, discussed player valuation under the assumption of a static Nash equilibrium, where all players respond independently and with their best strategies. That the auctions are already and/or constantly in equilibrium is a strong - if popular - assumption in dynamic environments like online advertising. The current paper instead assumes that players learn over time using "no-regret strategies" (i.e. responses that over time are close to optimal), and then sets out to estimate the number of data samples required to approximate the rationalizable set, "consisting of the set of values and error parameters (v, Δ) such that with value v the sequence of bids by the player would have at most Δ regret."
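To make the rationalizable-set condition concrete, here is a minimal illustrative sketch (not the paper's estimator) of how one might check whether a candidate value v and error level Δ are consistent with an observed bid sequence: compute the player's average regret against the best fixed alternative bid and test whether it is at most Δ. The auction model here (allocation and payment as simple callbacks, and the toy data at the bottom) is a hypothetical stand-in, not the sponsored-search setting of the paper.

```python
# Illustrative sketch only: regret of a bid sequence for a candidate value v,
# and the "(v, delta) is rationalizable" test. Utilities are quasi-linear:
# u_t(b; v) = v * allocation_t(b) - payment_t(b).

from typing import Callable, Sequence


def average_regret(
    v: float,                                    # candidate private value
    bids: Sequence[float],                       # observed bids b_1, ..., b_T
    allocation: Callable[[float, int], float],   # x_t(b): allocation at round t for bid b
    payment: Callable[[float, int], float],      # p_t(b): payment at round t for bid b
    bid_grid: Sequence[float],                   # fixed alternative bids to compare against
) -> float:
    """Average regret of the bid sequence relative to the best fixed bid in bid_grid."""
    T = len(bids)

    def avg_utility(bid_at: Callable[[int], float]) -> float:
        return sum(
            v * allocation(bid_at(t), t) - payment(bid_at(t), t) for t in range(T)
        ) / T

    realized = avg_utility(lambda t: bids[t])
    best_fixed = max(avg_utility(lambda t, b=b: b) for b in bid_grid)
    return best_fixed - realized


def in_rationalizable_set(v, delta, bids, allocation, payment, bid_grid) -> bool:
    """(v, delta) is rationalizable if, with value v, the bids have at most delta regret."""
    return average_regret(v, bids, allocation, payment, bid_grid) <= delta


# Toy usage with made-up data: a single-slot, second-price-style auction where
# the player wins whenever its bid exceeds a hypothetical highest competing bid.
competing = [0.4, 0.6, 0.5, 0.7]
alloc = lambda b, t: 1.0 if b > competing[t] else 0.0
pay = lambda b, t: competing[t] if b > competing[t] else 0.0
observed_bids = [0.55, 0.65, 0.52, 0.72]
grid = [x / 20 for x in range(21)]
print(in_rationalizable_set(v=0.8, delta=0.05, bids=observed_bids,
                            allocation=alloc, payment=pay, bid_grid=grid))
```

The paper's question is then a statistical one: how many observed auctions are needed before this kind of set of consistent (v, Δ) pairs can be approximated well.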