- Sequential Decision Problems
- Partially observable MDPs
- Decision-Theoretic Agents
- Decisions with Multiple Agents: Game Theory
- Mechanism Design
A strategy is a plan of action designed to achieve a particular goal. Strategy discovery can be thought of as a subset of strategy optimization: in theory, we could start from a trivial, arbitrary, possibly bad strategy and optimize it until we have "discovered" a satisfactory one. Perhaps "strategy design" would describe what I'm talking about more generally, but I prefer the term "optimization" for all the sentiments it evokes.
We have already seen how we can use reinforcement learning methods to come up with satisfactory strategies for controllers in settings such as the inverted pendulum problem. In that context, we called the strategy a policy and defined the optimal policy to be a mapping from states to actions that maximizes the controller's expected return from the environment. What other kinds of scenarios might these concepts and methods be useful for? Consider the following two examples:
- Legend has it that Nathan Rothschild, a London banker in the 19th century, acted cleverly on early news of the outcome of the Battle of Waterloo and made a fortune. Consols (British government bonds) would rise as a result of Napoleon's defeat, but rather than buying Consols in anticipation of the increase, Nathan sold his existing holdings. Other investors, unaware of the outcome at Waterloo, got wind of this and assumed Nathan was acting on information that the battle was lost. Nathan was known at the time to have many connections and was expected to have access to early information, so these other investors also sold their holdings, driving prices further down. At the proverbial last minute, Nathan bought at bargain prices. When news of the battle came out, prices soared, and Nathan realized a nice profit.
- A more modern story is that of how Porsche cornered the market in VW stock and made a killing in 2008. You can read about it here, here, and here.
The strategy optimization process can indeed be automated. A framework for doing so draws on fields such as decision theory, game theory, system dynamics, and control theory, among others, and makes heavy use of modeling and simulation, not unlike our inverted pendulum example. Take the London banker, for instance: if unsure whether to start by selling or by buying, he could run simulations of each strategy to determine the more profitable one. Of course, the success of this exercise would depend primarily on the quality of the models for each system component and their interrelations. These might include a model of information flow, a model of Nathan and of the other traders, and a model of the "physics" of the environment (i.e., what actions market participants may or may not take at any given time).
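To make the idea of simulating each strategy a little more concrete, here is a rough sketch of what such a comparison might look like as a toy Monte Carlo experiment in Python. Everything in it is an assumption made up for illustration: the prices, the probability that other traders follow the banker's lead, the size of the panic, and the function names (`buy_early`, `sell_then_buy`, `evaluate`, `best_strategy`). It is nowhere near a faithful model of the 19th-century bond market; it only shows the shape of the exercise.

```python
import random
from statistics import mean

# All numbers below are illustrative assumptions, not historical data.
P0 = 100.0         # price before any news is public
P_NEWS = 120.0     # price once the victory is public
CASH = 10_000.0    # banker's starting cash
HOLDINGS = 100.0   # banker's starting units of the bond
FOLLOW_PROB = 0.7  # chance that other traders copy the banker's sale
MAX_DROP = 0.3     # largest fractional price drop caused by panic selling


def buy_early(rng):
    """Spend all cash at the current price, then hold until the news."""
    units = HOLDINGS + CASH / P0
    return units * P_NEWS


def sell_then_buy(rng):
    """Sell everything first; rebuy at panic prices if the herd follows."""
    cash = CASH + HOLDINGS * P0
    if rng.random() < FOLLOW_PROB:
        panic_price = P0 * (1.0 - rng.uniform(0.0, MAX_DROP))
        return cash / panic_price * P_NEWS
    # Assumed failure mode: nobody follows, the news breaks before the
    # banker can rebuy, and he is left holding only cash.
    return cash


STRATEGIES = {"buy_early": buy_early, "sell_then_buy": sell_then_buy}


def evaluate(strategy, n_runs=10_000, seed=0):
    """Estimate a strategy's expected final wealth by repeated simulation."""
    rng = random.Random(seed)
    return mean(strategy(rng) for _ in range(n_runs))


def best_strategy(n_runs=10_000, seed=0):
    """Return the name of the highest-scoring strategy and all scores."""
    scores = {name: evaluate(fn, n_runs, seed) for name, fn in STRATEGIES.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    winner, scores = best_strategy()
    for name, value in scores.items():
        print(f"{name}: {value:,.0f}")
    print("preferred strategy:", winner)
```

The answer this produces is only as good as the assumptions fed into it. Lower `FOLLOW_PROB` far enough, for example, and selling first stops being the better choice under this toy model.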
And therein lies the rub: it is very hard to model complex systems, especially when they involve human behavior. Nevertheless, there are methods that ease the burden, and I expect to dive deeper into them in future posts.