| Article ID: | iaor19982955 |
| Country: | Germany |
| Volume: | 45 |
| Issue: | 2 |
| Start Page Number: | 265 |
| End Page Number: | 280 |
| Publication Date: | Jan 1997 |
| Journal: | Mathematical Methods of Operations Research (Heidelberg) |
| Authors: | Yushkevich Alexander A., Donchev D.S. |
| Keywords: | control processes |
In terms of a posteriori probabilities, a symmetric Poissonian two-armed bandit becomes a piecewise deterministic Markov decision process. For the case of switching arms, only one of which generates rewards, we solve the average optimality equation explicitly and prove that a myopic policy is average optimal.
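For intuition only, the following is a minimal simulation sketch of the kind of model the abstract describes: two arms whose identities switch at a symmetric rate, only one arm paying Poisson rewards, a posterior belief updated by Bayes' rule, and a myopic policy that always pulls the arm currently believed more likely to pay. All parameter names and values (lam, nu, dt, horizon) and the Euler-type time discretization are assumptions of this sketch, not taken from the paper, which works with the continuous-time piecewise deterministic dynamics directly.

```python
"""Illustrative (not authoritative) simulation of a myopic policy for a
symmetric Poissonian two-armed bandit with switching arms."""
import random

lam = 2.0        # reward rate of the single rewarding arm  (assumed value)
nu = 0.3         # symmetric switching rate between the arms (assumed value)
dt = 1e-3        # time step of the Euler discretization     (assumption)
horizon = 1000.0 # simulated time span                       (assumption)

def simulate(seed: int = 0) -> float:
    """Return the empirical average reward of the myopic policy."""
    rng = random.Random(seed)
    rewarding = 0   # hidden index of the arm that currently pays
    p = 0.5         # posterior probability that arm 0 is the rewarding one
    total_reward = 0.0
    t = 0.0
    while t < horizon:
        # The hidden arms swap identities with rate nu (symmetric model).
        if rng.random() < nu * dt:
            rewarding = 1 - rewarding
        # Myopic policy: pull the arm currently believed more likely to pay.
        arm = 0 if p >= 0.5 else 1
        # A reward can arrive only while pulling the rewarding arm.
        got_reward = (arm == rewarding) and (rng.random() < lam * dt)
        if got_reward:
            total_reward += 1.0
            # A reward identifies the pulled arm as the rewarding one.
            p = 1.0 if arm == 0 else 0.0
        else:
            # Bayes update for observing "no reward in dt" on the pulled arm.
            like0 = (1.0 - lam * dt) if arm == 0 else 1.0
            like1 = (1.0 - lam * dt) if arm == 1 else 1.0
            p = p * like0 / (p * like0 + (1.0 - p) * like1)
        # Deterministic drift of the posterior due to possible switching;
        # between reward events the belief evolves deterministically, which
        # is what makes the problem piecewise deterministic.
        p += nu * dt * (1.0 - 2.0 * p)
        t += dt
    return total_reward / horizon

if __name__ == "__main__":
    print("estimated average reward of the myopic policy:", simulate())
```

The discretized update above only approximates the continuous-time posterior flow; the paper's analysis solves the average optimality equation for the exact piecewise deterministic process rather than simulating it.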