== Q-learning algorithm ==
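 '''while''' Q is not converged:
     s = initial state <tex>s_0 \in S</tex>
     '''while''' s is not terminal:
         <tex>\pi(s) = \arg\max_{a}{Q(s, a)}</tex>
         a = <tex>\pi(s)</tex>
         perform action a, observe reward r and next state s'
         // α: learning rate, γ: discount factor
         <tex>Q(s, a) \leftarrow Q(s, a) + \alpha \left(r + \gamma \max_{a'} Q(s', a') - Q(s, a)\right)</tex>
         s = s'
 return Q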
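The loop above can be sketched in Python as tabular Q-learning with the standard update <tex>Q(s, a) \leftarrow Q(s, a) + \alpha \left(r + \gamma \max_{a'} Q(s', a') - Q(s, a)\right)</tex>. This is only a minimal illustration under several assumptions that are not from the article: the environment is a hypothetical 5-state corridor (`step`, `N_STATES`, `TERMINAL` are made-up names), a fixed number of episodes stands in for "Q is not converged", and the values of `alpha`, `gamma`, `epsilon` are arbitrary. Unlike the pseudocode, which always takes the greedy action, the sketch uses ε-greedy exploration so that every action keeps being tried.

```python
import random

# Hypothetical toy environment (assumption, not from the article):
# a 1-D corridor of states 0..4; action 0 moves left, action 1 moves right.
# Reaching the rightmost state gives reward 1 and ends the episode.
N_STATES = 5
ACTIONS = (0, 1)
TERMINAL = N_STATES - 1


def step(s, a):
    """Return (next_state, reward) for the toy corridor."""
    s_next = max(0, s - 1) if a == 0 else min(TERMINAL, s + 1)
    reward = 1.0 if s_next == TERMINAL else 0.0
    return s_next, reward


def greedy(Q, s):
    """pi(s) = argmax_a Q(s, a); ties are broken at random."""
    best = max(Q[s])
    return random.choice([a for a in ACTIONS if Q[s][a] == best])


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    random.seed(seed)
    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):  # fixed budget instead of a convergence test
        s = 0                  # initial state
        while s != TERMINAL:
            # epsilon-greedy exploration (assumption; pseudocode is purely greedy)
            if random.random() < epsilon:
                a = random.choice(ACTIONS)
            else:
                a = greedy(Q, s)
            s_next, r = step(s, a)
            # the terminal state has no future value
            target = r if s_next == TERMINAL else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q


Q = q_learning()
policy = [greedy(Q, s) for s in range(TERMINAL)]
print(policy)  # → [1, 1, 1, 1]
```

Here the learned greedy policy moves right in every non-terminal state, which is optimal for this corridor. Because the environment is deterministic and Q starts at zero, the estimates only grow toward their targets, which keeps the example stable.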
== References ==