Commit bb7d426f authored by lli
corrected output neurons in reinforce

parent bcd483ec
def __init__(self, n_state, n_hidden, n_action, lr):
    """
    Initialize the policy neural network:
    Use one hidden layer.
    Input: a state (n_state features), followed by a hidden layer (n_hidden units).
    Output: the probability of taking each of the n_action possible actions.
    Use the softmax function as the activation for the output layer.
    """
        nn.Linear(n_hidden, n_hidden),
        nn.Linear(n_hidden, n_hidden),
        nn.Linear(n_hidden, n_action),
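The docstring describes a state -> hidden layer -> softmax-over-actions network. The sketch below shows why the output layer has to map `n_hidden` to `n_action`: that gives one output neuron, and one probability, per action. It is written in plain Python rather than `torch.nn` so it runs without dependencies. It is not the author's implementation. Everything in it is assumed for illustration: `PolicyNetworkSketch`, `linear`, `relu`, `softmax`, `init_layer` and `sample_action` are hypothetical names, the ReLU hidden activation and the weight initialization are choices made here (the fragment does not show them), and `lr`/the optimizer are left out.

```python
import math
import random


def linear(x, weights, bias):
    """Affine layer: y[j] = sum_i x[i] * weights[j][i] + bias[j]."""
    return [sum(xi * wji for xi, wji in zip(x, row)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    # Assumed hidden activation; the original fragment does not show it.
    return [max(0.0, v) for v in x]


def softmax(x):
    # Subtract the max before exponentiating for numerical stability.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    # n_out rows of n_in small random weights; the scheme is arbitrary here.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


class PolicyNetworkSketch:
    """Hypothetical stand-in: state -> hidden layer -> action probabilities."""

    def __init__(self, n_state, n_hidden, n_action, seed=0):
        rng = random.Random(seed)
        self.hidden = init_layer(n_state, n_hidden, rng)
        # Output layer maps n_hidden -> n_action: one neuron per action.
        self.output = init_layer(n_hidden, n_action, rng)

    def predict(self, state):
        h = relu(linear(state, *self.hidden))
        return softmax(linear(h, *self.output))


def sample_action(probs, rng):
    # Draw an action index according to the policy's probabilities.
    return rng.choices(range(len(probs)), weights=probs)[0]


policy = PolicyNetworkSketch(n_state=4, n_hidden=16, n_action=2)
probs = policy.predict([0.1, -0.2, 0.05, 0.3])
action = sample_action(probs, random.Random(1))
print(len(probs), round(sum(probs), 6))
```

If the output layer mapped `n_hidden` to `n_hidden` instead, `predict` would return `n_hidden` probabilities rather than one per action. That is the kind of mismatch an "output neurons" correction addresses.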
