Getting Started with recnn

A Colab version of this notebook is available via the Colab badge in the repository.

The offline example is in: RecNN/examples/[Library Basics]/1. Getting Started.ipynb

Let’s do some imports:

import recnn
import torch
import torch.nn as nn
from tqdm.auto import tqdm

# progress bars for the pandas operations used by the data pipeline
tqdm.pandas()

# optional: notebook plot styling
from jupyterthemes import jtplot
jtplot.style(theme='grade3')

Environments

The library’s main abstraction for datasets is called an environment, similar to how other reinforcement learning libraries name it. This interface provides SARSA-like input for your RL models. When you are working with a recommendation environment, you have two choices: static-length inputs (say, 10 items) or dynamic-length time series with a sequential encoder (a many-to-one RNN). Static length is provided via FrameEnv, and dynamic length, along with a sequential state representation encoder, is implemented in SeqEnv. Let’s take a look at FrameEnv first.
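As a rough, library-free illustration of the static-length idea (not recnn’s actual implementation), the sketch below slices a toy rating history into fixed-size frames that yield (state, action, reward, next_state) tuples; FrameEnv does this internally on top of embedding lookup, batching and train/test splitting.

# Illustration only: slicing a rating history into SARSA-like tuples with a
# fixed frame_size. recnn's FrameEnv does this internally (plus embedding
# lookup, batching, etc.).
history = [(10, 5.0), (25, 3.0), (7, 4.0), (42, 1.0), (3, 5.0)]  # (item_id, rating)
frame_size = 3

for i in range(len(history) - frame_size):
    state = history[i:i + frame_size]          # the last frame_size interactions
    action, reward = history[i + frame_size]   # the next item and its rating
    next_state = history[i + 1:i + frame_size + 1]
    print(state, action, reward, next_state)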

To initialize an env, you need to provide paths to the embeddings and ratings files:

frame_size = 10
batch_size = 25
# embeddings: https://drive.google.com/open?id=1EQ_zXBR3DKpmJR3jBgLvt-xoOvArGMsL
env = recnn.data.env.FrameEnv('ml20_pca128.pkl', 'ml-20m/ratings.csv', frame_size, batch_size)

train = env.train_batch()
test = env.test_batch()
state, action, reward, next_state, done = recnn.data.get_base_batch(train, device=torch.device('cpu'))

print(state)

# State
tensor([[  5.4261,  -4.6243,   2.3351,  ...,   3.0000,   4.0000,   1.0000],
        [  6.2052,  -1.8592,  -0.3248,  ...,   4.0000,   1.0000,   4.0000],
        [  3.2902,  -5.0021, -10.7066,  ...,   1.0000,   4.0000,   2.0000],
        ...,
        [  3.0571,  -4.1390,  -2.7344,  ...,   3.0000,  -3.0000,  -1.0000],
        [  0.8177,  -7.0827,  -0.6607,  ...,  -3.0000,  -1.0000,   3.0000],
        [  9.0742,   0.3944,  -6.4801,  ...,  -1.0000,   3.0000,  -1.0000]])
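The state is a concatenation of the frame’s item embeddings and the corresponding ratings. Assuming ml20_pca128.pkl holds 128-dimensional embeddings (an assumption based on the file name), this is where the input size of 1290 used for the networks below comes from:

# Sanity check (assumes 128-dimensional embeddings, as the file name suggests):
# state = frame_size item embeddings concatenated with their frame_size ratings.
embedding_dim = 128
state_dim = frame_size * embedding_dim + frame_size  # 10 * 128 + 10 = 1290
print(state.shape, state_dim)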

Recommending

Let’s initialize the main networks and recommend something!

value_net  = recnn.nn.Critic(1290, 128, 256, 54e-2)
policy_net = recnn.nn.Actor(1290, 128, 256, 6e-1)

recommendation = policy_net(state)
value = value_net(state, recommendation)
print(recommendation)
print(value)

# Output:

tensor([[ 1.5302, -2.3658,  1.6439,  ...,  0.1297,  2.2236,  2.9672],
        [ 0.8570, -1.3491, -0.3350,  ..., -0.8712,  5.8390,  3.0899],
        [-3.3727, -3.6797, -3.9109,  ...,  3.2436,  1.2161, -1.4018],
        ...,
        [-1.7834, -0.4289,  0.9808,  ..., -2.3487, -5.8386,  3.5981],
        [ 2.3813, -1.9076,  4.3054,  ...,  5.2221,  2.3165, -0.0192],
        [-3.8265,  1.8143, -1.8106,  ...,  3.3988, -3.1845,  0.7432]],
       grad_fn=<AddmmBackward>)
tensor([[-1.0065],
        [ 0.3728],
        [ 2.1063],
        ...,
        [-2.1382],
        [ 0.3330],
        [ 5.4069]], grad_fn=<AddmmBackward>)
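The policy outputs a 128-dimensional action in the same space as the item embeddings, so one straightforward way to turn it into concrete items is a nearest-neighbour lookup. Below is a minimal sketch using plain PyTorch cosine similarity; item_embeddings is an assumed (n_items, 128) tensor built from the same pickled embeddings and is not defined by the code above.

import torch

def top_k_items(action_vector, item_embeddings, k=10):
    # cosine similarity between one predicted action and every item embedding
    scores = torch.cosine_similarity(action_vector.unsqueeze(0), item_embeddings, dim=1)
    return torch.topk(scores, k).indices

# e.g. indices of 10 candidate items for the first state in the batch:
# top_items = top_k_items(recommendation[0], item_embeddings, k=10)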

Algo classes

Algo is a high-level abstraction for an RL algorithm. You need two networks (policy and value) to initialize it. Later on, you can tweak the parameters of the algo itself.

Important: you can set the writer to a torch.utils.tensorboard.SummaryWriter and get debug output in TensorBoard. Tweak the parameters however you want:

from torch.utils.tensorboard import SummaryWriter

ddpg = recnn.nn.DDPG(policy_net, value_net)
print(ddpg.params)
ddpg.params['gamma'] = 0.9
ddpg.params['policy_step'] = 3
# your_lr is a placeholder for the learning rate you want to use
ddpg.optimizers['policy_optimizer'] = torch.optim.Adam(ddpg.nets['policy_net'].parameters(), lr=your_lr)
ddpg.writer = SummaryWriter(log_dir='./runs')
ddpg = ddpg.to(torch.device('cuda'))

ddpg.loss_layout is also handy: it shows you how the loss dictionary should look.
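For example, you can print the layout up front to see which keys the losses will contain; the Plotter below is configured with a subset of them ('value' and 'policy'):

# inspect the template of the loss dictionary returned by ddpg.update
print(ddpg.loss_layout)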

# test function
def run_tests():
    batch = next(iter(env.test_dataloader))
    loss = ddpg.update(batch, learn=False)
    return loss

value_net  = recnn.nn.Critic(1290, 128, 256, 54e-2)
policy_net = recnn.nn.Actor(1290, 128, 256, 6e-1)

cuda = torch.device('cuda')
ddpg = recnn.nn.DDPG(policy_net, value_net)
ddpg = ddpg.to(cuda)
plotter = recnn.utils.Plotter(ddpg.loss_layout, [['value', 'policy']])
ddpg.writer = SummaryWriter(log_dir='./runs')

from IPython.display import clear_output
import matplotlib.pyplot as plt
%matplotlib inline

plot_every = 50
n_epochs = 2

def learn():
    for epoch in range(n_epochs):
        for batch in tqdm(env.train_dataloader):
            loss = ddpg.update(batch, learn=True)
            plotter.log_losses(loss)
            ddpg.step()
            if ddpg._step % plot_every == 0:
                clear_output(True)
                print('step', ddpg._step)
                test_loss = run_tests()
                plotter.log_losses(test_loss, test=True)
                plotter.plot_loss()
            if ddpg._step > 1000:
                return

learn()
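Once training is done, you will typically want to persist the policy for later inference; a plain PyTorch checkpoint is enough here (the file name is an arbitrary choice, not something recnn prescribes):

# save the trained policy weights
torch.save(policy_net.state_dict(), 'ddpg_policy.pt')

# later / elsewhere: rebuild a network of the same size and load the weights
# policy_net = recnn.nn.Actor(1290, 128, 256, 6e-1)
# policy_net.load_state_dict(torch.load('ddpg_policy.pt', map_location='cpu'))
# policy_net.eval()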

Update Functions

In essence, the Algo class is a high-level wrapper around an update function. The code for it is fairly involved, so if you want to dig into it, it is explained in the Colab notebook linked at the top.
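To make the "wrapper" idea concrete, here is a rough, self-contained sketch of what a DDPG-style update function looks like in this setting: it takes a batch plus the params, nets and optimizers held by the algo object, computes a value and a policy loss, and optionally applies gradients. It is an illustration under stated assumptions (the 'value_net'/'value_optimizer' key names, a 0/1 done flag, and the omission of target networks), not recnn's actual update code.

import torch
import torch.nn.functional as F
import recnn

def my_update(batch, params, nets, optimizers,
              device=torch.device('cpu'), learn=True):
    # Illustrative DDPG-style update; target networks are omitted for brevity,
    # and dictionary keys besides 'policy_net'/'policy_optimizer' are assumptions.
    state, action, reward, next_state, done = recnn.data.get_base_batch(batch, device=device)
    reward = reward.reshape(-1, 1)  # make sure these broadcast against the
    done = done.reshape(-1, 1)      # (batch, 1) critic output; done assumed 0/1

    # critic: regress Q(s, a) towards the bootstrapped TD target
    with torch.no_grad():
        next_action = nets['policy_net'](next_state)
        target = reward + params['gamma'] * (1 - done) * nets['value_net'](next_state, next_action)
    value_loss = F.mse_loss(nets['value_net'](state, action), target)

    # actor: push the policy towards actions the critic scores highly
    policy_loss = -nets['value_net'](state, nets['policy_net'](state)).mean()

    if learn:
        optimizers['value_optimizer'].zero_grad()
        value_loss.backward()
        optimizers['value_optimizer'].step()

        optimizers['policy_optimizer'].zero_grad()
        policy_loss.backward()
        optimizers['policy_optimizer'].step()

    return {'value': value_loss.item(), 'policy': policy_loss.item()}

# usage sketch:
# my_update(next(iter(env.train_dataloader)), ddpg.params, ddpg.nets, ddpg.optimizers)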