Featured

Differentiable Neural Computers

The Differentiable Neural Computer (DNC) is a neural network that, loosely speaking, separates computation from memory. Often the best way to understand it is to read the code directly – otherwise, the following should serve as a gentle introduction.

Motivation

LSTMs and other RNNs have their memory tied up in the activations of their neurons, and these are typically few compared with the number of parameters in the network. For many tasks, we want to be able to scale memory without having to scale the whole rest of the network – enter differentiable memory structures.

The Differentiable Neural Computer is so called because it attempts to augment a neural net with a memory bank such that computation can proceed under a fully differentiable analogue of the von Neumann architecture, complete with memory allocation and deallocation.
To push the analogy further: an LSTM is a simple CPU whose activations represent the contents of its registers, whereas the DNC pairs a CPU (called the 'controller') with a separate, independently scalable and differentiable form of RAM (called the 'memory').

So, when is this approach useful? Many sequential tasks have implicit structure in them which is best exploited by operations on particular data structures. If we can learn to store the right data structures and learn to operate on them in the right way, then these tasks become easy. It's the same in normal programming: we choose the right data structure and the right algorithm for a problem, and in doing so we try to select the most desirable tradeoff between computation and memory. When we introduce RNNs with external memory, we can use backprop to learn the right tradeoff.

The problem is that it's difficult to learn that tradeoff if we can't allocate and deallocate memory explicitly. Given a mechanism to do so, the network can learn when to offload the right things to memory and when to perform a particular action on that memory. One of the key aspects of the DNC is its ability to learn to manage memory in exactly this way.

How it works

In practice, we don’t want to have to interact with the whole memory at once – this motivates the use of attention. The way you weight each row in the memory matrix is the key to good performance in these types of architectures.

In the DNC, the read/write weightings come from three core attention-based ideas: content, memory allocation, and the temporal order of memory interactions.

1) Content: find the memory rows most similar to a lookup key (by cosine similarity) – see the sketch after this list.

2) Memory allocation: an external memory brings memory-management problems, and we don't want to be tied to contiguous blocks (i.e. raw indices into the memory matrix). So the DNC maintains a differentiable free list, tracking the usage of each memory location, and then gives the network a choice: update an existing location or write somewhere new (also sketched below).
For example, once it's read something and used it, it can learn to free up that memory slot.

3) Temporal order: we want to be able to iterate through memories in the order that they were written. This is an important prior for computational tasks where the desired solution requires reading and writing large amounts of data in sequential order. The network keeps some helper variables to turn that temporal ordering into a weighting. For example, the 'precedence weighting' keeps track of the location that was most recently written to.
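
To make (1) and (2) concrete, here is a minimal numpy sketch of a content weighting (cosine similarity sharpened by a key strength, as in the paper) and an allocation weighting computed from a usage vector. This is an illustration of the two ideas rather than the DNC implementation itself – the shapes and names are my own.

import numpy as np

def content_weighting(memory, key, beta):
    # Soft attention over the N rows of memory by cosine similarity to key.
    # memory: (N, W), key: (W,), beta: scalar key strength.
    eps = 1e-8
    sim = memory @ key / (np.linalg.norm(memory, axis=1)
                          * np.linalg.norm(key) + eps)
    scores = beta * sim
    e = np.exp(scores - scores.max())  # numerically stable softmax
    return e / e.sum()

def allocation_weighting(usage):
    # Weight the least-used slots most highly (usage in [0, 1], 1 = full).
    order = np.argsort(usage)  # 'free list': least-used locations first
    a = np.zeros_like(usage)
    remaining = 1.0
    for j in order:
        a[j] = (1.0 - usage[j]) * remaining
        remaining *= usage[j]
    return a

memory = np.random.randn(16, 8)
read_w = content_weighting(memory, memory[3] + 0.05, beta=20.0)  # peaks at row 3
alloc_w = allocation_weighting(np.random.rand(16))  # peaks at the freest slot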

For more in-depth information, please see the comments provided in the implementation.

Research patterns for machine learning

These are some helpful patterns for what I have found to be a robust machine learning research workflow. The word "pattern" is used here very loosely to mean "general approaches that work well".

This post is based on this excellent article; credit goes to the original author for many of the ideas.

I’ve summarised the above article in part and added some of my own thoughts (in no particular order):

  1. Always keep something running
    Make the most of the computational resources you have and always have something running in the background. Even if you think it's stupid, it could tell you something interesting. Keep the queue (if you have one) filled with fodder whilst developing. Never leave the queue empty.
  2. Always use version control
    Standard software-engineering best practices apply, and version control is no exception. Regular commits are doubly important here because in ML research reproducibility is paramount.

    Additional thoughts:
    * Tag code with "releases" for each project so you can roll back with confidence to a point where a certain amount of functionality was implemented. This allows experiments to be rerun from different points in the code's evolution.
    * If it can be regenerated easily, don't commit it, e.g. large models or experiment results.

  3. Separate code from data
    This is a must. Data generated during experiments (and with dependencies on a particular experiment) should be stored separately. Keep the codebase unpolluted. Things which stay the same across lots of experiments, e.g. preprocessed features, should also be moved to a permanent location. Set things up so that data can be swapped in and out without breaking dependencies. In fact, each stage in the pipeline should be isolated, with clearly defined inputs and outputs, so that any one stage can be independently optimised.
  4. Save everything to disk frequently
    That means that during training – particularly for long runs – model parameters should be saved to disk frequently. Ideally, dump to disk at time intervals such that you're only just comfortable with a hardware failure occurring at any moment. That might mean every model iteration. It might mean every 30 mins.
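
    A minimal sketch of periodic checkpointing, using the same TF 1.x-style API as the snippet in pattern 13. The dummy counter model, paths and intervals here are illustrative stand-ins, not a prescribed setup:

    import os
    import tensorflow as tf

    os.makedirs('checkpoints', exist_ok=True)

    # Dummy 'model': a single counter variable; swap in your own graph.
    x = tf.get_variable('x', shape=[], initializer=tf.zeros_initializer())
    train_op = tf.assign_add(x, 1.0)  # stand-in for a real optimiser step
    saver = tf.train.Saver(max_to_keep=5)

    num_steps, save_every = 1000, 100
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for step in range(num_steps):
            sess.run(train_op)
            if step % save_every == 0:  # cheap insurance against failures
                saver.save(sess, 'checkpoints/model', global_step=step)
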
  5. Save with sensible names
    Folder names for runs on different data sets and different parameter settings should reflect those differences. Prepend a date to each folder – YYYYMMDD sorts lexicographically, so everything appears in date order. Include the name of the group of experiments. Example:

    20141021adadeltaTest_alpha0.9_mbSize52
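
    A hypothetical python helper that builds names like the one above (the group name and parameters are whatever you pass in):

    from datetime import date

    def run_name(group, **params):
        stamp = date.today().strftime('%Y%m%d')  # sorts into date order
        suffix = '_'.join(f'{k}{v}' for k, v in sorted(params.items()))
        return f'{stamp}{group}_{suffix}'

    print(run_name('adadeltaTest', alpha=0.9, mbSize=52))
    # e.g. 20141021adadeltaTest_alpha0.9_mbSize52 (with today's date)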


  6. Make experiments reproducible
    That means storing config files containing hyperparameters, parameters, links to the datasets used and experiment-specific info. Ensure failed runs can be restarted halfway through. This is a lower-priority item, but it can be useful once an idea has been validated and you start running longer experiments. My best advice here: copy (yes, copy!) your entire codebase alongside each experiment. That might sound like overkill, but it helps greatly with reproducibility – knowing that a trained model always sits next to the exact code that produced it (and that can run it) gives great peace of mind. Logging the git SHA isn't enough – you will at some point run with uncommitted modifications, and you will be caught out.
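
    A sketch of the copy itself, assuming a layout where each experiment gets its own folder (the paths are illustrative):

    import shutil

    # Snapshot the exact code next to the experiment's outputs, skipping
    # anything bulky or regenerable.
    shutil.copytree('src', 'experiments/20141021adadeltaTest/code',
                    ignore=shutil.ignore_patterns('.git', '*.pyc', 'checkpoints'))
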
  7. Don’t over-automate
    There's a temptation to script everything to death, but during research you need results to test ideas as fast as possible. Script and automate as much as you can whilst waiting for results; otherwise, try not to waste too much time making things perfect.
  8. Automate parameter sweeps
    If you do automate something, this one is probably worth it. Make it really simple to quickly launch lots of jobs over a range of parameters. Some frameworks already support this. Bash is great for this. My rule of thumb: if it fits on one screen it's fine in bash; any longer, consider python.
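
    For the python end of that rule of thumb, a hypothetical launcher over a small grid – 'train.py' and its flags are placeholders for your own entry point:

    import itertools
    import subprocess

    grid = {'alpha': [0.5, 0.9, 0.99], 'mb_size': [32, 52, 128]}

    # One background job per parameter combination.
    for values in itertools.product(*grid.values()):
        args = [f'--{k}={v}' for k, v in zip(grid.keys(), values)]
        subprocess.Popen(['python', 'train.py'] + args)
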
  9. Don’t reinvent the wheel
    Reuse code where possible. Correctness is hard to come by in ML and is often the product of sweat and tears. Use someone else's sweat and tears. Prefer libraries that let you test your ideas as fast as possible.
  10. Keep a log of experiments
    Each folder should contain a log entry explaining why you ran that particular experiment. Results should not be buried deep inside multiple log files but kept in one single file which is easy to read. It should also contain runtime info such as:
    * date the experiment started
    * which machine it ran on
    * which config files were used
    * how far through the experiment currently is
    * what the intermediate results are
  11. Kill bad runs early
    Make sure to catch bad runs quickly, either by hand or automatically, so you can tighten the debug/test feedback loop as much as possible. If optimising some objective, check whether there is a high correlation between poor final performance and poor performance in the first 5% of training. If there is, you have a good signal for killing runs early. This is often the case with overparameterised neural networks.
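
    A sketch of that check, assuming you log (step, loss) pairs somewhere pollable; the threshold would come from your past runs:

    def should_kill(loss_history, total_steps, threshold):
        # Kill if the loss never dips below threshold in the first 5% of steps.
        early = [loss for step, loss in loss_history
                 if step <= 0.05 * total_steps]
        return bool(early) and min(early) > threshold

    history = [(100, 2.31), (200, 2.30), (400, 2.29)]
    if should_kill(history, total_steps=10000, threshold=1.5):
        print('loss still above threshold at 5% of training - killing run')
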
  12. Separate options from parameters
    Make sure run options (e.g. working directories, hyperparameter settings) are kept separate from model parameters (e.g. binaries containing model weights).
  13. Completely eliminate sources of variation in the environment
    Nip this one in the bud: always remove non-determinism in code in the first instance. Once correctness is guaranteed, determinism may be traded off for improved convergence speed (e.g. through distributed multi-GPU training). For example, in TensorFlow, that means forcing these options at the outset:

    import random
    import numpy as np
    import tensorflow as tf

    tf.logging.set_verbosity(tf.logging.INFO)  # print INFO log messages
    np.random.seed(1)  # seed numpy's RNG
    random.seed(1)     # seed python's RNG
    tf.reset_default_graph()
    tf.set_random_seed(1)  # graph-level TF seed; set after resetting the graph
    ...
    # single-threaded op execution removes one source of non-determinism
    deterministic_config = tf.ConfigProto(inter_op_parallelism_threads=1,
                                          intra_op_parallelism_threads=1)
    with tf.Session(config=deterministic_config) as sess:
        # train

    If you can, pin all dependencies and run everything inside Docker. That can guarantee the environment won't change under your feet.

Installing Jelly Bean on HTC Desire HD

If you want to upgrade and refresh your phone, flashing Jellytime onto it can make a big difference. Here's how to do it:

(Note: Make sure you are using a rooted phone and have backed up everything *not* on your SD card)

(1) Install ROM Manager

https://play.google.com/store/apps/details?id=com.koushikdutta.rommanager&hl=en

(2) Flash latest ClockworkMod Recovery

ROM Manager -> Install ClockworkMod Recovery

(3) Download the latest Jellytime onto your phone

http://code.google.com/p/jellytime/downloads/list

(Latest as of Jan 2013 here)

(4) Download the latest Google Apps package

http://goo.im/gapps

(Latest as of Jan 2013 here)

(5) Prepare to take the dunk:

- ROM Manager -> "Reboot into recovery"
- Install from zip
- Navigate to jellytime and ok it
- Wait till installed and do the same for the Google apps package

(6) Restart and enjoy!


Potential problems:
– If you use an older version of the Google Apps package, you will get an "Unfortunately, setup wizard has stopped working" error and you won't be able to get into your phone. Just make sure you're using the most recent versions of the two downloads above and you should be ok.

SSH with no password and SSH tunneling

Option 1: SSH without needing your password

1) You are user a on host A, and you want to SSH to host B as user b.

2) Generate an SSH key pair on A. *Don't* type a passphrase – leave it blank when prompted.

a@A:~> ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/a/.ssh/id_rsa):
Created directory '/home/a/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/a/.ssh/id_rsa.
Your public key has been saved in /home/a/.ssh/id_rsa.pub.
The key fingerprint is:
3e:4f:05:79:3a:9f:96:7c:3b:ad:e9:58:37:bc:37:e4 a@A

3) Create a directory ~/.ssh as user b on B if one doesn’t exist.

a@A:~> ssh b@B mkdir -p ~/.ssh
b@B's password:

4) Append a’s new public key to b@B:.ssh/authorized_keys and enter b’s password one last time:

a@A:~> cat ~/.ssh/id_rsa.pub | ssh b@B 'cat >> .ssh/authorized_keys'
b@B's password:

5) Now no password is needed:

a@A:~> ssh b@B hostname

Option 2: SSH without retyping your passphrase (using ssh-agent)

$ ssh-agent bash
$ ssh-add ~/.ssh/id_rsa
$ ssh username@remote

Tunnelling through multiple logins is really easy, but you have to do the above for each host that you want to hop over automatically.

Sort out your intermediate hosts so you can auto-ssh into them as above. Then run:

$ ssh -t user@intermediateserver.com ssh user@finaldestination.com


OpenCL Error Codes

Hope this helps. Straight from the header file:

/* Error Codes */

#define CL_SUCCESS 0
#define CL_DEVICE_NOT_FOUND -1
#define CL_DEVICE_NOT_AVAILABLE -2
#define CL_COMPILER_NOT_AVAILABLE -3
#define CL_MEM_OBJECT_ALLOCATION_FAILURE -4
#define CL_OUT_OF_RESOURCES -5
#define CL_OUT_OF_HOST_MEMORY -6
#define CL_PROFILING_INFO_NOT_AVAILABLE -7
#define CL_MEM_COPY_OVERLAP -8
#define CL_IMAGE_FORMAT_MISMATCH -9
#define CL_IMAGE_FORMAT_NOT_SUPPORTED -10
#define CL_BUILD_PROGRAM_FAILURE -11
#define CL_MAP_FAILURE -12

#define CL_INVALID_VALUE -30
#define CL_INVALID_DEVICE_TYPE -31
#define CL_INVALID_PLATFORM -32
#define CL_INVALID_DEVICE -33
#define CL_INVALID_CONTEXT -34
#define CL_INVALID_QUEUE_PROPERTIES -35
#define CL_INVALID_COMMAND_QUEUE -36
#define CL_INVALID_HOST_PTR -37
#define CL_INVALID_MEM_OBJECT -38
#define CL_INVALID_IMAGE_FORMAT_DESCRIPTOR -39
#define CL_INVALID_IMAGE_SIZE -40
#define CL_INVALID_SAMPLER -41
#define CL_INVALID_BINARY -42
#define CL_INVALID_BUILD_OPTIONS -43
#define CL_INVALID_PROGRAM -44
#define CL_INVALID_PROGRAM_EXECUTABLE -45
#define CL_INVALID_KERNEL_NAME -46
#define CL_INVALID_KERNEL_DEFINITION -47
#define CL_INVALID_KERNEL -48
#define CL_INVALID_ARG_INDEX -49
#define CL_INVALID_ARG_VALUE -50
#define CL_INVALID_ARG_SIZE -51
#define CL_INVALID_KERNEL_ARGS -52
#define CL_INVALID_WORK_DIMENSION -53
#define CL_INVALID_WORK_GROUP_SIZE -54
#define CL_INVALID_WORK_ITEM_SIZE -55
#define CL_INVALID_GLOBAL_OFFSET -56
#define CL_INVALID_EVENT_WAIT_LIST -57
#define CL_INVALID_EVENT -58
#define CL_INVALID_OPERATION -59
#define CL_INVALID_GL_OBJECT -60
#define CL_INVALID_BUFFER_SIZE -61
#define CL_INVALID_MIP_LEVEL -62
#define CL_INVALID_GLOBAL_WORK_SIZE -63
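
If you are driving OpenCL from python, a quick hypothetical lookup table built from the values above saves squinting at bare integers (abbreviated here – extend it with the rest of the list as needed):

CL_ERRORS = {
    0:   'CL_SUCCESS',
    -1:  'CL_DEVICE_NOT_FOUND',
    -2:  'CL_DEVICE_NOT_AVAILABLE',
    -4:  'CL_MEM_OBJECT_ALLOCATION_FAILURE',
    -5:  'CL_OUT_OF_RESOURCES',
    -11: 'CL_BUILD_PROGRAM_FAILURE',
    -30: 'CL_INVALID_VALUE',
    -54: 'CL_INVALID_WORK_GROUP_SIZE',
}

def cl_error_name(code):
    # Fall back gracefully for codes not copied into the table.
    return CL_ERRORS.get(code, f'unknown OpenCL error {code}')

print(cl_error_name(-11))  # CL_BUILD_PROGRAM_FAILURE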