Review: Google’s Machine Learning Crash Course

As I mentioned in an earlier post, I’ve spent the last few weeks working on Google’s Machine Learning Crash Course.  After several mornings and weekend hours of work, I managed to finish all the lessons.  In Google’s own words, the crash course is “A self-study guide for aspiring machine learning practitioners.”  From a workflow perspective, the course is broken up into 25 lessons, each of which has at least one PowerPoint-style lecture from Google researchers, plus more than 40 exercises in total.  They also use real-world case studies as examples, which helps make the course material feel a bit less abstract.

The website lists an expected completion time of 15 hours, but I’d say 20-30 hours is probably more realistic if you’re not already a machine learning expert, not counting time spent studying prerequisites.  Still, this is probably one of the most compact machine learning resources I’ve seen.  It puts almost everything you need in your web browser and strikes a good balance between too much information and not enough.  It glosses over some of the more technical and involved details, such as detailed mathematical proofs, but makes sure you have enough intuition to work through the logic and understand what you’re doing.

I have an undergraduate level of knowledge about statistics, and I work with Python on a daily basis, so I figured I had the prerequisites covered.  The prerequisites for the course specifically are…

Course prerequisites:

  1. Mastery of intro-level algebra
    • Mostly check: I am not afraid of matrices or vectors, and I’ve even been known to dabble in identities.  I got a D the first time I took vector calculus, but that was mostly due to being sick and missing the first couple of weeks of classes–I took it again and got an A at least.  I’ve also been working through a great MIT online course, Probabilistic Systems Analysis and Applied Probability, which turned out to be a great refresher for a lot of the basic ideas addressed in the crash course.
  2. Proficiency in programming basics, and some experience coding in Python

…There’s a lot of good information on the linked prerequisites page, and it’s a great place to start if you need to brush up on a few concepts.  I especially liked the visual explanation of the backpropagation algorithm.
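To make the backpropagation idea concrete, here’s a minimal sketch of my own (not from the course materials): a single sigmoid neuron trained by gradient descent, with made-up numbers for the input, target, and learning rate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy single-neuron "network": predict the target from input x.
# All numbers here are arbitrary, chosen just to show the mechanics.
x, target = 1.5, 1.0
w, b, lr = 0.1, 0.0, 0.5  # starting weight, bias, and learning rate

for step in range(100):
    pred = sigmoid(w * x + b)
    loss = (pred - target) ** 2
    # Chain rule: dL/dw = (dL/dpred) * (dpred/dz) * (dz/dw)
    dpred = 2.0 * (pred - target)   # derivative of the squared error
    dz = pred * (1.0 - pred)        # derivative of the sigmoid
    w -= lr * dpred * dz * x        # dz/dw = x
    b -= lr * dpred * dz            # dz/db = 1

print(loss)  # shrinks toward zero as the gradient steps accumulate
```

Each loop iteration applies the chain rule to nudge the prediction toward the target; that same bookkeeping, repeated layer by layer, is what backpropagation automates in a full network.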

Once you start the actual course, you are presented with a list of lessons covering topics like “Reducing Loss”, “Classification”, and “Training Neural Nets”.  Each lesson has an expected completion time.  I found that these time estimates run a bit low overall compared to the time it took me to finish, but it varies a good deal from lesson to lesson.  Some exercises, for instance, require additional time just to train the machine learning models, and I felt that extra time wasn’t really factored in.

The video lessons are probably the most unexpectedly well done part of the whole crash course.  They do a good job of introducing basic concepts, and they’re paced well to boot.  Sometimes I’d need to go back and replay a certain “slide”, but the interface made doing so easy.  There’s also an option to play the slides at 1.5x or 2.0x speed, if that works better for you.

The exercises are very useful and interactive.  Each exercise takes the form of a Jupyter notebook hosted via Colaboratory, a Google research project.  They hold your hand enough that you don’t get too lost, but also have hidden solution sections you can reveal if you’re stumped or just want to double-check your work.

I feel that the course gave me a more thorough understanding of the basic principles of machine learning and a solid foundation to work from.  The introductions to TensorFlow, NumPy, and Pandas are probably the most useful gems in the crash course.  Together, these Python tools make for a very powerful and flexible machine learning toolbox.
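As a small taste of what that toolbox feels like, here’s a toy sketch of my own with made-up numbers, using Pandas to hold the data and NumPy to fit a least-squares line (plain NumPy standing in here for the TensorFlow training loops the course actually uses).

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset: predict price from size.
df = pd.DataFrame({"size": [10.0, 20.0, 30.0, 40.0],
                   "price": [105.0, 201.0, 298.0, 404.0]})

# Fit price ~ intercept + slope * size via least squares.
X = np.column_stack([np.ones(len(df)), df["size"].to_numpy()])
y = df["price"].to_numpy()
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coeffs

print(slope)  # each unit of size adds roughly 10 to the price
```

The course walks you through the same kind of fit at larger scale, with TensorFlow handling the optimization instead of a closed-form solve.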

To wrap everything up, I definitely recommend the Google Machine Learning Crash Course as long as you’re not looking for a single source to teach you everything about the field.  And, I suppose, to expect that would be overlooking the “crash course” part.  Instead, this is a first step that will point you in the right direction to learn more.  But it serves that purpose very well.  Well done to the folks at Google!

Great new machine learning crash course from Google

Over the last couple of weeks, I’ve been working through Google’s new online Machine Learning Crash Course.  I’ve worked through several tutorials on basic machine learning tools in the past, but this one is by far the easiest to use that I’ve found.  It uses Jupyter Notebooks, similar to my previous post detailing my machine learning homelab.  However, everything runs directly in the browser, requiring no additional setup.  By default, these notebooks do not include GPU acceleration, but when you run the same notebooks in my homelab environment, they should automatically become GPU accelerated.  Alternatively, with a bit of tweaking, you can even use GPU acceleration directly in the browser.  NEAT!!!
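If you want to confirm whether a particular notebook environment can actually see a GPU, here’s a quick framework-agnostic check of my own (not something the crash course provides):

```python
import shutil
import subprocess

# Look for the Nvidia management tool on the PATH; if it's present,
# ask it to list the attached GPUs.
smi = shutil.which("nvidia-smi")
if smi:
    # Prints one line per GPU, e.g. "GPU 0: GeForce GTX 1080 (UUID: ...)"
    print(subprocess.run([smi, "-L"], capture_output=True, text=True).stdout)
else:
    print("No nvidia-smi on PATH; notebooks will run CPU-only")
```

This works the same whether you run it in a local homelab notebook or a hosted one, which makes it handy for telling the two situations apart.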

It’s really nice to see more folks jumping on the Jupyter Notebooks bandwagon these days.  They’re easy to manage and at least somewhat portable.  You can find even more such notebooks at https://distill.pub/ .

Update: Fixed an incorrect assertion that you can’t do GPU acceleration with TensorFlow in-browser.

Building a machine learning homelab (w/ Docker + Linux + Nvidia GTX 1080 GPU)

Lessons Learned

Before I begin, I’ll start with a bit of brief background on how and why I developed my current machine learning homelab.  I’ve spent much of my professional career working with technology on the cutting edge of what’s possible with modern machine learning.  My personal background is more on the web development and back-end infrastructure side of things, so I’ve helped monitor and improve the general reliability and maintainability of lots of different software, including machine learning models.

However, sometimes I just want to noodle around (or should I say dabble?) with various machine learning models on my own at home–without worrying about nuking a production system.  So, I designed and built the homelab system described below as a simple, easy-to-manage machine learning test bed using my existing home gaming desktop.  I went through a lot of trial and error to get to this final design, but I’ll just present the final product as I’m currently using it.  Please drop me an email or leave a comment if you find any bugs or have any suggestions on how I can improve my design.

Prerequisites

Part of what makes this system so easy to manage is the integration between the Docker stack and the Linux kernel.  As such, Linux is required, and non-Ubuntu Linux environments may behave somewhat differently depending on the distribution.  For this tutorial, I’ll assume the following environment to start…

  1. Ubuntu 14.04+
  2. Attached modern Nvidia GPU (I’m using an Nvidia GTX 1080)
    • Note: AMD cards will likely not work for GPU acceleration (the same is true for systems with no GPU, of course), but that should not cause problems in CPU-only mode as described below.
  3. Docker installed
  4. Recent Nvidia drivers installed
  5. Git installed

Setup nvidia-docker

(Skip this section for CPU-only mode)

Once you’ve got your system ready, it’s time to install the magic package that makes our Nvidia GPU integration work.  It’s called “nvidia-docker”, and it allows us to run Docker containers that automatically connect to your local GPU(s) and make them available to the container software…

Install: https://github.com/NVIDIA/nvidia-docker#quickstart

Instructions

Please see the documentation linked above for detailed instructions on setting up your environment (especially the apt settings).  However, the following quick-and-dirty setup process will work for many users, assuming you already have some Nvidia drivers and Docker installed.

1. Get Docker

(Skip this section if you’ve already got a recent Docker installation)

$ sudo apt-get install docker-ce

2. Install Nvidia drivers

(Skip this section if you’ve already got recent Nvidia drivers for your GPU)

Install a compatible Nvidia driver package on your local system.

$ sudo apt-get install nvidia-384

3. Get nvidia-docker

Get sources for the latest nvidia-docker package.

$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - 
$ curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list 
$ sudo apt-get update

Install the latest nvidia-docker package.

$ sudo apt-get install nvidia-docker2

Reload the Docker daemon configuration.

$ sudo pkill -SIGHUP dockerd

4. Setup Jupyter Notebooks:

Change to your home directory, or some other directory where you’d like the keras code to live.  Keras is a machine learning library, and its repository includes a Docker-based Jupyter Notebooks environment that we will leverage.

$ cd $HOME

Get the keras code base using git.

$ git clone https://github.com/dannydabbles/keras.git

Change directories to the keras folder.

$ cd keras

Set up git submodules.

$ git submodule init
$ git submodule update

Change directories to the docker folder.

$ cd docker

Launch the Jupyter Notebook environment.  Make sure to note the notebook URL link in the terminal output from this command.

(Leave out “GPU=0” for CPU-only mode)

$ make notebook BACKEND=tensorflow GPU=0 # Note URL in output

5. Point your favorite web browser at the “0.0.0.0:8888…” URL you just generated

Open this URL in your web browser of choice and you’ll have access to a persistent environment that you can use to run machine learning models.  The default setup runs a TensorFlow environment, but many related tools will also work there.  You may also want to try out the Theano flavor of the environment by switching the backend option to “BACKEND=theano”.
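If you’re ever unsure which backend a running notebook actually picked up, one way to check (a sketch of my own that relies on Keras’s standard config file) is to read it directly:

```python
import json
import os

# Keras records its active backend in ~/.keras/keras.json; if the file
# doesn't exist yet, Keras falls back to the TensorFlow backend.
cfg_path = os.path.expanduser("~/.keras/keras.json")
if os.path.exists(cfg_path):
    with open(cfg_path) as f:
        backend = json.load(f).get("backend", "tensorflow")
else:
    backend = "tensorflow"

print(backend)  # should match the BACKEND option you launched with
```

You can also just run `from keras import backend as K; K.backend()` inside a notebook cell, which reports the same thing at import time.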

6. From the Jupyter Notebooks landing page, navigate to workspace>examples>deep-learning-keras-tensorflow

Run the example notebook “0. Preamble.ipynb” by clicking on it, then clicking Cell>Run All

In the deep-learning-keras-tensorflow folder, you will find several Jupyter Notebooks (all ending in “.ipynb”) that will run using your attached GPU.  There are many other notebooks you can download and play with without GPU support as well.  Have a look around and see what you can find.

NOTE: Any data not stored under the “workspace” directory on the Jupyter Notebooks landing page will not persist once you stop your Jupyter Notebooks Keras container.

Conclusions

A good chunk of getting modern machine learning models to work is just setting up the proper infrastructure.  Other guides I’ve found online are either overly verbose or incomplete when it comes to setting up modern machine learning infrastructure in a homelab.  Hopefully I’ve helped fill that gap somewhat here.  Please comment below or shoot me an email if you find bugs or have general comments on my post.

Hello world!

Hello and welcome to my dabbling blog!  As the name suggests, this is a place for me to dabble with whatever grabs my interest and share my findings with the world.  I don’t yet know whether this will turn out to be more of a solitary portfolio or a larger conversation about my interests; I hope the latter, and I’m always honored by thoughtful commentary.  I make no claims of special skills or prowess; I simply hope you can share in my curiosity and maybe take part.  I will endeavor to share as much of my projects and code as I can, but I don’t make any guarantees of quality, and I can’t take responsibility for anything that may break for you (user beware!).  With those caveats out of the way, I’ll be posting more soon, but for now I’ll just leave this simple hello.  Welcome!