Welcome to the TensorPort community! If you don’t have an account yet, fill out our easy sign-up form to start using TensorPort for free. This guide will show you how to get set up on the TensorPort platform, learn about its basic features, and get to work right away! We'll cover:
- Installing TensorPort
- Uploading an example MNIST model and data to TensorPort
- Training the model with TensorPort
- Viewing and managing your training jobs
After completing the sign-up, you'll end up on the home page of Matrix, your online interface for using the TensorPort platform. Along with Matrix, you will use a command line interface (CLI) in a shell on your local machine to carry out your work on the platform. You can carry out many TensorPort functions in either Matrix or the CLI; in this guide, we'll start with the CLI, then move on to Matrix.
Before you start using TensorPort, you’ll need to have Git and Git LFS, which we use to version control your models and data in order to ensure that all your experiments are fully reproducible. If you don’t have Git LFS yet, you can use pip (you can also head to the Git and Git LFS documentation pages for more detailed installation instructions):
pip install --upgrade git-lfs
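Before moving on, it can help to confirm that the prerequisites are actually on your PATH. The snippet below is a minimal sketch, not part of TensorPort: `check_tool` is a hypothetical helper name, and it relies only on the standard shell builtin `command -v`.

```shell
#!/bin/sh
# check_tool is a hypothetical helper (not a TensorPort command).
# It reports whether a program can be found on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the tools this guide relies on; keep going even if one is missing.
for tool in git git-lfs pip; do
    check_tool "$tool" || true
done
```

Any tool reported as missing should be installed before you continue with the steps below.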
You’ll then need to install the TensorPort CLI using pip. You can install on your system or with virtualenv, but conda is not currently supported (please contact us first if you'd like to use conda).
pip install --upgrade tensorport
Model and Data Upload
Now, log in using your TensorPort account. When prompted, enter your username (not your full email address) and the password you chose.
At any time, just type “tport” to see a list of TensorPort commands. To demonstrate basic TensorPort functions, we’ll use a simple machine learning model that recognizes handwritten digits. To follow along, please first clone our repository containing several model variants and our repository containing the MNIST dataset of handwritten digits into some directory on your computer:
git clone https://github.com/tensorport/mnist.git
git-lfs clone https://github.com/tensorport/mnist-dataset.git
Within TensorPort you will create Projects and Datasets. A project is a repository of code containing one or more machine learning models, while a dataset is a repository of labeled data; both can be collaboratively added, shared, edited, versioned, and more.
We handle version control of projects with Git. If you're not familiar with it, a quick tutorial is here. A job always runs on a commit that already exists in the project, so before running our model we must push our commit to the TensorPort Git repository.
If you've modified anything about the code you cloned from GitHub, you'll need to commit before creating a project:
git commit -am "My commit message"
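If you're not sure whether you changed anything, you can let Git tell you. The following is a minimal sketch using only standard Git commands; `commit_if_changed` is a hypothetical helper name, not something TensorPort provides. It stages and commits everything in the working tree only when `git status --porcelain` reports changes.

```shell
#!/bin/sh
# commit_if_changed is a hypothetical helper (not a TensorPort command).
# It commits all changes in the current repository, but only if there are any.
commit_if_changed() {
    # --porcelain prints one line per changed or untracked file;
    # empty output means the working tree is clean.
    if [ -n "$(git status --porcelain)" ]; then
        git add -A
        git commit -q -m "$1"
        echo "committed"
    else
        echo "nothing to commit"
    fi
}

# Usage, from inside the cloned mnist directory:
#   commit_if_changed "My commit message"
```

Note that `git add -A` also stages new, untracked files, so check `git status` first if the directory contains files you don't want in the project.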
Then make the project:
tport create project
Next, push to TensorPort on the master branch of your project:
git push tensorport master
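If the push fails, one thing worth checking is whether a Git remote named `tensorport` exists in the repository. The sketch below assumes that creating the project is what registers this remote (the guide implies this but does not state it); `has_remote` is a hypothetical helper built on the standard `git remote get-url` command.

```shell
#!/bin/sh
# has_remote is a hypothetical helper (not a TensorPort command).
# It succeeds if the current repository has a remote with the given name.
has_remote() {
    git remote get-url "$1" >/dev/null 2>&1
}

# Usage, from inside the project directory:
#   has_remote tensorport && git push tensorport master
```

You can also run `git remote -v` to list every remote and its URL.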
Now go on to repeat the same process with the dataset. We generally recommend that you use Git LFS, which allows for rapid versioning of large files, to manage your datasets. However, this is not necessary in this case since our dataset is quite small.
git commit -am "My commit message" # if you've modified the dataset
tport create dataset
git push tensorport master
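For your own datasets, a quick way to judge whether Git LFS is worth setting up is to list the files above some size. This is a minimal sketch using the standard `find` utility; `large_files` is a hypothetical helper, and the threshold (in KiB) is whatever you consider "large".

```shell
#!/bin/sh
# large_files is a hypothetical helper (not a TensorPort command).
# It lists files under the current directory larger than the given
# number of KiB, skipping Git's own .git directory.
large_files() {
    find . -path ./.git -prune -o -type f -size +"$1"k -print
}

# Usage, from inside a dataset directory (threshold is illustrative):
#   large_files 51200    # files larger than roughly 50 MiB
```

Files it reports are candidates for Git LFS, which you can enable per pattern with `git lfs track "<pattern>"` before committing.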
We’re now ready to train our model on the dataset. You can train and test your models through Jobs, which are executions of a project on some dataset(s) using TensorPort’s distributed infrastructure of CPUs and GPUs. Multiple datasets can be run with the same project and vice versa, offering maximum flexibility.
We could use "tport create job" from the command line to start the training process, but instead, let's go back to the web browser to start the job so you can get more familiar with Matrix. For many more details about how to use TensorPort from the command line, check out our other tutorials.
Training the Model
To open Matrix, return to your web browser, then click the green “LOGIN” button in the top right of this page or follow the link in your invitation email. (If the green button at the top of your page reads "Matrix", then you are already logged in.) We recommend using Google Chrome to access Matrix for best performance.
In Matrix, you'll see a toggle in the top left that allows you to switch between managing your projects and datasets. Try clicking back and forth between the two to see the project and dataset you just created. Now click on the project's name in the sidebar to open it in the main workspace. Go ahead and click the "Create Job" button near the top of the workspace.
Follow the prompts to set up the job. First, choose the latest version of the model we uploaded from the list of commits; if you modified the code and added a commit previously, you'll see it here. Choosing a commit this way makes it easy to run tests on multiple versions of a project. Add a job name and specify the module as “mnist”, which names the primary script in the project to run. Leave the package path empty (for more details on these options, see the documentation) and click Next.
Add the mnist dataset and select the "Add MNIST files" commit, then click Next. Set the requirements file to “requirements.txt”; if needed, you could further customize the environment for running your code here. Then choose 1.0.0 for the TensorFlow version to run the code on and click Next to continue to Resources.
TensorPort makes distributed training easy: all you have to do is adjust the number of CPU and GPU workers assigned to your project. For this simple test, however, there's no need to waste any of your GPU hours on distributed training, so select single-node training and choose the t2.small CPU worker for the instance type. Finally, set the time that you want your job to run for as 1 or 2 hours — again, we don't want to waste your trial :) The job will compute for that long then automatically stop, although you can pause and restart the job at any time prior to the time limit. Now you can continue through Metadata, adding a description or search tags if you'd like, and click "Create Job".
You should now see your job listed in the workspace of your project. Click the "Start" button in the bottom left and your job will begin to execute on TensorPort. That’s it! Training and testing models on TensorPort is that simple.
Let's take a look at what's happening as you run your job. Click "See Details" in the bottom left to open more information about the job. Under the "Events" tab, you'll see TensorPort's progress in setting up your job's worker(s) and cloning your code and data to run. The "Outputs" tab will begin to produce logs from your code as your model trains.
For even better visualization, we can take a look at TensorBoard. Click "Add to TensorBoard" in the bottom right of the job tab. Now, click on the "TensorBoard" button in the top bar menu. This will open a new tab with TensorBoard, Google's TensorFlow visualization tool. If you just started your job and it is still setting up, it will take a bit of time for results to load into TensorBoard. So, wait for a few minutes and then check back to see your model's graph, loss function, and other visualizations in real time as it trains.
In the meantime, play around with some of Matrix's other features: sharing your project or job with friends (they'll get an invite to TensorPort if they don't have an account yet), looking at your notifications, or creating new projects, datasets, and jobs from Matrix. You can also return to the CLI to upload your own models and data. Try just typing "tport" to see a list of all available CLI commands. Start working on your own projects, invite your friends, and put those free GPU hours to good use — good luck!
If you'd like to keep learning how to use TensorPort, you can continue to our other tutorials, or reference our detailed documentation. Please feel free to reach out to us if you have any questions or suggestions, or if you would like to request a new feature.