Transformers: Quick Start
In this tutorial, we are going to deploy a language model to Model Zoo with HuggingFace Transformers and use it to generate an original passage of text.
You can follow along with this tutorial in any Python environment you’re comfortable with, such as a Python IDE, a Jupyter notebook, or a Python terminal. The easiest option is to open this tutorial directly in Colab.
Install the Model Zoo client library via pip:
!pip install modelzoo-client[transformers]
To deploy and use your own models, you’ll need to create an account and configure an API key. You can do so from the command line:
First, we’ll need to train or load a model and tokenizer in the form of a transformers.Pipeline object. Pipeline offers a simple API dedicated to several tasks, including Named Entity Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering. Model Zoo relies on this API to figure out how to package and serve your model behind an HTTP endpoint.
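As a rough orientation, the tasks listed above correspond to string identifiers accepted by transformers.pipeline(). The sketch below only maps the names; the helper build_pipeline is a hypothetical convenience, and whether Model Zoo can serve every one of these tasks is an assumption not confirmed by this tutorial.

```python
from typing import Dict

# Task names from the paragraph above, mapped to the identifiers that
# transformers.pipeline() accepts for them.
TASK_IDENTIFIERS: Dict[str, str] = {
    "Named Entity Recognition": "ner",
    "Masked Language Modeling": "fill-mask",
    "Sentiment Analysis": "sentiment-analysis",
    "Feature Extraction": "feature-extraction",
    "Question Answering": "question-answering",
    "Text Generation": "text-generation",
}


def build_pipeline(task_name: str):
    """Hypothetical helper: create the default pipeline for a task.

    Calling this downloads a default model the first time it runs.
    """
    import transformers  # lazy import so the mapping works without it installed

    return transformers.pipeline(TASK_IDENTIFIERS[task_name])


# Print the mapping without building any pipeline (no downloads happen here).
for name, identifier in TASK_IDENTIFIERS.items():
    print(f"{name}: transformers.pipeline({identifier!r})")
```

Only build_pipeline touches the transformers library, so the mapping can be inspected on a machine where no model has been downloaded yet.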
For the sake of this quickstart, we’ll use the default “text-generation” pipeline which loads the OpenAI GPT-2 model and tokenizer.
import transformers

pipeline = transformers.pipeline("text-generation")
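Before deploying, it can help to try the pipeline once on your own machine. A text-generation pipeline returns a list of dictionaries, each with a "generated_text" key. The sketch below is an illustration only: first_generated_text and try_locally are hypothetical helper names, and the demo at the bottom uses a hand-written stand-in result so that nothing is downloaded.

```python
from typing import Any, Dict, List


def first_generated_text(outputs: List[Dict[str, Any]]) -> str:
    """Return the text of the first candidate in a text-generation result."""
    if not outputs:
        raise ValueError("pipeline returned no candidates")
    return outputs[0]["generated_text"]


def try_locally(prompt: str, max_length: int = 40) -> str:
    """Run the default text-generation pipeline once on this machine."""
    import transformers  # lazy import; the model is downloaded on first use

    pipeline = transformers.pipeline("text-generation")
    return first_generated_text(pipeline(prompt, max_length=max_length))


# Shape of a text-generation result, shown with a hand-written stand-in.
sample_outputs = [{"generated_text": "These violent delights have violent ends, and"}]
print(first_generated_text(sample_outputs))
```

Calling try_locally("These violent delights have violent ends") would run the real model; expect the first call to take a while because of the download.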
To deploy this pipeline to a production-ready HTTP endpoint, use the
modelzoo.transformers.deploy() function. Since GPT-2 is a large model with
high memory requirements, we override the defaults to configure our
containers to use 2 GB of memory and 1024 CPU units (1 vCPU) with
modelzoo.ResourcesConfig:
import modelzoo.transformers

model_name = modelzoo.transformers.deploy(
    pipeline=pipeline,
    resources_config=modelzoo.ResourcesConfig(memory_mb=2048, cpu_units=1024),
)
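The resource numbers above follow simple arithmetic: 2 GB is 2048 MB, and 1024 CPU units equal 1 vCPU. The sketch below spells out that conversion. The helper resource_settings is a hypothetical name introduced for illustration; only the memory_mb and cpu_units parameter names come from the deploy call above.

```python
from typing import Dict

MB_PER_GB = 1024
CPU_UNITS_PER_VCPU = 1024  # the tutorial equates 1024 CPU units with 1 vCPU


def resource_settings(memory_gb: float, vcpus: float) -> Dict[str, int]:
    """Convert human-friendly sizes into memory_mb and cpu_units values."""
    if memory_gb <= 0 or vcpus <= 0:
        raise ValueError("memory_gb and vcpus must be positive")
    return {
        "memory_mb": int(memory_gb * MB_PER_GB),
        "cpu_units": int(vcpus * CPU_UNITS_PER_VCPU),
    }


# 2 GB and 1 vCPU give the same values used in the deploy call above.
print(resource_settings(memory_gb=2, vcpus=1))
```

The resulting dictionary could then be unpacked into the config, for example modelzoo.ResourcesConfig(**resource_settings(2, 1)).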
That’s all there is to it! Behind the scenes, Model Zoo serialized your model, uploaded it to object storage, deployed a container to serve any HTTP requests made to the model, and set up a load balancer to route requests to multiple model shards. If you’d like, take some time to explore the model via the Web UI link. There you’ll be able to modify documentation, test the model with raw or visual inputs, and monitor metrics and logs. By default, only your account (or anybody you share your API key with) will be able to access this model.
You can specify the name of the model you’d like to deploy by passing a name
as an argument to the deploy function. If a name is omitted, Model Zoo will
choose a unique one for you. Model names must be unique to your account.
Now that the model is deployed, you can use our Python client API to query
the language model to generate some text. The
modelzoo.transformers.generate() function requires the model_name and
accepts an optional input string to prime the text generation pipeline.
print(
    modelzoo.transformers.generate(
        model_name, input_str="These violent delights have violent ends"
    )
)
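If you want to try several prompts against the same deployed model, a small wrapper keeps the calls tidy. The sketch below is hedged: generate_many is a hypothetical helper, it assumes the generate function returns a string (the tutorial only shows it being printed), and fake_generate is a stand-in so the example runs without a deployed model.

```python
from typing import Callable, Dict, Iterable


def generate_many(
    generate: Callable[..., str],
    model_name: str,
    prompts: Iterable[str],
) -> Dict[str, str]:
    """Call generate once per prompt and collect the results keyed by prompt."""
    return {prompt: generate(model_name, input_str=prompt) for prompt in prompts}


def fake_generate(model_name: str, input_str: str = "") -> str:
    """Stand-in for modelzoo.transformers.generate so the sketch runs offline."""
    return f"{input_str} ... (text from {model_name})"


results = generate_many(fake_generate, "my-model", ["Once upon a time", "In a galaxy"])
for prompt, text in results.items():
    print(f"{prompt!r} -> {text!r}")
```

With a real deployment, you would pass modelzoo.transformers.generate and the model_name returned by the deploy call in place of the stand-ins.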
Great! At this point, we’ve used our language model to generate an original passage of text.
By default, Model Zoo will deploy your model and wait for it to get into a
HEALTHY state, meaning that it’s ready for predictions. You can always
check on the state of a model by using the
To save resources and shut down any model if you aren’t using it, you can use
With Model Zoo you can manage model state manually or automatically. By default, our free trial will stop any model that has received no requests for 15 minutes, saving you resources if you forget to stop it manually. Our unlimited version has more options for controlling autoscaling behavior.
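If you ever need to wait for the HEALTHY state yourself, a generic polling loop is enough. The sketch below assumes a hypothetical get_state callable that returns the model's current state as a string; this tutorial does not show which client function provides that, so plug in whatever your client offers. The state name "PENDING" in the demo is also made up; only HEALTHY comes from the text above.

```python
import time
from typing import Callable


def wait_until_healthy(
    get_state: Callable[[], str],
    timeout_s: float = 300.0,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll get_state() until it returns "HEALTHY" or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == "HEALTHY":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"model still in state {state!r} after {timeout_s}s")
        sleep(poll_interval_s)


# Demo with a stand-in state source; no network calls are made and the
# sleep function is replaced so the example finishes immediately.
_states = iter(["PENDING", "PENDING", "HEALTHY"])
wait_until_healthy(lambda: next(_states), sleep=lambda _seconds: None)
print("model is healthy")
```

Passing the sleep function as a parameter keeps the loop easy to test; in normal use you would leave it at the default time.sleep.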