
Model Download

Downloading the Model

This guide explains how to download a recommended model from Hugging Face, a widely used model repository, using our model downloader script. Contact your Solutions Engineer to obtain the script.

Model Selection

We currently recommend Llama 3.2 11B as our preferred small language model. At 11B parameters, it offers a good balance of capability and compute requirements. Phi-4-mini-instruct is a smaller option at 3.8B parameters, but its results are less consistent. See our Resource Configuration section for details on how we handle models and quantization under different compute resource scenarios.

Prerequisites

Before you begin, ensure you have:

  1. Python 3.6 or later installed
  2. The required libraries:
    • transformers
    • sentence-transformers
    • huggingface_hub
    • torch (required by transformers)

You can install these dependencies with:

pip install transformers sentence-transformers huggingface_hub torch
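If you want to confirm the libraries installed correctly before proceeding, a quick check like the following works (a minimal sketch; note that the sentence-transformers package is imported as sentence_transformers):

# Verify that each required library is importable and print its version.
import importlib

for name in ("transformers", "sentence_transformers", "huggingface_hub", "torch"):
    try:
        module = importlib.import_module(name)
        print(f"{name} {getattr(module, '__version__', 'unknown')}")
    except ImportError:
        print(f"{name} is missing - install it with pip before continuing")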

Step 1: Create a Hugging Face Account

If you don't already have one, create a free account at https://huggingface.co.

Step 2: Accept the Model License

The Meta-Llama-3.2-11B model requires you to accept its license agreement before downloading:

  1. Visit the model page: https://huggingface.co/meta-llama/Meta-Llama-3.2-11B
  2. Click on "Access repository"
  3. Review and accept the license terms
  4. Provide any additional information Meta requests
  5. An access-granted notification usually arrives by email within 24 hours, though it can take longer. Once access is granted, you will not need to apply again.

Step 3: Get Your Hugging Face Token

Note: This step is only necessary for gated models like Meta-Llama-3.2-11B. For public models like microsoft/Phi-4-mini-instruct, you can skip this step.

  1. Go to https://huggingface.co/settings/tokens
  2. Click "New token"
  3. Give your token a name (e.g., "LlamaDownload")
  4. Select "Read" role (minimum required permission)
  5. Click "Generate token"
  6. Copy the token value - you'll need it for the download process (or cache it locally, as shown below)
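If you would rather not pass the token on the command line each time, the huggingface_hub library can cache it locally, and scripts that use the library will pick it up automatically. A minimal sketch:

# Cache the token in your local Hugging Face cache directory.
from huggingface_hub import login

login(token="YOUR_HF_TOKEN")  # paste the token you copied above

The huggingface-cli login command does the same thing interactively. Either way, the downloader commands in Step 4 still accept an explicit --token argument.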

Step 4: Download the Model

For Meta-Llama-3.2-11B (requires authentication):

Run the model downloader script with your Hugging Face token:

python model_downloader.py "meta-llama/Meta-Llama-3.2-11B" "/path/to/save/directory" --token YOUR_HF_TOKEN

Replace:

  • /path/to/save/directory with your desired save location
  • YOUR_HF_TOKEN with the token you copied in Step 3

For Public Models (no authentication required):

Run the model downloader script without the token argument:

python model_downloader.py "microsoft/Phi-4-mini-instruct" "/path/to/save/directory"


Expected Output

The script will:

  1. Log in to Hugging Face with your token (gated models only)
  2. Download the model (this is a large model, approximately 25GB, so it may take some time)
  3. Save the model to the specified directory

Example terminal output:

Logged in to Hugging Face with the provided token.
Downloading and saving language model: meta-llama/Meta-Llama-3.2-11B
Downloading model meta-llama/Meta-Llama-3.2-11B...
Saving model to /path/to/save/directory/meta-llama/Meta-Llama-3.2-11B/model
Saving tokenizer to /path/to/save/directory/meta-llama/Meta-Llama-3.2-11B/tokenizer
Model successfully downloaded and saved to /path/to/save/directory/meta-llama/Meta-Llama-3.2-11B

Troubleshooting

Authentication Issues

If you receive an authentication error:

  • Verify that you've accepted the model license on the Hugging Face website
  • Check that your token is correct and has not expired
  • Ensure your account has been approved for access to Llama models

Download Issues

If the download fails:

  • Check your internet connection
  • Ensure you have enough disk space (at least 40GB recommended); a quick check is shown after this list
  • Try running the script again, as it may resume downloading
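If you suspect disk space is the culprit, Python's standard library gives a quick check (the path below is a placeholder for your save directory):

# Print the free space on the drive that holds the save directory.
import shutil

free_gb = shutil.disk_usage("/path/to/save/directory").free / 1e9
print(f"{free_gb:.1f} GB free")  # at least 40GB is recommended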

Script Errors

If the script fails with errors:

  • Make sure all required libraries are installed
  • Check that you're using a compatible Python version
  • Verify the script is using the correct import statements

Next Steps

After downloading, you can use the model with the RegScale AI Inference Server.
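For reference, loading the saved files back with transformers looks roughly like this. The model and tokenizer subdirectories follow the example output above, and the sketch assumes a standard causal language model checkpoint; adjust the paths if your script version saves differently.

# Load the downloaded model and tokenizer from local disk and run a test prompt.
# Loading at full precision needs substantial memory (see Hardware Requirements below).
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "/path/to/save/directory/meta-llama/Meta-Llama-3.2-11B"
tokenizer = AutoTokenizer.from_pretrained(f"{base}/tokenizer")
model = AutoModelForCausalLM.from_pretrained(f"{base}/model")

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))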

Hardware Requirements

Note that Meta-Llama-3.2-11B is a large model that requires significant computational resources:

  • At least 44GB of VRAM at full precision (32-bit), roughly 4 bytes per parameter
  • At least 22GB of VRAM at half precision (16-bit), roughly 2 bytes per parameter
  • In more resource-constrained environments, a form of quantization is applied automatically for testing purposes, but running this way is not recommended. See our Resource Configuration page for more details, and the half-precision loading sketch after this list.
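To stay within the half-precision figure above, you can request 16-bit weights explicitly when loading. A sketch (device_map="auto" requires the accelerate package; the path is a placeholder):

# Load in half precision (16-bit) to roughly halve VRAM usage versus 32-bit.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "/path/to/save/directory/meta-llama/Meta-Llama-3.2-11B/model",
    torch_dtype=torch.float16,  # 16-bit weights
    device_map="auto",          # spread layers across available GPUs
)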
