Model Download
Downloading the Model
This guide explains how to download a recommended model from Hugging Face, a widely used model repository, using our model downloader script. Contact your Solutions Engineer to obtain the download script.
Model Selection
We currently recommend Llama 3.2 11B as our preferred small language model. The 11B parameter model provides a good balance of capability and compute requirements. Phi-4-mini-instruct is a smaller model at 3.8B parameters, but it is less consistent in its results. See our Resource Configuration section for more details on how we handle models and quantization under different compute resource scenarios.
Prerequisites
Before you begin, ensure you have:
- Python 3.6 or later installed
- The required libraries:
  - transformers
  - sentence-transformers
  - huggingface_hub
  - torch (required by transformers)
You can install these dependencies with:
pip install transformers sentence-transformers huggingface_hub torch
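To confirm the libraries are importable before running the downloader, a quick check like the following works (a minimal sketch; note that the pip package sentence-transformers is imported as sentence_transformers):

# check_deps.py - confirm the required libraries are importable
for name in ("transformers", "sentence_transformers", "huggingface_hub", "torch"):
    try:
        __import__(name)
        print("OK: " + name)
    except ImportError:
        print("MISSING: " + name + " - install it with pip")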
Step 1: Create a Hugging Face Account
If you don't already have one, create an account at https://huggingface.co.
Step 2: Accept the Model License
The Meta-Llama-3.2-11B model requires you to accept its license agreement before downloading:
- Visit the model page: <https://huggingface.co/meta-llama/Meta-Llama-3.2-11B>
- Click on "Access repository"
- Review and accept the license terms
- You may need to provide additional information requested by Meta.
- An access-granted notification is usually sent by email within 24 hours, but it can take longer. Once access is granted, you will not need to apply again.
Step 3: Get Your Hugging Face Token
Note: This step is only necessary for gated models like Meta-Llama-3.2-11B. For public models like microsoft/Phi-4-mini-instruct, you can skip this step.
- Go to https://huggingface.co/settings/tokens
- Click "New token"
- Give your token a name (e.g., "LlamaDownload")
- Select "Read" role (minimum required permission)
- Click "Generate token"
- Copy the token value - you'll need it for the download process
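If you want to confirm the token works before running the downloader, huggingface_hub can validate it (a minimal sketch; replace YOUR_HF_TOKEN with the token you just copied):

# verify the token before using it; whoami() raises an error if it is invalid
from huggingface_hub import whoami

info = whoami(token="YOUR_HF_TOKEN")
print("Token is valid for account: " + info["name"])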
Step 4: Download the Model
For Meta-Llama-3.2-11B (requires authentication):
Run the model downloader script with your Hugging Face token:
python model_downloader.py "meta-llama/Meta-Llama-3.2-11B" "/path/to/save/directory" --token YOUR_HF_TOKEN
Replace /path/to/save/directory with your desired save location and YOUR_HF_TOKEN with the token you copied in Step 3.
For Public Models (no authentication required):
Run the model downloader script without the token argument:
python model_downloader.py "microsoft/Phi-4-mini-instruct" "/path/to/save/directory"
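For reference, a downloader with this command-line interface can be built on transformers and huggingface_hub. The following is a minimal sketch consistent with the commands and output shown in this guide; the actual script provided by your Solutions Engineer may differ:

# model_downloader.py - minimal sketch of the downloader's general shape
import argparse
import os

from huggingface_hub import login
from transformers import AutoModelForCausalLM, AutoTokenizer

parser = argparse.ArgumentParser(description="Download a model from Hugging Face")
parser.add_argument("model_id", help="e.g. meta-llama/Meta-Llama-3.2-11B")
parser.add_argument("save_dir", help="directory to save the model into")
parser.add_argument("--token", help="Hugging Face token, needed for gated models")
args = parser.parse_args()

if args.token:
    login(token=args.token)
    print("Logged in to Hugging Face with the provided token.")

print("Downloading model " + args.model_id + "...")
model = AutoModelForCausalLM.from_pretrained(args.model_id, token=args.token)
tokenizer = AutoTokenizer.from_pretrained(args.model_id, token=args.token)

base = os.path.join(args.save_dir, args.model_id)
model.save_pretrained(os.path.join(base, "model"))
tokenizer.save_pretrained(os.path.join(base, "tokenizer"))
print("Model successfully downloaded and saved to " + base)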
Expected Output
The script will:
- Log in to Hugging Face with your token
- Download the model (this is a large model, approximately 25GB, so it may take some time)
- Save the model to the specified directory
Example terminal output:
Logged in to Hugging Face with the provided token.
Downloading and saving language model: meta-llama/Meta-Llama-3.2-11B
Downloading model meta-llama/Meta-Llama-3.2-11B...
Saving model to /path/to/save/directory/meta-llama/Meta-Llama-3.2-11B/model
Saving tokenizer to /path/to/save/directory/meta-llama/Meta-Llama-3.2-11B/tokenizer
Model successfully downloaded and saved to /path/to/save/directory/meta-llama/Meta-Llama-3.2-11B
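Once the script finishes, you can verify the saved files without loading the full weights by reading back only the configuration and tokenizer (a minimal sketch, assuming the directory layout shown in the output above):

from transformers import AutoConfig, AutoTokenizer

base = "/path/to/save/directory/meta-llama/Meta-Llama-3.2-11B"
# loading the config and tokenizer is cheap and confirms the files are readable
config = AutoConfig.from_pretrained(base + "/model")
tokenizer = AutoTokenizer.from_pretrained(base + "/tokenizer")
print("Model type:", config.model_type, "- vocab size:", tokenizer.vocab_size)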
Troubleshooting
Authentication Issues
If you receive an authentication error:
- Verify that you've accepted the model license on the Hugging Face website
- Check that your token is correct and has not expired
- Ensure your account has been approved for access to Llama models
Download Issues
If the download fails:
- Check your internet connection
- Ensure you have enough disk space (at least 40GB recommended)
- Try running the script again, as it may resume downloading
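To check free space at the save location before retrying, the Python standard library is enough (a minimal sketch):

import shutil

# free space at the intended save location, in GB
free_gb = shutil.disk_usage("/path/to/save/directory").free / 1e9
print("Free disk space: %.1f GB" % free_gb)
if free_gb < 40:
    print("Warning: less than the recommended 40GB is available.")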
Script Errors
If the script fails with errors:
- Make sure all required libraries are installed
- Check that you're using a compatible Python version
- Verify the script is using the correct import statements
Next Steps
After downloading, you can use the model with the RegScale AI Inference Server.
Hardware Requirements
Note that Meta-Llama-3.2-11B is a large model that requires significant computational resources:
- At least 44GB of VRAM for full precision (32-bit, about 4 bytes per parameter)
- At least 22GB of VRAM with half precision (16-bit, about 2 bytes per parameter)
- A form of quantization will be automatically applied for testing in more resource-constrained environments, but this is not recommended. See our Resource Configuration page for more details.
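If you load the model yourself for a local test, requesting half precision roughly halves the VRAM needed relative to full precision (a minimal sketch; device_map="auto" additionally requires the accelerate package, which is not listed in the prerequisites above):

import torch
from transformers import AutoModelForCausalLM

# load the saved weights in 16-bit to roughly halve VRAM usage
model = AutoModelForCausalLM.from_pretrained(
    "/path/to/save/directory/meta-llama/Meta-Llama-3.2-11B/model",
    torch_dtype=torch.float16,
    device_map="auto",  # spreads layers across available devices; needs accelerate
)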