
If you want to run VoxCPM, an open-source text-to-speech model from OpenBMB, on your own machine, this step-by-step guide walks you through the process. Running it locally lets you generate speech and experiment with voice cloning without relying on cloud services.
Prerequisites
Before getting started, ensure the following are installed on your system:
- Python
- Git
- uv (a Python package and project manager from Astral)
These tools are essential to clone the repository and set up the project environment.
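As a quick sanity check before continuing, the short Python sketch below looks for the command-line tools on your PATH. It assumes the executables are named git and uv, which may differ on your system; the helper name missing_tools is purely illustrative. If the script runs at all, Python itself is available.

```python
import shutil

# Assumed executable names; adjust them if your system uses different ones.
REQUIRED_TOOLS = ["git", "uv"]


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("git and uv were both found on PATH.")
```

A tool reported as missing may still be installed but not on your PATH, so treat the result as a hint rather than a verdict.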
Step 1: Clone the Repository
Open your terminal or command prompt and run the following command to clone the VoxCPM repository:
git clone https://github.com/OpenBMB/VoxCPM.git
This command downloads the repository's files from GitHub into a new VoxCPM folder on your local machine.
Step 2: Navigate to the Project Folder
Change your working directory to the cloned repository:
cd VoxCPM
This ensures that all subsequent commands are executed in the correct project folder.
Step 3: Sync the Virtual Environment
Set up the virtual environment and install all required dependencies by running:
uv sync
uv creates a virtual environment for the project and installs the dependencies the project declares. This step may take a few minutes depending on your internet connection.
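If you want to confirm that the sync produced an environment, here is a minimal sketch. It assumes uv's default behavior of placing the environment in a .venv folder at the project root (this location can be overridden in uv's configuration), and the function name venv_ready is hypothetical.

```python
from pathlib import Path


def venv_ready(project_dir):
    """Return True if project_dir holds a .venv folder containing pyvenv.cfg.

    Assumption: uv uses its default environment location (.venv).
    pyvenv.cfg is the marker file that Python virtual environments contain.
    """
    return (Path(project_dir) / ".venv" / "pyvenv.cfg").is_file()


if __name__ == "__main__":
    if venv_ready("."):
        print("Virtual environment found in .venv")
    else:
        print("No .venv here; run `uv sync` from the project folder first.")
```

Run it from inside the VoxCPM folder. It only checks that the environment exists, not that every dependency installed successfully.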
Step 4: Run the Application
Finally, start the app with the following command:
uv run app.py
This launches the VoxCPM app locally. By default it listens on port 7860, so you can open http://localhost:7860 in your web browser. Once the app is running, you can begin generating speech and testing voice cloning.
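If the page does not load, it can help to check whether anything is accepting connections on that port. The sketch below uses only Python's standard library; the host 127.0.0.1 and the helper name port_is_open are assumptions for illustration, and the port is the default mentioned above.

```python
import socket


def port_is_open(host="127.0.0.1", port=7860, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on port 7860.")
    else:
        print("Nothing is listening on port 7860 yet.")
```

A successful connection shows that some process is listening, not necessarily VoxCPM, so check the terminal where you started the app as well.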
Conclusion
Running VoxCPM locally gives you full control over your voice cloning projects without depending on cloud services. If you encounter any issues during installation, check that Python, Git, and uv are installed correctly, or consult the project's GitHub repository for troubleshooting tips.
With these four commands, you can have VoxCPM up and running in a few minutes, ready to generate AI speech.
