Detailed guide for cloning any voice from an mp3/webm file locally on your PC
For educational purposes only.
Requirements:
1. 6GB VRAM if you want to run on a GPU. You can also run it on the CPU, but the voice cloning will take longer to complete.
2. Audio file for the voice you want to clone
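The GPU-or-CPU choice in requirement 1 can be checked before loading any model. This is a minimal sketch, not part of Zonos: `pick_device` and `detect_device` are hypothetical helper names, and the 5.5 default threshold is my own guess at a safety margin, since cards sold as 6GB often report slightly less than 6 GiB to PyTorch.

```python
def pick_device(cuda_available: bool, vram_gb: float, min_vram_gb: float = 5.5) -> str:
    """Return "cuda" when a GPU with enough memory is present, else "cpu"."""
    if cuda_available and vram_gb >= min_vram_gb:
        return "cuda"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for the first GPU; fall back to "cpu" if PyTorch or CUDA is missing."""
    try:
        import torch  # imported lazily so this also runs before PyTorch is installed
    except ImportError:
        return "cpu"
    if not torch.cuda.is_available():
        return pick_device(False, 0.0)
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    return pick_device(True, total_gb)


print(detect_device())
```

If this prints cpu, the guide still works, just more slowly.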
Demo output: https://imgur.com/a/HdN5tG4
Steps:
1. Install the Python requirements
Install PyTorch if it is not already installed: https://pytorch.org/get-started/locally/
Install Zonos using pip:
Code:
pip install zonos
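To confirm the install worked before moving on, you can ask Python which of the needed packages it can actually find. A small sketch using only the standard library; `missing_packages` is a hypothetical helper name, and the three package names are the ones imported by the script in step 2.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["torch", "torchaudio", "zonos"])
print("Missing:", ", ".join(missing) if missing else "none")
```

If zonos shows up as missing here, the script in step 2 will fail with ModuleNotFoundError.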
2. Copy and paste the following code:
Code:
import torch
import torchaudio
from zonos.model import Zonos
from zonos.conditioning import make_cond_dict

# Use the GPU when one is available; otherwise fall back to the (slower) CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# model = Zonos.from_pretrained("Zyphra/Zonos-v0.1-hybrid", device=device)
model = Zonos.from_pretrained("Zyphra/Zonos-v0.1-transformer", device=device)

# Load the reference clip and turn it into a speaker embedding.
wav, sampling_rate = torchaudio.load("assets/exampleaudio.mp3")
speaker = model.make_speaker_embedding(wav, sampling_rate)

# Describe what to say, in which voice, and in which language.
cond_dict = make_cond_dict(text="Hello, world!", speaker=speaker, language="en-us")
conditioning = model.prepare_conditioning(cond_dict)

# Generate audio codes, decode them to a waveform, and save the result.
codes = model.generate(conditioning)
wavs = model.autoencoder.decode(codes).cpu()
torchaudio.save("sample.wav", wavs[0], model.autoencoder.sampling_rate)
Create a new file named sample.py, paste the above code into it, and replace "assets/exampleaudio.mp3" with the path to the audio file you want to clone.
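If you would rather not edit the file each time you switch voices, the path can come from the command line instead. This is only a sketch of the argument handling, using the standard argparse module; the `--text` and `--out` options and the `parse_args` helper are my own additions, not part of Zonos.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Read the reference clip, the text to speak, and the output path."""
    parser = argparse.ArgumentParser(description="Clone a voice from a reference clip")
    parser.add_argument("audio", type=Path, help="reference audio file, e.g. an mp3")
    parser.add_argument("--text", default="Hello, world!", help="text to speak")
    parser.add_argument("--out", type=Path, default=Path("sample.wav"), help="output file")
    return parser.parse_args(argv)


# Demo with an explicit argument list; in a real script call parse_args() with no arguments.
args = parse_args(["assets/exampleaudio.mp3"])
print(args.audio, args.text, args.out)
```

In the script, args.audio would take the place of the hard-coded path in torchaudio.load, args.text the text given to make_cond_dict, and args.out the file name given to torchaudio.save.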
3. Run the Python file
Code:
python sample.py
NOTE: You might get the error "ModuleNotFoundError: No module named 'zonos'" even after installing it with pip. In this case, do the following:
- Create a new folder named 'VoiceClone'
- Create a new file clone.py & paste the above Python code in it
- Open a terminal in this folder
- Run the following command:
Code:
git clone https://github.com/Zyphra/Zonos
- Go inside the Zonos folder and, in the sample.py file, change "assets/exampleaudio.mp3" to the path of your audio file
- Run sample.py
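Running a script from inside the cloned folder works because Python searches the script's own directory for packages. If you want to keep your script outside the clone, one option is to add the clone to Python's search path first. A sketch under the assumption that the cloned folder is named Zonos, contains the zonos package at its top level, and sits in the directory you run the script from; `add_repo_to_path` is a hypothetical helper.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir):
    """Put repo_dir at the front of sys.path so packages inside it can be imported."""
    repo = str(Path(repo_dir).resolve())
    if repo in sys.path:
        sys.path.remove(repo)
    sys.path.insert(0, repo)
    return repo


# Assumed layout: the script is run from VoiceClone/, which contains the cloned Zonos/ folder.
add_repo_to_path("Zonos")
```

Place this at the top of the script, before the line that imports from zonos.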
4. The output will be saved as sample.wav
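A quick way to confirm the output is a real audio file is to read its header and print the sample rate and length. This sketch parses the WAV header by hand with the standard struct module, because the saved file may use floating-point samples, which Python's built-in wave module can reject. `wav_info` is a hypothetical helper, and it assumes an ordinary RIFF/WAVE file with a normal data chunk.

```python
import struct
from pathlib import Path


def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) from a WAV file's header."""
    with open(path, "rb") as f:
        riff, _, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        fmt = None
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no data chunk found")
            chunk_id, size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                # format tag, channels, sample rate, byte rate, block align, bits per sample
                fmt = struct.unpack("<HHIIHH", f.read(16))
                f.seek(size - 16 + (size & 1), 1)  # skip any extension bytes and padding
            elif chunk_id == b"data":
                if fmt is None:
                    raise ValueError("data chunk appears before fmt chunk")
                _, channels, rate, byte_rate, _, _ = fmt
                return rate, channels, size / byte_rate
            else:
                f.seek(size + (size & 1), 1)  # skip chunks we do not need


if Path("sample.wav").exists():
    print(wav_info("sample.wav"))
```

A duration of a second or two is what you would expect for the short "Hello, world!" text used above.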