NET Vocal Remover API
AI-powered audio separation for splitting music tracks into vocal and instrumental components. It uses ONNX neural network models with STFT-based processing and optional noise reduction.
Overview
The Vocal Remover system enables professional-grade audio source separation by leveraging advanced AI models:
- AI-Powered Separation: Uses ONNX neural network models for accurate vocal/instrumental separation
- Multiple Model Options: Choose between Default, Best, and Karaoke models for different use cases
- GPU Acceleration: Automatic CUDA support for faster processing when available
- Chunked Processing: Handles long audio files with configurable chunk sizes and overlap margins
- Noise Reduction: Optional advanced denoising for cleaner results
- Progress Tracking: Real-time progress updates with detailed status information
Key Features
AI Model Selection
Three pre-trained models optimized for different scenarios: Default for balanced quality and speed, Best for maximum quality, and Karaoke for preserving background vocals.
STFT Processing
Short-Time Fourier Transform based processing with configurable FFT size, Hanning window, and frequency/time dimensions for precise spectral analysis.
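To make the windowing step concrete, here is a minimal sketch of a periodic Hann window and a check that two copies shifted by half the window length sum to 1, the property overlap-add reconstruction relies on. The helper name `HannWindow` and the 50% hop are assumptions chosen for illustration; the library's actual window variant and hop size are not stated in this document.

```csharp
using System;

// Hypothetical helper, not part of the library.
// Periodic Hann window: w[n] = 0.5 * (1 - cos(2*pi*n / size)).
static double[] HannWindow(int size)
{
    var window = new double[size];
    for (int n = 0; n < size; n++)
        window[n] = 0.5 * (1.0 - Math.Cos(2.0 * Math.PI * n / size));
    return window;
}

// 6144 matches the default NFft in SimpleSeparationOptions.
var w = HannWindow(6144);

// With a hop of size/2 (assumed here), overlapping windows should sum to 1.
double maxDeviation = 0;
for (int n = 0; n < w.Length / 2; n++)
    maxDeviation = Math.Max(maxDeviation, Math.Abs(w[n] + w[n + w.Length / 2] - 1.0));

Console.WriteLine($"Max deviation from 1.0: {maxDeviation:E2}");
```

The deviation printed should be at the level of floating-point rounding error, which is why no extra gain correction is needed for this particular window and hop combination.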
Smart Chunking
Splits audio into chunks with overlapping margins so that long files can be processed piece by piece; peak inference memory then depends on the chunk size rather than on the file length.
Hardware Acceleration
Automatic detection and utilization of CUDA-enabled GPUs for significantly faster processing, with CPU fallback for compatibility.
SimpleAudioSeparationService Class
The SimpleAudioSeparationService class provides the core functionality for audio separation.
using OwnaudioNET.Features.Vocalremover;
// Create service with custom options
var options = new SimpleSeparationOptions
{
Model = InternalModel.Best,
OutputDirectory = "output",
ChunkSizeSeconds = 15,
DisableNoiseReduction = false
};
// 'using' releases the ONNX session when the service goes out of scope
using var service = new SimpleAudioSeparationService(options);
service.Initialize();
// Separate audio file
SimpleSeparationResult result = service.Separate("input.mp3");
Console.WriteLine($"Vocals: {result.VocalsPath}");
Console.WriteLine($"Instrumental: {result.InstrumentalPath}");
Console.WriteLine($"Processing time: {result.ProcessingTime}");
Public API Methods
| Method | Return | Description |
|---|---|---|
| Initialize() | void | Initializes the ONNX model session with GPU or CPU execution provider |
| Separate(string) | SimpleSeparationResult | Separates audio file into vocals and instrumental tracks |
| Dispose() | void | Releases all resources including ONNX session |
Events
| Event | Type | Description |
|---|---|---|
| ProgressChanged | EventHandler<SimpleSeparationProgress> | Raised when processing progress updates |
| ProcessingCompleted | EventHandler<SimpleSeparationResult> | Raised when separation completes successfully |
Data Classes
SimpleSeparationOptions
Configuration parameters for the audio separation process.
public class SimpleSeparationOptions
{
// ONNX model file path (optional if using internal models)
public string? ModelPath { get; set; }
// Internal model selection (Default, Best, Karaoke)
public InternalModel Model { get; set; } = InternalModel.Best;
// Output directory path
public string OutputDirectory { get; set; } = "separated";
// Disable noise reduction (enabled by default)
public bool DisableNoiseReduction { get; set; } = false;
// Margin size for overlapping chunks (in samples, default: 44100 = 1 second)
public int Margin { get; set; } = 44100;
// Chunk size in seconds (0 = process entire file at once)
public int ChunkSizeSeconds { get; set; } = 15;
// FFT size for STFT processing
public int NFft { get; set; } = 6144;
// Temporal dimension, given as an exponent of 2 (8 means 2^8 = 256 time frames)
public int DimT { get; set; } = 8;
// Frequency dimension parameter
public int DimF { get; set; } = 2048;
}
SimpleSeparationProgress
Progress information for the separation process.
public class SimpleSeparationProgress
{
// Current file being processed
public string CurrentFile { get; set; }
// Overall progress percentage (0-100)
public double OverallProgress { get; set; }
// Current processing step description
public string Status { get; set; }
// Number of chunks processed
public int ProcessedChunks { get; set; }
// Total number of chunks
public int TotalChunks { get; set; }
}
SimpleSeparationResult
Result of the audio separation operation.
public class SimpleSeparationResult
{
// Path to the vocals output file
public string VocalsPath { get; set; }
// Path to the instrumental output file
public string InstrumentalPath { get; set; }
// Processing duration
public TimeSpan ProcessingTime { get; set; }
}
Separation Models
Available Models
Three built-in models optimized for different use cases, each embedded in the library for seamless usage:
| Model | Quality | Speed | Best For |
|---|---|---|---|
| Default | Good | Fast | General purpose separation with balanced quality and processing time. Ideal for quick previews and batch processing. |
| Best | Excellent | Slower | Maximum quality separation for professional applications. Produces the cleanest vocal and instrumental tracks with minimal artifacts. |
| Karaoke | Specialized | Medium | Removes lead vocals while preserving background vocals. Perfect for creating karaoke tracks with choir or backing vocal presence. |
Model Selection
public enum InternalModel
{
None, // Use custom model file via ModelPath
Default, // Balanced quality and speed
Best, // Highest quality, longer processing time
Karaoke // Preserve background vocals, remove lead vocal
}
// Usage examples
var options1 = new SimpleSeparationOptions { Model = InternalModel.Default };
var options2 = new SimpleSeparationOptions { Model = InternalModel.Best };
var options3 = new SimpleSeparationOptions { Model = InternalModel.Karaoke };
// Custom model
var options4 = new SimpleSeparationOptions
{
Model = InternalModel.None,
ModelPath = @"C:\Models\custom_model.onnx"
};
Model Characteristics
- Default Model: The most balanced option, providing good quality separation with relatively fast processing times. Suitable for most use cases where a reasonable quality-speed tradeoff is acceptable.
- Best Model: Utilizes a larger and more complex neural network architecture, resulting in superior separation quality with cleaner vocals and instrumental tracks. Processing time is approximately 2-3x longer than the Default model.
- Karaoke Model: Specifically trained to remove only the lead vocal while preserving backing vocals, harmonies, and vocal effects. This creates a more authentic karaoke experience compared to complete vocal removal.
Advanced Configuration
Processing Parameters
Fine-tune the separation process with advanced STFT and chunking parameters:
| Parameter | Default | Description |
|---|---|---|
| ChunkSizeSeconds | 15 | Length of each processing chunk in seconds. Smaller values use less memory but increase processing overhead. |
| Margin | 44100 | Overlap margin in samples between chunks. Prevents artifacts at chunk boundaries. Should be at least 0.5 seconds. |
| NFft | 6144 | FFT size for spectral analysis. Higher values provide better frequency resolution but increase computation time. |
| DimF | 2048 | Frequency dimension for model input. Auto-detected from model metadata if not specified. |
| DimT | 8 | Time dimension as power of 2 (2^8 = 256 time frames). Auto-detected from model metadata if not specified. |
| DisableNoiseReduction | false | When false, applies advanced noise reduction using phase inversion technique for cleaner results. |
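To see how ChunkSizeSeconds and Margin interact, the sketch below lists the sample ranges a chunked pass might read for a given file length. `PlanChunks` is a hypothetical helper written for this illustration; it assumes 44.1 kHz audio and that each chunk is extended by the margin on both sides, which may differ from how the library actually lays out its chunks.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the library: returns [Start, End) sample ranges,
// each chunk widened by `margin` samples on both sides and clamped to the file.
static List<(long Start, long End)> PlanChunks(
    long totalSamples, int chunkSeconds, int margin, int sampleRate = 44100)
{
    var chunks = new List<(long Start, long End)>();
    // ChunkSizeSeconds = 0 means "process the entire file at once".
    long chunkSamples = chunkSeconds <= 0 ? totalSamples : (long)chunkSeconds * sampleRate;
    for (long pos = 0; pos < totalSamples; pos += chunkSamples)
    {
        long start = Math.Max(0, pos - margin);
        long end = Math.Min(totalSamples, pos + chunkSamples + margin);
        chunks.Add((start, end));
    }
    return chunks;
}

// A 40-second file with the default 15 s chunks and 1 s (44100-sample) margin.
var plan = PlanChunks(40L * 44100, chunkSeconds: 15, margin: 44100);
foreach (var (s, e) in plan)
    Console.WriteLine($"{s / 44100.0:F1}s .. {e / 44100.0:F1}s");

// DimT is an exponent: the default 8 corresponds to 2^8 = 256 time frames.
Console.WriteLine($"Time frames for DimT = 8: {1 << 8}");
```

For this input the plan contains three ranges, and neighbouring ranges overlap by twice the margin, which is the region available for smoothing the transition between chunks.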
Hardware Acceleration
The service automatically detects and utilizes available hardware acceleration:
// Automatic GPU detection during initialization
service.Initialize();
// Console output will show:
// "CUDA execution provider enabled." (if GPU available)
// OR
// "Using CPU execution provider." (fallback)
// Processing speed comparison:
// CPU: ~2-5x real-time (depends on CPU cores)
// GPU (CUDA): ~10-30x real-time (depends on GPU model)
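The speed figures above can be turned into rough wall-clock estimates. The sketch below assumes that "Nx real-time" means N seconds of audio are processed per second of wall-clock time; `EstimateWallClock` is a hypothetical helper for this illustration and is unrelated to the library's own EstimateProcessingTime method.

```csharp
using System;

// Hypothetical helper: converts a "times real-time" speed factor into wall-clock time.
// Assumption: "Nx real-time" = N seconds of audio processed per second.
static TimeSpan EstimateWallClock(TimeSpan audioLength, double realTimeFactor)
{
    if (realTimeFactor <= 0)
        throw new ArgumentOutOfRangeException(nameof(realTimeFactor));
    return TimeSpan.FromSeconds(audioLength.TotalSeconds / realTimeFactor);
}

var song = TimeSpan.FromMinutes(4);
Console.WriteLine($"CPU (2x-5x):   {EstimateWallClock(song, 5)} to {EstimateWallClock(song, 2)}");
Console.WriteLine($"GPU (10x-30x): {EstimateWallClock(song, 30)} to {EstimateWallClock(song, 10)}");
```

Under that assumption a four-minute song would take between 48 seconds and 2 minutes on CPU and between 8 and 24 seconds on a CUDA GPU; actual times depend on the model, hardware, and chunk settings.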
Memory Management
For large files or systems with limited memory, adjust chunk size and margin:
// Low memory configuration (suitable for 4GB RAM)
var lowMemOptions = new SimpleSeparationOptions
{
Model = InternalModel.Default,
ChunkSizeSeconds = 10, // Smaller chunks
Margin = 22050 // 0.5 second margin
};
// High quality configuration (requires 8GB+ RAM)
var highQualityOptions = new SimpleSeparationOptions
{
Model = InternalModel.Best,
ChunkSizeSeconds = 30, // Larger chunks for better quality
Margin = 88200 // 2 second margin for smoother transitions
};
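As a rough way to compare the two configurations, the sketch below computes the size of the raw audio buffer for one chunk plus its margins. `ChunkBufferBytes` is a hypothetical helper; it assumes 32-bit float samples at 44.1 kHz stereo and deliberately ignores STFT buffers and model activations, which typically dominate real memory use, so treat the numbers as a scaling guide rather than a total.

```csharp
using System;

// Hypothetical estimate, not part of the library: bytes needed to hold one chunk
// (with a margin on each side) as 32-bit float samples.
static long ChunkBufferBytes(int chunkSeconds, int margin, int sampleRate = 44100, int channels = 2)
{
    long samplesPerChannel = (long)chunkSeconds * sampleRate + 2L * margin;
    return samplesPerChannel * channels * sizeof(float);
}

long lowMem = ChunkBufferBytes(chunkSeconds: 10, margin: 22050);
long highQuality = ChunkBufferBytes(chunkSeconds: 30, margin: 88200);

Console.WriteLine($"Low memory:   {lowMem / 1e6:F1} MB of raw audio per chunk");
Console.WriteLine($"High quality: {highQuality / 1e6:F1} MB of raw audio per chunk");
Console.WriteLine($"Ratio:        {(double)highQuality / lowMem:F1}x");
```

The high-quality settings hold roughly three times as much audio per chunk as the low-memory settings, and the intermediate buffers built from that audio can be expected to grow along with it.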
Progress Tracking
Monitoring Progress
Track separation progress in real-time using event handlers:
// 'using' releases the ONNX session when the service goes out of scope
using var service = new SimpleAudioSeparationService(options);
// Subscribe to progress events
service.ProgressChanged += (sender, progress) =>
{
Console.WriteLine($"[{progress.OverallProgress:F1}%] {progress.Status}");
if (progress.TotalChunks > 0)
{
Console.WriteLine($"Chunk {progress.ProcessedChunks}/{progress.TotalChunks}");
}
};
// Subscribe to completion event
service.ProcessingCompleted += (sender, result) =>
{
Console.WriteLine($"\nSeparation completed in {result.ProcessingTime}");
Console.WriteLine($"Vocals saved to: {result.VocalsPath}");
Console.WriteLine($"Instrumental saved to: {result.InstrumentalPath}");
};
service.Initialize();
service.Separate("song.mp3");
Progress Stages
- Loading audio file (0%): Decoding input audio to 44.1 kHz stereo format
- Processing audio separation (10%): Creating and processing audio chunks
- Processing chunks (20-80%): Running neural network inference on each chunk
- Calculating results (90%): Reconstructing full-length separated tracks
- Completed (100%): Saving output files to disk
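The chunk stage covers the 20-80% band of the overall progress. One plausible way to relate chunk counts to that band is a linear mapping, sketched below; `ChunkStageProgress` is a hypothetical helper and the linear interpolation is an assumption, not a description of the library's internal calculation.

```csharp
using System;

// Hypothetical mapping, not part of the library: spreads chunk progress
// linearly over the 20-80% band of the overall progress value.
static double ChunkStageProgress(int processedChunks, int totalChunks)
{
    if (totalChunks <= 0) return 20.0; // nothing to process yet: stay at the band start
    double fraction = Math.Clamp((double)processedChunks / totalChunks, 0.0, 1.0);
    return 20.0 + 60.0 * fraction;
}

for (int done = 0; done <= 4; done++)
    Console.WriteLine($"Chunk {done}/4 -> {ChunkStageProgress(done, 4):F0}%");
```

A mapping like this is handy when driving a UI progress bar from ProcessedChunks and TotalChunks alone, for example when the handler ignores OverallProgress.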
Factory Methods
AudioSeparationExtensions
Convenient factory methods for quick service creation:
using OwnaudioNET.Features.Vocalremover;
// Create service with internal model
var service1 = AudioSeparationExtensions.CreateDefaultService(InternalModel.Best);
// Create service with custom model file
var service2 = AudioSeparationExtensions.CreateDefaultService("path/to/model.onnx");
// Create service with custom output directory
var service3 = AudioSeparationExtensions.CreatetService(
InternalModel.Karaoke,
@"C:\Output\Karaoke"
);
// Create service with custom model and output
var service4 = AudioSeparationExtensions.CreatetService(
"custom_model.onnx",
@"C:\Output\Custom"
);
SimpleSeparator Factory
Simplified factory for quick one-line initialization:
// Quick initialization and separation
var (service, _, _) = SimpleSeparator.Separator(
InternalModel.Default,
"output_folder"
);
var result = service.Separate("input.wav");
service.Dispose();
Helper Methods
// Validate audio file format
bool isValid = AudioSeparationExtensions.IsValidAudioFile("song.mp3");
// Supports: .wav, .mp3, .flac
// Estimate processing time
TimeSpan estimate = AudioSeparationExtensions.EstimateProcessingTime("song.wav");
Console.WriteLine($"Estimated processing time: {estimate}");
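For callers who want to filter a file list before handing it to the service, an extension check along the lines of the supported formats above might look as follows. `HasSupportedExtension` is a hypothetical stand-in written for this illustration; it only inspects the file name, and it is not the implementation of IsValidAudioFile, which may perform additional checks.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the library: case-insensitive check
// against the extensions listed above (.wav, .mp3, .flac).
static bool HasSupportedExtension(string path)
{
    string extension = Path.GetExtension(path);
    return extension.Equals(".wav", StringComparison.OrdinalIgnoreCase)
        || extension.Equals(".mp3", StringComparison.OrdinalIgnoreCase)
        || extension.Equals(".flac", StringComparison.OrdinalIgnoreCase);
}

string[] candidates = { "song.mp3", "Live Set.FLAC", "notes.txt", "README" };
foreach (var candidate in candidates)
    Console.WriteLine($"{candidate}: {(HasSupportedExtension(candidate) ? "supported" : "skipped")}");
```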
Usage Examples
Basic Vocal Removal
using OwnaudioNET.Features.Vocalremover;
// Create service with default settings
var options = new SimpleSeparationOptions
{
Model = InternalModel.Default,
OutputDirectory = "separated"
};
using var service = new SimpleAudioSeparationService(options);
service.Initialize();
// Separate audio file
var result = service.Separate(@"C:\Music\song.mp3");
Console.WriteLine($"Vocals: {result.VocalsPath}");
Console.WriteLine($"Instrumental: {result.InstrumentalPath}");
Console.WriteLine($"Time: {result.ProcessingTime.TotalSeconds:F1}s");
High-Quality Separation with Progress
var options = new SimpleSeparationOptions
{
Model = InternalModel.Best,
OutputDirectory = "output_best",
ChunkSizeSeconds = 20,
DisableNoiseReduction = false // false (the default) keeps noise reduction enabled
};
using var service = new SimpleAudioSeparationService(options);
// Progress tracking
service.ProgressChanged += (s, p) =>
{
Console.Write($"\r[{p.OverallProgress:F1}%] {p.Status}");
if (p.TotalChunks > 0)
Console.Write($" - Chunk {p.ProcessedChunks}/{p.TotalChunks}");
};
service.ProcessingCompleted += (s, r) =>
{
Console.WriteLine($"\n\nCompleted in {r.ProcessingTime}");
Console.WriteLine($"Output files:\n {r.VocalsPath}\n {r.InstrumentalPath}");
};
service.Initialize();
service.Separate("input_song.flac");
Karaoke Track Creation
// Use Karaoke model to preserve background vocals
var options = new SimpleSeparationOptions
{
Model = InternalModel.Karaoke,
OutputDirectory = "karaoke_tracks"
};
using var service = new SimpleAudioSeparationService(options);
service.Initialize();
// Process multiple songs
string[] songs = Directory.GetFiles(@"C:\Music\Album", "*.mp3");
foreach (var song in songs)
{
Console.WriteLine($"\nProcessing: {Path.GetFileName(song)}");
var result = service.Separate(song);
Console.WriteLine($"Karaoke track created: {result.InstrumentalPath}");
}
Batch Processing with Factory Method
// Create service using factory
var service = AudioSeparationExtensions.CreatetService(
InternalModel.Default,
@"C:\Output\Vocals_Removed"
);
service.Initialize();
// Process all WAV files in directory
var files = Directory.GetFiles(@"C:\Music", "*.wav");
int count = 0;
foreach (var file in files)
{
try
{
Console.WriteLine($"\n[{++count}/{files.Length}] {Path.GetFileName(file)}");
var result = service.Separate(file);
Console.WriteLine($"✓ Completed in {result.ProcessingTime.TotalSeconds:F1}s");
}
catch (Exception ex)
{
Console.WriteLine($"✗ Error: {ex.Message}");
}
}
service.Dispose();
Custom Model Configuration
// Load custom trained model
var options = new SimpleSeparationOptions
{
Model = InternalModel.None,
ModelPath = @"C:\Models\my_custom_separator.onnx",
OutputDirectory = "custom_output",
// Adjust STFT parameters if needed for custom model
NFft = 4096,
DimF = 1024,
DimT = 9 // 2^9 = 512 time frames
};
using var service = new SimpleAudioSeparationService(options);
service.Initialize();
// When DimF and DimT are not specified, they are auto-detected from the ONNX metadata
var result = service.Separate("test.wav");
Performance Considerations
- Model choice: Best model is 2-3x slower than Default
- Hardware: GPU acceleration provides 5-15x speedup over CPU
- File length: Processing time scales linearly with audio duration
- Chunk size: Larger chunks are more efficient but use more memory
Technical Details
- STFT with Hanning window for accurate frequency-domain representation
- Reflection padding to prevent boundary artifacts
- Hermitian symmetry for proper inverse FFT reconstruction
- Overlap-add synthesis with automatic windowing compensation
- Phase inversion noise reduction for cleaner separation results
- Automatic normalization to prevent clipping in output files
- 44.1 kHz stereo processing for optimal quality
Supported Audio Formats
- WAV (uncompressed PCM)
- MP3 (MPEG Audio Layer 3)
- FLAC (Free Lossless Audio Codec)
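The phase inversion noise reduction mentioned in this document can be illustrated with a toy example. A common form of the technique runs the model on the input and on its sign-inverted copy, then averages the first output with the negated second output, so that error terms which do not flip sign with the input cancel. The sketch below assumes that form; `ToyModel` and `DenoisedPredict` are hypothetical names, and the toy model is a simple stand-in rather than a real ONNX session.

```csharp
using System;
using System.Linq;

// Toy stand-in for a separation model: the wanted output (0.8 * x) plus a
// constant bias that does not depend on the sign of the input.
static float[] ToyModel(float[] input) =>
    input.Select(x => 0.8f * x + 0.05f).ToArray();

// Hypothetical sketch of phase-inversion denoising:
// result = 0.5 * model(x) - 0.5 * model(-x).
static float[] DenoisedPredict(Func<float[], float[]> model, float[] input)
{
    float[] inverted = input.Select(x => -x).ToArray();
    float[] normal = model(input);
    float[] flipped = model(inverted);
    return normal.Zip(flipped, (a, b) => 0.5f * a - 0.5f * b).ToArray();
}

float[] mix = { 0.5f, -0.25f, 1.0f };
float[] plain = ToyModel(mix);
float[] clean = DenoisedPredict(ToyModel, mix);

for (int i = 0; i < mix.Length; i++)
    Console.WriteLine($"in={mix[i],6:F2}  plain={plain[i],6:F3}  denoised={clean[i],6:F3}");
```

In this toy case the constant bias disappears from the denoised column. The cost is two model evaluations per chunk instead of one, which is a plausible reason for exposing the DisableNoiseReduction switch.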