NET Vocal Remover API

Advanced AI-powered audio separation technology for splitting music tracks into vocals and instrumental components. Utilizes state-of-the-art ONNX neural network models with STFT-based processing and intelligent noise reduction.

Professional Audio Separation

The Vocal Remover API provides sophisticated AI-based source separation using deep learning models, with support for GPU acceleration, chunked processing, and multiple quality presets.

Overview

The Vocal Remover system enables professional-grade audio source separation by leveraging advanced AI models.

Key Features

AI Model Selection

Three pre-trained models optimized for different scenarios: Default for balanced quality and speed, Best for maximum quality, and Karaoke for preserving background vocals.

STFT Processing

Short-Time Fourier Transform based processing with configurable FFT size, Hanning window, and frequency/time dimensions for precise spectral analysis.

Smart Chunking

Intelligent audio segmentation with overlapping margins, so that files of any length are processed chunk by chunk and memory use stays bounded by the chunk size rather than the file length.
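
The library's internal chunking code is not shown in this documentation, so the following is only a minimal sketch of the general idea, assuming each chunk is read with an extra margin on both sides and only its core region is kept after processing. PlanChunks and its tuple fields are hypothetical names, not part of the API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the library): plans overlapping chunks.
// Each entry says which samples to read (core plus margins) and which
// samples to keep (core only) once the chunk has been processed.
static List<(int ReadStart, int ReadEnd, int KeepStart, int KeepEnd)> PlanChunks(
    int totalSamples, int chunkSamples, int margin)
{
    // Mirror the documented convention: 0 means "process the whole file at once".
    if (chunkSamples <= 0) chunkSamples = totalSamples;

    var plan = new List<(int ReadStart, int ReadEnd, int KeepStart, int KeepEnd)>();
    for (int keepStart = 0; keepStart < totalSamples; keepStart += chunkSamples)
    {
        int keepEnd = Math.Min(keepStart + chunkSamples, totalSamples);
        int readStart = Math.Max(0, keepStart - margin);      // clamp at file start
        int readEnd = Math.Min(totalSamples, keepEnd + margin); // clamp at file end
        plan.Add((readStart, readEnd, keepStart, keepEnd));
    }
    return plan;
}

// A 40-second file at 44.1 kHz, 15-second chunks, 1-second margin.
const int rate = 44100;
var chunks = PlanChunks(40 * rate, 15 * rate, rate);

foreach (var c in chunks)
    Console.WriteLine($"read [{c.ReadStart}, {c.ReadEnd})  keep [{c.KeepStart}, {c.KeepEnd})");
```

With these numbers the plan contains three chunks. The margins give the model context around each chunk boundary, and discarding them afterwards is what avoids audible seams.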

Hardware Acceleration

Automatic detection and utilization of CUDA-enabled GPUs for significantly faster processing, with CPU fallback for compatibility.

SimpleAudioSeparationService Class

The SimpleAudioSeparationService class provides the core functionality for audio separation.

AudioSeparationService Namespace
using OwnaudioNET.Features.Vocalremover;

// Create service with custom options
var options = new SimpleSeparationOptions
{
    Model = InternalModel.Best,
    OutputDirectory = "output",
    ChunkSizeSeconds = 15,
    DisableNoiseReduction = false
};

// "using" ensures Dispose() releases the ONNX session when the scope ends
using var service = new SimpleAudioSeparationService(options);
service.Initialize();

// Separate audio file
SimpleSeparationResult result = service.Separate("input.mp3");

Console.WriteLine($"Vocals: {result.VocalsPath}");
Console.WriteLine($"Instrumental: {result.InstrumentalPath}");
Console.WriteLine($"Processing time: {result.ProcessingTime}");

Public API Methods

Method | Return | Description
Initialize() | void | Initializes the ONNX model session with GPU or CPU execution provider
Separate(string) | SimpleSeparationResult | Separates audio file into vocals and instrumental tracks
Dispose() | void | Releases all resources including ONNX session

Events

Event | Type | Description
ProgressChanged | EventHandler<SimpleSeparationProgress> | Raised when processing progress updates
ProcessingCompleted | EventHandler<SimpleSeparationResult> | Raised when separation completes successfully

Data Classes

SimpleSeparationOptions

Configuration parameters for the audio separation process.

SimpleSeparationOptions Class
public class SimpleSeparationOptions
{
    // ONNX model file path (optional if using internal models)
    public string? ModelPath { get; set; }

    // Internal model selection (Default, Best, Karaoke)
    public InternalModel Model { get; set; } = InternalModel.Best;

    // Output directory path
    public string OutputDirectory { get; set; } = "separated";

    // Disable noise reduction (enabled by default)
    public bool DisableNoiseReduction { get; set; } = false;

    // Margin size for overlapping chunks (in samples, default: 44100 = 1 second)
    public int Margin { get; set; } = 44100;

    // Chunk size in seconds (0 = process entire file at once)
    public int ChunkSizeSeconds { get; set; } = 15;

    // FFT size for STFT processing
    public int NFft { get; set; } = 6144;

    // Temporal dimension parameter (as power of 2)
    public int DimT { get; set; } = 8;

    // Frequency dimension parameter
    public int DimF { get; set; } = 2048;
}

SimpleSeparationProgress

Progress information for the separation process.

SimpleSeparationProgress Class
public class SimpleSeparationProgress
{
    // Current file being processed
    public string CurrentFile { get; set; }

    // Overall progress percentage (0-100)
    public double OverallProgress { get; set; }

    // Current processing step description
    public string Status { get; set; }

    // Number of chunks processed
    public int ProcessedChunks { get; set; }

    // Total number of chunks
    public int TotalChunks { get; set; }
}

SimpleSeparationResult

Result of the audio separation operation.

SimpleSeparationResult Class
public class SimpleSeparationResult
{
    // Path to the vocals output file
    public string VocalsPath { get; set; }

    // Path to the instrumental output file
    public string InstrumentalPath { get; set; }

    // Processing duration
    public TimeSpan ProcessingTime { get; set; }
}

Separation Models

Available Models

Three built-in models optimized for different use cases, each embedded in the library for seamless usage:

Model | Quality | Speed | Best For
Default | Good | Fast | General purpose separation with balanced quality and processing time. Ideal for quick previews and batch processing.
Best | Excellent | Slower | Maximum quality separation for professional applications. Produces the cleanest vocal and instrumental tracks with minimal artifacts.
Karaoke | Specialized | Medium | Removes lead vocals while preserving background vocals. Perfect for creating karaoke tracks with choir or backing vocal presence.

Model Selection

InternalModel Enum
public enum InternalModel
{
    None,      // Use custom model file via ModelPath
    Default,   // Balanced quality and speed
    Best,      // Highest quality, longer processing time
    Karaoke    // Preserve background vocals, remove lead vocal
}

// Usage examples
var options1 = new SimpleSeparationOptions { Model = InternalModel.Default };
var options2 = new SimpleSeparationOptions { Model = InternalModel.Best };
var options3 = new SimpleSeparationOptions { Model = InternalModel.Karaoke };

// Custom model
var options4 = new SimpleSeparationOptions
{
    Model = InternalModel.None,
    ModelPath = @"C:\Models\custom_model.onnx"
};

Model Characteristics

Advanced Configuration

Processing Parameters

Fine-tune the separation process with advanced STFT and chunking parameters:

Parameter | Default | Description
ChunkSizeSeconds | 15 | Length of each processing chunk in seconds. Smaller values use less memory but increase processing overhead.
Margin | 44100 | Overlap margin in samples between chunks. Prevents artifacts at chunk boundaries. Should be at least 0.5 seconds.
NFft | 6144 | FFT size for spectral analysis. Higher values provide better frequency resolution but increase computation time.
DimF | 2048 | Frequency dimension for model input. Auto-detected from model metadata if not specified.
DimT | 8 | Time dimension as power of 2 (2^8 = 256 time frames). Auto-detected from model metadata if not specified.
DisableNoiseReduction | false | When false, applies advanced noise reduction using phase inversion technique for cleaner results.
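
To make the default values more concrete, the following sketch works out what they imply at the 44.1 kHz processing rate. It uses only standard FFT arithmetic and does not call the library. HannWindow is a hypothetical helper, and the periodic form of the window is an assumption, since the documentation does not say which variant the library uses.

```csharp
using System;

// Defaults from the parameter table; 44.1 kHz is the documented processing rate.
const int sampleRate = 44100;
const int nFft = 6144;  // NFft default
const int dimT = 8;     // DimT default (an exponent, not a frame count)

// Spacing between adjacent FFT bins in Hz.
double hzPerBin = (double)sampleRate / nFft;

// A real-valued frame of nFft samples yields nFft / 2 + 1 unique bins.
int uniqueBins = nFft / 2 + 1;   // 3073

// DimT is a power of two: 2^8 = 256 time frames.
int timeFrames = 1 << dimT;      // 256

// Hypothetical helper: periodic Hann window, w[n] = 0.5 * (1 - cos(2*pi*n / N)).
static double[] HannWindow(int length)
{
    var window = new double[length];
    for (int n = 0; n < length; n++)
        window[n] = 0.5 * (1.0 - Math.Cos(2.0 * Math.PI * n / length));
    return window;
}

double[] hann = HannWindow(nFft);

Console.WriteLine($"Bin spacing: {hzPerBin:F2} Hz");
Console.WriteLine($"Unique bins: {uniqueBins}, time frames: {timeFrames}");
Console.WriteLine($"Window start: {hann[0]:F3}, window centre: {hann[nFft / 2]:F3}");
```

A larger NFft narrows the bin spacing (better frequency resolution) at the cost of longer frames and more computation per frame, which is the trade-off the table describes.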

Hardware Acceleration

The service automatically detects and utilizes available hardware acceleration:

GPU Acceleration
// Automatic GPU detection during initialization
service.Initialize();

// Console output will show:
// "CUDA execution provider enabled." (if GPU available)
// OR
// "Using CPU execution provider." (fallback)

// Processing speed comparison:
// CPU: ~2-5x real-time (depends on CPU cores)
// GPU (CUDA): ~10-30x real-time (depends on GPU model)

Memory Management

For large files or systems with limited memory, adjust chunk size and margin:

Memory Optimization
// Low memory configuration (suitable for 4GB RAM)
var lowMemOptions = new SimpleSeparationOptions
{
    Model = InternalModel.Default,
    ChunkSizeSeconds = 10,  // Smaller chunks
    Margin = 22050         // 0.5 second margin
};

// High quality configuration (requires 8GB+ RAM)
var highQualityOptions = new SimpleSeparationOptions
{
    Model = InternalModel.Best,
    ChunkSizeSeconds = 30,  // Larger chunks for better quality
    Margin = 88200          // 2 second margin for smoother transitions
};

Progress Tracking

Monitoring Progress

Track separation progress in real-time using event handlers:

Progress Event Handling
// "using" disposes the service (and its ONNX session) when the scope ends
using var service = new SimpleAudioSeparationService(options);

// Subscribe to progress events
service.ProgressChanged += (sender, progress) =>
{
    Console.WriteLine($"[{progress.OverallProgress:F1}%] {progress.Status}");

    if (progress.TotalChunks > 0)
    {
        Console.WriteLine($"Chunk {progress.ProcessedChunks}/{progress.TotalChunks}");
    }
};

// Subscribe to completion event
service.ProcessingCompleted += (sender, result) =>
{
    Console.WriteLine($"\nSeparation completed in {result.ProcessingTime}");
    Console.WriteLine($"Vocals saved to: {result.VocalsPath}");
    Console.WriteLine($"Instrumental saved to: {result.InstrumentalPath}");
};

service.Initialize();
service.Separate("song.mp3");

Progress Stages

  1. Loading audio file (0%): Decoding input audio to 44.1kHz stereo format
  2. Processing audio separation (10%): Creating and processing audio chunks
  3. Processing chunks (20-80%): Running neural network inference on each chunk
  4. Calculating results (90%): Reconstructing full-length separated tracks
  5. Completed (100%): Saving output files to disk
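
The stages above suggest that chunk processing occupies the 20-80% band of OverallProgress. How the library spreads individual chunks across that band is not documented. The sketch below assumes simple linear interpolation, and ChunkStageProgress is a hypothetical helper, useful mainly for reasoning about what a progress bar will show.

```csharp
using System;

// Hypothetical helper (assumption: linear mapping of chunks onto 20-80%).
static double ChunkStageProgress(int processedChunks, int totalChunks)
{
    // No chunk information yet: report the start of the chunk stage.
    if (totalChunks <= 0) return 20.0;

    double fraction = Math.Clamp((double)processedChunks / totalChunks, 0.0, 1.0);
    return 20.0 + 60.0 * fraction;
}

for (int done = 0; done <= 4; done++)
    Console.WriteLine($"Chunk {done}/4 -> {ChunkStageProgress(done, 4):F1}%");
```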

Factory Methods

AudioSeparationExtensions

Convenient factory methods for quick service creation:

Factory Methods
using OwnaudioNET.Features.Vocalremover;

// Create service with internal model
var service1 = AudioSeparationExtensions.CreateDefaultService(InternalModel.Best);

// Create service with custom model file
var service2 = AudioSeparationExtensions.CreateDefaultService("path/to/model.onnx");

// Create service with custom output directory
var service3 = AudioSeparationExtensions.CreatetService(
    InternalModel.Karaoke,
    @"C:\Output\Karaoke"
);

// Create service with custom model and output
var service4 = AudioSeparationExtensions.CreatetService(
    "custom_model.onnx",
    @"C:\Output\Custom"
);

SimpleSeparator Factory

Simplified factory for quick one-line initialization:

SimpleSeparator Usage
// Quick initialization and separation
var (service, _, _) = SimpleSeparator.Separator(
    InternalModel.Default,
    "output_folder"
);

var result = service.Separate("input.wav");
service.Dispose();

Helper Methods

Audio Validation & Estimation
// Validate audio file format
bool isValid = AudioSeparationExtensions.IsValidAudioFile("song.mp3");
// Supports: .wav, .mp3, .flac

// Estimate processing time
TimeSpan estimate = AudioSeparationExtensions.EstimateProcessingTime("song.wav");
Console.WriteLine($"Estimated processing time: {estimate}");

Usage Examples

Basic Vocal Removal

Simple Vocal Removal
using OwnaudioNET.Features.Vocalremover;

// Create service with default settings
var options = new SimpleSeparationOptions
{
    Model = InternalModel.Default,
    OutputDirectory = "separated"
};

using var service = new SimpleAudioSeparationService(options);
service.Initialize();

// Separate audio file
var result = service.Separate(@"C:\Music\song.mp3");

Console.WriteLine($"Vocals: {result.VocalsPath}");
Console.WriteLine($"Instrumental: {result.InstrumentalPath}");
Console.WriteLine($"Time: {result.ProcessingTime.TotalSeconds:F1}s");

High-Quality Separation with Progress

Best Quality with Progress Tracking
var options = new SimpleSeparationOptions
{
    Model = InternalModel.Best,
    OutputDirectory = "output_best",
    ChunkSizeSeconds = 20,
    DisableNoiseReduction = false  // Enable noise reduction
};

using var service = new SimpleAudioSeparationService(options);

// Progress tracking
service.ProgressChanged += (s, p) =>
{
    Console.Write($"\r[{p.OverallProgress:F1}%] {p.Status}");
    if (p.TotalChunks > 0)
        Console.Write($" - Chunk {p.ProcessedChunks}/{p.TotalChunks}");
};

service.ProcessingCompleted += (s, r) =>
{
    Console.WriteLine($"\n\nCompleted in {r.ProcessingTime}");
    Console.WriteLine($"Output files:\n  {r.VocalsPath}\n  {r.InstrumentalPath}");
};

service.Initialize();
service.Separate("input_song.flac");

Karaoke Track Creation

Creating Karaoke Tracks
// Use Karaoke model to preserve background vocals
var options = new SimpleSeparationOptions
{
    Model = InternalModel.Karaoke,
    OutputDirectory = "karaoke_tracks"
};

using var service = new SimpleAudioSeparationService(options);
service.Initialize();

// Process multiple songs
string[] songs = Directory.GetFiles(@"C:\Music\Album", "*.mp3");

foreach (var song in songs)
{
    Console.WriteLine($"\nProcessing: {Path.GetFileName(song)}");
    var result = service.Separate(song);
    Console.WriteLine($"Karaoke track created: {result.InstrumentalPath}");
}

Batch Processing with Factory Method

Batch Vocal Removal
// Create service using factory; "using" disposes it even if an error escapes the loop
using var service = AudioSeparationExtensions.CreatetService(
    InternalModel.Default,
    @"C:\Output\Vocals_Removed"
);

service.Initialize();

// Process all WAV files in directory
var files = Directory.GetFiles(@"C:\Music", "*.wav");
int count = 0;

foreach (var file in files)
{
    try
    {
        Console.WriteLine($"\n[{++count}/{files.Length}] {Path.GetFileName(file)}");
        var result = service.Separate(file);
        Console.WriteLine($"✓ Completed in {result.ProcessingTime.TotalSeconds:F1}s");
    }
    catch (Exception ex)
    {
        // Report the failure and continue with the next file
        Console.WriteLine($"✗ Error: {ex.Message}");
    }
}

Custom Model Configuration

Using Custom ONNX Models
// Load custom trained model
var options = new SimpleSeparationOptions
{
    Model = InternalModel.None,
    ModelPath = @"C:\Models\my_custom_separator.onnx",
    OutputDirectory = "custom_output",

    // Adjust STFT parameters if needed for custom model
    NFft = 4096,
    DimF = 1024,
    DimT = 9  // 2^9 = 512 time frames
};

using var service = new SimpleAudioSeparationService(options);
service.Initialize();

// DimF and DimT are auto-detected from the ONNX model metadata when not specified
var result = service.Separate("test.wav");

Performance Note

Audio separation is computationally intensive. Processing time varies significantly based on:
  • Model choice: The Best model is 2-3x slower than Default
  • Hardware: GPU acceleration provides a 5-15x speedup over CPU
  • File length: Processing time scales linearly with audio duration
  • Chunk size: Larger chunks are more efficient but use more memory

Typical processing time on modern hardware, expressed as a multiple of the audio duration: 2-10x on CPU, 0.2-1x on GPU.
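
As a rough worked example, the sketch below reads the typical figures above as processing time expressed as a multiple of the audio duration. That reading is an interpretation, and EstimateFromFactor is a hypothetical helper, not the library's EstimateProcessingTime method.

```csharp
using System;

// Hypothetical helper: processing time = audio duration * factor.
static TimeSpan EstimateFromFactor(TimeSpan audioDuration, double factor) =>
    TimeSpan.FromSeconds(audioDuration.TotalSeconds * factor);

TimeSpan song = TimeSpan.FromMinutes(4);

// Factor ranges quoted in the performance note.
TimeSpan cpuFast = EstimateFromFactor(song, 2.0);
TimeSpan cpuSlow = EstimateFromFactor(song, 10.0);
TimeSpan gpuFast = EstimateFromFactor(song, 0.2);
TimeSpan gpuSlow = EstimateFromFactor(song, 1.0);

Console.WriteLine($"CPU: {cpuFast.TotalMinutes:F0} to {cpuSlow.TotalMinutes:F0} minutes");
Console.WriteLine($"GPU: {gpuFast.TotalSeconds:F0} to {gpuSlow.TotalSeconds:F0} seconds");
```

Under that reading, a 4-minute song takes roughly 8 to 40 minutes on CPU and roughly 48 seconds to 4 minutes on GPU. Because time scales linearly with duration, the same factors apply to longer files.
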

Technical Features

The Vocal Remover API includes advanced audio processing techniques:
  • STFT with Hanning window for accurate frequency-domain representation
  • Reflection padding to prevent boundary artifacts
  • Hermitian symmetry for proper inverse FFT reconstruction
  • Overlap-add synthesis with automatic windowing compensation
  • Phase inversion noise reduction for cleaner separation results
  • Automatic normalization to prevent clipping in output files
  • 44.1kHz stereo processing for optimal quality
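
The documentation does not spell out how the phase inversion noise reduction works. One approach commonly used by separation tools of this kind is to run the model on both the input and its sign-inverted copy, then average the first output with the negated second output. The sketch below illustrates that idea only. FakeModel is a hypothetical stand-in for ONNX inference, and none of these names come from the library.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for model inference: halves the signal and adds a
// constant offset that plays the role of input-independent model noise.
static float[] FakeModel(float[] input) =>
    input.Select(v => 0.5f * v + 0.1f).ToArray();

// Runs the model on the input and on its inverted copy. The wanted signal
// flips sign with the input while the offset does not, so the offset
// cancels in 0.5 * (model(x) - model(-x)).
static float[] DenoisedInference(Func<float[], float[]> model, float[] input)
{
    float[] inverted = input.Select(v => -v).ToArray();
    float[] a = model(input);
    float[] b = model(inverted);

    var result = new float[input.Length];
    for (int i = 0; i < input.Length; i++)
        result[i] = 0.5f * (a[i] - b[i]);
    return result;
}

float[] mixture = { 0.2f, -0.4f, 0.8f };
float[] plain = FakeModel(mixture);                     // still contains the offset
float[] clean = DenoisedInference(FakeModel, mixture);  // offset cancelled

Console.WriteLine("plain: " + string.Join(", ", plain.Select(v => v.ToString("F2"))));
Console.WriteLine("clean: " + string.Join(", ", clean.Select(v => v.ToString("F2"))));
```

The cost of this scheme is a second inference pass per chunk, which would explain why the option to disable it exists.
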

Supported Audio Formats

The Vocal Remover supports common audio formats through the Ownaudio decoder:
  • WAV (uncompressed PCM)
  • MP3 (MPEG Audio Layer 3)
  • FLAC (Free Lossless Audio Codec)

Output files are always saved as 16-bit WAV at 44.1kHz stereo.
