NET Vocal Remover API

Advanced AI-powered audio separation technology for splitting music tracks into vocals and instrumental components. Utilizes state-of-the-art ONNX neural network models with STFT-based processing and intelligent noise reduction.

Professional Audio Separation: The Vocal Remover API provides sophisticated AI-based source separation using deep learning models, with support for GPU acceleration, chunked processing, and multiple quality presets.

Overview

The Vocal Remover system enables professional-grade audio source separation by leveraging advanced AI models:

Key Features

AI Model Selection

Three pre-trained models optimized for different scenarios: Default for balanced quality and speed, Best for maximum quality, and Karaoke for preserving background vocals.

STFT Processing

Short-Time Fourier Transform based processing with configurable FFT size, Hanning window, and frequency/time dimensions for precise spectral analysis.
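
As a rough illustration of the windowing step, the sketch below builds a Hann (Hanning) window of length NFft and derives the number of time frames from DimT. It is not the library's implementation: the helper name HannWindow and the periodic form of the window are assumptions made for this example, while the values 6144 and 8 are the documented defaults.

```csharp
using System;

// Hypothetical helper (not part of OwnaudioNET): periodic Hann window of a given length.
static float[] HannWindow(int length)
{
    var window = new float[length];
    for (int n = 0; n < length; n++)
        window[n] = 0.5f * (1f - MathF.Cos(2f * MathF.PI * n / length));
    return window;
}

int nFft = 6144;             // default NFft
int dimT = 8;                // default DimT, stored as a power of 2
int timeFrames = 1 << dimT;  // 2^8 = 256 time frames

float[] hann = HannWindow(nFft);
Console.WriteLine($"Window length: {hann.Length}, time frames: {timeFrames}");
Console.WriteLine($"Window start: {hann[0]}, window centre: {hann[nFft / 2]}");
```

The window is zero at its start and rises to about one at its centre, which tapers each frame before the FFT is taken.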

Smart Chunking

Intelligent audio segmentation with overlapping margins for seamless processing of files of any length without memory constraints.

Hardware Acceleration

Automatic detection and utilization of CUDA-enabled GPUs for significantly faster processing, with CPU fallback for compatibility.

SimpleAudioSeparationService Class

The SimpleAudioSeparationService class provides the core functionality for audio separation.

AudioSeparationService Namespace
using OwnaudioNET.Features.Vocalremover;

// Create service with custom options
var options = new SimpleSeparationOptions
{
    Model = InternalModel.Best,
    OutputDirectory = "output",
    ChunkSizeSeconds = 15,
    DisableNoiseReduction = false
};

// 'using' releases the ONNX session when the service goes out of scope
using var service = new SimpleAudioSeparationService(options);
service.Initialize();

// Separate audio file
SimpleSeparationResult result = service.Separate("input.mp3");

Console.WriteLine($"Vocals: {result.VocalsPath}");
Console.WriteLine($"Instrumental: {result.InstrumentalPath}");
Console.WriteLine($"Processing time: {result.ProcessingTime}");

Public API Methods

Method | Return | Description
Initialize() | void | Initializes the ONNX model session with GPU or CPU execution provider
Separate(string) | SimpleSeparationResult | Separates audio file into vocals and instrumental tracks
Dispose() | void | Releases all resources including ONNX session

Events

Event | Type | Description
ProgressChanged | EventHandler<SimpleSeparationProgress> | Raised when processing progress updates
ProcessingCompleted | EventHandler<SimpleSeparationResult> | Raised when separation completes successfully

Data Classes

SimpleSeparationOptions

Configuration parameters for the audio separation process.

SimpleSeparationOptions Class
public class SimpleSeparationOptions
{
    // ONNX model file path (optional if using internal models)
    public string? ModelPath { get; set; }

    // Internal model selection (Default, Best, Karaoke)
    public InternalModel Model { get; set; } = InternalModel.Best;

    // Output directory path
    public string OutputDirectory { get; set; } = "separated";

    // Disable noise reduction (enabled by default)
    public bool DisableNoiseReduction { get; set; } = false;

    // Margin size for overlapping chunks (in samples, default: 44100 = 1 second)
    public int Margin { get; set; } = 44100;

    // Chunk size in seconds (0 = process entire file at once)
    public int ChunkSizeSeconds { get; set; } = 15;

    // FFT size for STFT processing
    public int NFft { get; set; } = 6144;

    // Temporal dimension parameter (as power of 2)
    public int DimT { get; set; } = 8;

    // Frequency dimension parameter
    public int DimF { get; set; } = 2048;
}

SimpleSeparationProgress

Progress information for the separation process.

SimpleSeparationProgress Class
public class SimpleSeparationProgress
{
    // Current file being processed
    public string CurrentFile { get; set; }

    // Overall progress percentage (0-100)
    public double OverallProgress { get; set; }

    // Current processing step description
    public string Status { get; set; }

    // Number of chunks processed
    public int ProcessedChunks { get; set; }

    // Total number of chunks
    public int TotalChunks { get; set; }
}

SimpleSeparationResult

Result of the audio separation operation.

SimpleSeparationResult Class
public class SimpleSeparationResult
{
    // Path to the vocals output file
    public string VocalsPath { get; set; }

    // Path to the instrumental output file
    public string InstrumentalPath { get; set; }

    // Processing duration
    public TimeSpan ProcessingTime { get; set; }
}

Separation Models

Available Models

Three built-in models optimized for different use cases, each embedded in the library for seamless usage:

Model | Quality | Speed | Best For
Default | Good | Fast | General purpose separation with balanced quality and processing time. Ideal for quick previews and batch processing.
Best | Excellent | Slower | Maximum quality separation for professional applications. Produces the cleanest vocal and instrumental tracks with minimal artifacts.
Karaoke | Specialized | Medium | Removes lead vocals while preserving background vocals. Perfect for creating karaoke tracks with choir or backing vocal presence.

Model Selection

InternalModel Enum
public enum InternalModel
{
    None,      // Use custom model file via ModelPath
    Default,   // Balanced quality and speed
    Best,      // Highest quality, longer processing time
    Karaoke    // Preserve background vocals, remove lead vocal
}

// Usage examples
var options1 = new SimpleSeparationOptions { Model = InternalModel.Default };
var options2 = new SimpleSeparationOptions { Model = InternalModel.Best };
var options3 = new SimpleSeparationOptions { Model = InternalModel.Karaoke };

// Custom model
var options4 = new SimpleSeparationOptions
{
    Model = InternalModel.None,
    ModelPath = @"C:\Models\custom_model.onnx"
};

Model Characteristics

HTDemucs Model - Advanced Stem Separation

For more advanced audio separation needs, the HTDemucs (Hybrid Transformer Demucs) model provides state-of-the-art multi-stem separation capabilities. Unlike the basic vocal/instrumental separation, HTDemucs can separate audio into four distinct stems:

Stem | Description | Use Cases
Vocals | Singing and speech | Karaoke, vocal analysis, remixing
Drums | Percussion instruments | Rhythm analysis, drum replacement
Bass | Bass guitar and low-frequency instruments | Bass isolation, mixing, mastering
Other | All other instruments (guitars, keyboards, strings, etc.) | Instrumental remixing, analysis

Model Availability

  • NuGet Package: The HTDemucs model is embedded as a resource in the OwnaudioNET NuGet package. You don't need to download or handle the model file separately when using the package.
  • Source Code: Due to GitHub size limitations, the source code repository does NOT include the HTDemucs model file. If you clone the repository, you must download the model separately from HuggingFace.

HTDemucs Usage Example

Using HTDemucs for Multi-Stem Separation
using OwnaudioNET.Features.HTDemucs;

// Create HTDemucs separator with embedded model
var options = new HTDemucsSeparationOptions
{
    Model = InternalModel.HTDemucs,  // Use embedded resource
    OutputDirectory = "output_htdemucs",
    ChunkSizeSeconds = 10,           // Chunk size (10-30s recommended)
    OverlapFactor = 0.25f,           // Overlap between chunks (0.25 = 25%)
    EnableGPU = true,                // Use GPU acceleration
    TargetStems = HTDemucsStem.All   // Extract all stems
};

using var separator = new HTDemucsAudioSeparator(options);
separator.Initialize();

// Progress tracking
separator.ProgressChanged += (s, progress) =>
{
    Console.WriteLine($"{progress.Status}: {progress.OverallProgress:F1}%");
    Console.WriteLine($"Chunks: {progress.ProcessedChunks}/{progress.TotalChunks}");
};

// Separate audio into stems
var result = separator.Separate("music.mp3");

Console.WriteLine($"Vocals: {result.VocalsPath}");
Console.WriteLine($"Drums: {result.DrumsPath}");
Console.WriteLine($"Bass: {result.BassPath}");
Console.WriteLine($"Other: {result.OtherPath}");
Console.WriteLine($"Processing time: {result.ProcessingTime}");

Selective Stem Extraction

Extract Only Specific Stems
// Extract only vocals and other instruments
var options = new HTDemucsSeparationOptions
{
    Model = InternalModel.HTDemucs,
    OutputDirectory = "output",
    TargetStems = HTDemucsStem.Vocals | HTDemucsStem.Other
};

// Extract only drums
var drumsOnly = new HTDemucsSeparationOptions
{
    Model = InternalModel.HTDemucs,
    OutputDirectory = "drums_output",
    TargetStems = HTDemucsStem.Drums
};

// Using helper methods
using var separator = HTDemucsExtensions.CreateDefaultSeparator("output_directory");
separator.Initialize();
var result = separator.Separate("music.mp3");

HTDemucs Performance

HTDemucs provides superior separation quality but requires more computational resources:

Hardware | Processing Speed | Example (3 min song)
CPU (16 cores) | 10-15x realtime | ~12-18 seconds
GPU (NVIDIA RTX 3060) | 50-100x realtime | ~2-4 seconds
GPU (NVIDIA RTX 4090) | 100-150x realtime | ~1-2 seconds
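
The example figures follow from dividing the track length by the realtime factor. A small sketch of that arithmetic for the CPU row, using a hypothetical EstimateSeconds helper that is not part of the library:

```csharp
using System;

// Hypothetical helper: processing time = audio duration / realtime factor.
static double EstimateSeconds(TimeSpan audioLength, double realtimeFactor)
    => audioLength.TotalSeconds / realtimeFactor;

var song = TimeSpan.FromMinutes(3);

// CPU figures: 10-15x realtime
double cpuSlow = EstimateSeconds(song, 10); // 180 s / 10 = 18 s
double cpuFast = EstimateSeconds(song, 15); // 180 s / 15 = 12 s

Console.WriteLine($"CPU estimate: {cpuFast:F0}-{cpuSlow:F0} seconds");
```

The same division applied to the GPU rows gives the rounded ranges listed for the RTX 3060 and RTX 4090.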

HTDemucs Model Download for Source Code Users: If you're building from source code (not using the NuGet package), you must download the HTDemucs model file from HuggingFace due to GitHub's file size restrictions. The NuGet package includes the model automatically.

Multi-Model Averaging

The MultiModelAudioSeparator class improves separation quality by running multiple ONNX models on the same original audio and averaging their outputs. Each model processes the input independently, and the resulting vocals and instrumentals are averaged across all models, which typically yields cleaner separation with fewer artifacts than a single model alone.

How It Works: Every model receives the same original audio. After processing, the vocals from all models are averaged together and the instrumentals from all models are averaged together. Models can output either vocals or instrumentals; the complementary stem is always derived by subtracting the model output from the original mix.

Key Features

Parallel Averaging

Every model processes the original audio independently (the models run one after another on each chunk), then the vocals and instrumentals are averaged: (V₁+V₂+…+Vₙ)/n and (I₁+I₂+…+Iₙ)/n.
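
The averaging and complement rules can be sketched on plain sample arrays. This is an illustration under assumptions, not the library's code: Complement and Average are hypothetical helpers, and the three-sample arrays stand in for real audio buffers.

```csharp
using System;

// Hypothetical helper: derive the complementary stem by subtracting a model output from the mix.
static float[] Complement(float[] mix, float[] stem)
{
    var result = new float[mix.Length];
    for (int i = 0; i < mix.Length; i++)
        result[i] = mix[i] - stem[i];
    return result;
}

// Hypothetical helper: sample-wise mean of n equally long outputs.
static float[] Average(params float[][] outputs)
{
    var result = new float[outputs[0].Length];
    foreach (var output in outputs)
        for (int i = 0; i < result.Length; i++)
            result[i] += output[i] / outputs.Length;
    return result;
}

float[] mix = { 1.0f, 0.5f, -0.5f };
float[] vocalsFromModel1 = { 0.5f, 0.25f, -0.25f };        // model 1 outputs vocals
float[] instrumentalFromModel2 = { 0.75f, 0.25f, -0.25f }; // model 2 outputs the instrumental

// Each model contributes one vocal and one instrumental estimate.
float[] vocals = Average(vocalsFromModel1, Complement(mix, instrumentalFromModel2));
float[] instrumental = Average(Complement(mix, vocalsFromModel1), instrumentalFromModel2);

Console.WriteLine(string.Join(", ", vocals));
Console.WriteLine(string.Join(", ", instrumental));
```

Because every complement is taken against the same mix, the averaged vocals and averaged instrumental still sum back to the original signal.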

Auto OutputType Detection

The system automatically detects whether a model outputs vocals or instrumentals by inspecting ONNX metadata, model name, or file path. Explicit configuration is also supported.

Intermediate Results

Optionally save every model's individual output to disk for debugging, comparison, or further post-processing.

Mixed Model Types

Combine vocal-focused and instrumental-focused models in the same pipeline — each contributes its strengths, and averaging smooths out the differences.

MultiModelAudioSeparator Class

Namespace
using OwnaudioNET.Features.Vocalremover;

Public API Methods

Method | Return | Description
Initialize() | void | Loads and initializes all ONNX model sessions, auto-detects dimensions and output types
Separate(string) | MultiModelSeparationResult | Processes the audio file through all models and returns the averaged separation result
Dispose() | void | Releases all ONNX sessions and managed resources

Events

Event | Type | Description
ProgressChanged | EventHandler<MultiModelSeparationProgress> | Raised on every chunk, with per-model index, name, chunk count, and overall percentage
ProcessingCompleted | EventHandler<MultiModelSeparationResult> | Raised when all models have finished and the averaged result is ready

Data Classes

MultiModelSeparationOptions

MultiModelSeparationOptions Class
public class MultiModelSeparationOptions
{
    // List of models to include in the averaging pipeline
    public List<MultiModelInfo> Models { get; set; } = new();

    // Output directory for final and intermediate files
    public string OutputDirectory { get; set; } = "separated_multimodel";

    // Enable GPU acceleration (CUDA on Windows/Linux, CoreML on macOS)
    public bool EnableGPU { get; set; } = true;

    // Overlap margin in samples between chunks (default: 44100 = 1 s)
    public int Margin { get; set; } = 44100;

    // Chunk size in seconds (0 = process entire file at once)
    public int ChunkSizeSeconds { get; set; } = 15;

    // Save individual model outputs to disk (useful for debugging)
    public bool SaveAllIntermediateResults { get; set; } = false;
}

MultiModelInfo

MultiModelInfo Class
public class MultiModelInfo
{
    // Human-readable name shown in progress events and output filenames
    public string Name { get; set; } = "Model";

    // ONNX model file path (leave null/empty to use an embedded InternalModel)
    public string? ModelPath { get; set; }

    // Embedded model selection (Default, Best, Karaoke, …)
    public InternalModel Model { get; set; } = InternalModel.None;

    // FFT size for STFT (0 = auto-detect from ONNX metadata)
    public int NFft { get; set; } = 6144;

    // Time dimension as power of 2 (2^DimT frames)
    public int DimT { get; set; } = 8;

    // Frequency dimension
    public int DimF { get; set; } = 2048;

    // Disable noise reduction for this specific model
    public bool DisableNoiseReduction { get; set; } = false;

    // Save this model's individual output (independent of SaveAllIntermediateResults)
    public bool SaveIntermediateOutput { get; set; } = false;

    // Explicit output type — leave null for auto-detection
    public ModelOutputType? OutputType { get; set; } = null;
}

ModelOutputType Enum

ModelOutputType Enum
public enum ModelOutputType
{
    // Model directly outputs the instrumental track.
    // Vocals = Original − Instrumental
    Instrumental,

    // Model directly outputs the vocal track.
    // Instrumental = Original − Vocals
    Vocals
}

MultiModelSeparationProgress

MultiModelSeparationProgress Class
public class MultiModelSeparationProgress
{
    public string CurrentFile { get; set; }        // File being processed
    public double OverallProgress { get; set; }    // 0–100 %
    public string Status { get; set; }             // Human-readable step description
    public int CurrentModelIndex { get; set; }     // 1-based model index
    public int TotalModels { get; set; }           // Total number of models
    public string CurrentModelName { get; set; }   // Name of the active model
    public int ProcessedChunks { get; set; }       // Chunks done for current model
    public int TotalChunks { get; set; }           // Total chunks for current model
}

MultiModelSeparationResult

MultiModelSeparationResult Class
public class MultiModelSeparationResult
{
    public string OutputPath { get; set; }         // Same as InstrumentalPath
    public string VocalsPath { get; set; }         // Averaged vocals WAV file
    public string InstrumentalPath { get; set; }   // Averaged instrumental WAV file
    public Dictionary<string, string> IntermediatePaths { get; set; }  // Per-model outputs
    public TimeSpan ProcessingTime { get; set; }   // Wall-clock processing duration
    public int ModelsProcessed { get; set; }       // Number of models that contributed
}

Usage Examples

Simple 2-Model Averaging

Using MultiModelExtensions Helper
using OwnaudioNET.Features.Vocalremover;

// Convenience factory: creates a 2-model averaging pipeline
using var separator = MultiModelExtensions.CreateSimplePipeline(
    model1: InternalModel.Best,
    model2: InternalModel.Karaoke,
    outputDirectory: "output"
);

separator.ProgressChanged += (s, p) =>
    Console.WriteLine($"[Model {p.CurrentModelIndex}/{p.TotalModels}] {p.OverallProgress:F1}%");

separator.Initialize();
var result = separator.Separate("song.mp3");

Console.WriteLine($"Vocals:       {result.VocalsPath}");
Console.WriteLine($"Instrumental: {result.InstrumentalPath}");
Console.WriteLine($"Time:         {result.ProcessingTime}");

Triple-Model Averaging

Using MultiModelExtensions.CreateTriplePipeline
using var separator = MultiModelExtensions.CreateTriplePipeline(
    model1: InternalModel.Best,
    model2: InternalModel.Default,
    model3: InternalModel.Karaoke,
    outputDirectory: "output_triple"
);

separator.Initialize();
var result = separator.Separate("song.flac");

// Final averaged outputs
Console.WriteLine($"Vocals:       {result.VocalsPath}");
Console.WriteLine($"Instrumental: {result.InstrumentalPath}");

// Intermediate per-model outputs (saved because the helper enables it)
foreach (var kv in result.IntermediatePaths)
    Console.WriteLine($"  {kv.Key}: {kv.Value}");

Full Control with Custom Options

Manual MultiModelSeparationOptions
var options = new MultiModelSeparationOptions
{
    Models = new List<MultiModelInfo>
    {
        new MultiModelInfo
        {
            Name = "Step1_BestQuality",
            Model = InternalModel.Best,
            NFft = 6144, DimT = 8, DimF = 2048,
            DisableNoiseReduction = false,
            SaveIntermediateOutput = true
        },
        new MultiModelInfo
        {
            Name = "Step2_Karaoke",
            Model = InternalModel.Karaoke,
            NFft = 6144, DimT = 8, DimF = 2048,
            DisableNoiseReduction = true
        }
    },
    OutputDirectory = "output_custom",
    EnableGPU = true,
    ChunkSizeSeconds = 15,
    Margin = 44100,
    SaveAllIntermediateResults = true
};

using var separator = new MultiModelAudioSeparator(options);

separator.ProgressChanged += (s, p) =>
{
    Console.Write($"\r[{p.CurrentModelName}] chunk {p.ProcessedChunks}/{p.TotalChunks} ({p.OverallProgress:F1}%)");
};

separator.Initialize();
var result = separator.Separate("input.wav");

Mixed OutputType — Vocal + Instrumental Models

Combining Models with Different Output Stems
var options = new MultiModelSeparationOptions
{
    Models = new List<MultiModelInfo>
    {
        new MultiModelInfo
        {
            Name = "VocalModel",
            ModelPath = @"models/vocal_model.onnx",
            OutputType = ModelOutputType.Vocals       // outputs vocal track directly
        },
        new MultiModelInfo
        {
            Name = "InstrumentalModel",
            ModelPath = @"models/instrumental_model.onnx",
            OutputType = ModelOutputType.Instrumental // outputs instrumental track directly
        }
    },
    OutputDirectory = "output_mixed",
    EnableGPU = true
};

// The averaging pipeline:
// - Vocal complement from model 2: original − instrumental
// - Instrumental complement from model 1: original − vocals
// - Final vocals:       (V₁ + V₂) / 2
// - Final instrumental: (I₁ + I₂) / 2

using var separator = new MultiModelAudioSeparator(options);
separator.Initialize();
var result = separator.Separate("song.mp3");

Custom ONNX Model Files with Auto-Detection

External ONNX Models — OutputType Auto-Detected
// OutputType is inferred from model filename:
//   "Voc_FT.onnx"   → contains "Voc"  → Vocals
//   "Inst_HQ.onnx"  → contains "Inst" → Instrumental
var options = new MultiModelSeparationOptions
{
    Models = new List<MultiModelInfo>
    {
        new MultiModelInfo { Name = "Voc_FT",  ModelPath = @"models/Voc_FT.onnx" },
        new MultiModelInfo { Name = "Inst_HQ", ModelPath = @"models/Inst_HQ.onnx" }
    },
    OutputDirectory = "output_custom_files",
    EnableGPU = true
};

using var separator = new MultiModelAudioSeparator(options);
separator.Initialize();
var result = separator.Separate("song.wav");

Auto-Detection of OutputType

When MultiModelInfo.OutputType is left as null, the system uses a three-strategy detection approach:

  1. ONNX output metadata: Inspects the model's output node names for keywords such as vocal, voice, instrumental, karaoke, etc.
  2. Model name / file path: Scans MultiModelInfo.Name and ModelPath for the same keywords.
  3. InternalModel enum name: Falls back to the enum value name.

If none of the strategies yields a conclusive result, Instrumental is used as the safe default.
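
The fallback order above can be sketched as a chain of keyword checks. The code below is only an approximation: the keyword list ("voc", "voice", "inst"), the string return values, and the helper names TryDetect and DetectOutputType are assumptions for illustration, and the library's real matching rules may differ.

```csharp
using System;

// Hypothetical keyword check; the library's real keyword list and matching rules may differ.
static bool TryDetect(string text, out string outputType)
{
    outputType = "";
    if (string.IsNullOrEmpty(text)) return false;

    if (text.Contains("voc", StringComparison.OrdinalIgnoreCase) ||
        text.Contains("voice", StringComparison.OrdinalIgnoreCase))
        outputType = "Vocals";
    else if (text.Contains("inst", StringComparison.OrdinalIgnoreCase))
        outputType = "Instrumental";

    return outputType.Length > 0;
}

// Try each source in priority order and fall back to Instrumental.
static string DetectOutputType(string onnxOutputName, string modelName, string enumName)
{
    foreach (var candidate in new[] { onnxOutputName, modelName, enumName })
        if (TryDetect(candidate, out var detected))
            return detected;
    return "Instrumental";
}

Console.WriteLine(DetectOutputType("output", "Voc_FT", "None"));
Console.WriteLine(DetectOutputType("output", "Inst_HQ", "None"));
Console.WriteLine(DetectOutputType("output", "model_a", "None"));
```

When auto-detection could be ambiguous, setting MultiModelInfo.OutputType explicitly avoids relying on naming conventions.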

Advanced Configuration

Processing Parameters

Fine-tune the separation process with advanced STFT and chunking parameters:

Parameter | Default | Description
ChunkSizeSeconds | 15 | Length of each processing chunk in seconds. Smaller values use less memory but increase processing overhead.
Margin | 44100 | Overlap margin in samples between chunks. Prevents artifacts at chunk boundaries. Should be at least 0.5 seconds.
NFft | 6144 | FFT size for spectral analysis. Higher values provide better frequency resolution but increase computation time.
DimF | 2048 | Frequency dimension for model input. Auto-detected from model metadata if not specified.
DimT | 8 | Time dimension as power of 2 (2^8 = 256 time frames). Auto-detected from model metadata if not specified.
DisableNoiseReduction | false | When false, applies advanced noise reduction using phase inversion technique for cleaner results.
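
To make the interaction between ChunkSizeSeconds and Margin concrete, the sketch below computes overlapping chunk ranges for a 40-second track at 44.1kHz. The ChunkRanges helper and its exact boundary handling are assumptions for illustration; the library's internal chunking may differ in detail.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: split totalSamples into chunks and extend each by a margin on both sides.
static List<(int Start, int End)> ChunkRanges(int totalSamples, int chunkSamples, int margin)
{
    var ranges = new List<(int Start, int End)>();
    if (chunkSamples <= 0) // 0 = process the entire file at once
    {
        ranges.Add((0, totalSamples));
        return ranges;
    }
    for (int pos = 0; pos < totalSamples; pos += chunkSamples)
    {
        int start = Math.Max(0, pos - margin);
        int end = Math.Min(totalSamples, pos + chunkSamples + margin);
        ranges.Add((start, end));
    }
    return ranges;
}

const int sampleRate = 44100;
int total = 40 * sampleRate;                          // a 40-second track
var chunks = ChunkRanges(total, 15 * sampleRate, 44100);

foreach (var (start, end) in chunks)
    Console.WriteLine($"{start / (double)sampleRate:F0}s - {end / (double)sampleRate:F0}s");
```

With the default 15-second chunks and a one-second margin, neighbouring chunks share two seconds of audio, which is the overlap used to hide boundary artifacts.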

Hardware Acceleration

The service automatically detects and utilizes available hardware acceleration:

GPU Acceleration
// Automatic GPU detection during initialization
service.Initialize();

// Console output will show:
// "CUDA execution provider enabled." (if GPU available)
// OR
// "Using CPU execution provider." (fallback)

// Processing speed comparison:
// CPU: ~2-5x real-time (depends on CPU cores)
// GPU (CUDA): ~10-30x real-time (depends on GPU model)

Memory Management

For large files or systems with limited memory, adjust chunk size and margin:

Memory Optimization
// Low memory configuration (suitable for 4GB RAM)
var lowMemOptions = new SimpleSeparationOptions
{
    Model = InternalModel.Default,
    ChunkSizeSeconds = 10,  // Smaller chunks
    Margin = 22050         // 0.5 second margin
};

// High quality configuration (requires 8GB+ RAM)
var highQualityOptions = new SimpleSeparationOptions
{
    Model = InternalModel.Best,
    ChunkSizeSeconds = 30,  // Larger chunks for better quality
    Margin = 88200          // 2 second margin for smoother transitions
};

Progress Tracking

Monitoring Progress

Track separation progress in real-time using event handlers:

Progress Event Handling
// 'using' releases the ONNX session when the service goes out of scope
using var service = new SimpleAudioSeparationService(options);

// Subscribe to progress events
service.ProgressChanged += (sender, progress) =>
{
    Console.WriteLine($"[{progress.OverallProgress:F1}%] {progress.Status}");

    if (progress.TotalChunks > 0)
    {
        Console.WriteLine($"Chunk {progress.ProcessedChunks}/{progress.TotalChunks}");
    }
};

// Subscribe to completion event
service.ProcessingCompleted += (sender, result) =>
{
    Console.WriteLine($"\nSeparation completed in {result.ProcessingTime}");
    Console.WriteLine($"Vocals saved to: {result.VocalsPath}");
    Console.WriteLine($"Instrumental saved to: {result.InstrumentalPath}");
};

service.Initialize();
service.Separate("song.mp3");

Progress Stages

  1. Loading audio file (0%): Decoding input audio to 44.1kHz stereo format
  2. Processing audio separation (10%): Creating and processing audio chunks
  3. Processing chunks (20-80%): Running neural network inference on each chunk
  4. Calculating results (90%): Reconstructing full-length separated tracks
  5. Completed (100%): Saving output files to disk
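
For a UI progress bar it can help to see how chunk counts might map onto the 20-80% inference stage. The sketch below assumes a linear mapping; this is an assumption for illustration, and the percentages actually reported in OverallProgress come from the library.

```csharp
using System;

// Hypothetical mapping: chunk inference is assumed to span the 20-80% range linearly.
static double ChunkProgress(int processedChunks, int totalChunks)
{
    if (totalChunks <= 0) return 20.0;
    return 20.0 + 60.0 * processedChunks / totalChunks;
}

Console.WriteLine(ChunkProgress(0, 12));  // start of the inference stage
Console.WriteLine(ChunkProgress(6, 12));  // halfway through the chunks
Console.WriteLine(ChunkProgress(12, 12)); // all chunks processed
```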

Factory Methods

AudioSeparationExtensions

Convenient factory methods for quick service creation:

Factory Methods
using OwnaudioNET.Features.Vocalremover;

// Create service with internal model
var service1 = AudioSeparationExtensions.CreateDefaultService(InternalModel.Best);

// Create service with custom model file
var service2 = AudioSeparationExtensions.CreateDefaultService("path/to/model.onnx");

// Create service with custom output directory
var service3 = AudioSeparationExtensions.CreatetService(
    InternalModel.Karaoke,
    @"C:\Output\Karaoke"
);

// Create service with custom model and output
var service4 = AudioSeparationExtensions.CreatetService(
    "custom_model.onnx",
    @"C:\Output\Custom"
);

SimpleSeparator Factory

Simplified factory for quick one-line initialization:

SimpleSeparator Usage
// Quick initialization and separation
var (service, _, _) = SimpleSeparator.Separator(
    InternalModel.Default,
    "output_folder"
);

var result = service.Separate("input.wav");
service.Dispose();

Helper Methods

Audio Validation & Estimation
// Validate audio file format
bool isValid = AudioSeparationExtensions.IsValidAudioFile("song.mp3");
// Supports: .wav, .mp3, .flac

// Estimate processing time
TimeSpan estimate = AudioSeparationExtensions.EstimateProcessingTime("song.wav");
Console.WriteLine($"Estimated processing time: {estimate}");
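
A pre-flight check of this kind can also be approximated without the library, for example when filtering a file list before creating a service. The helper below is a hypothetical stand-in that only compares file extensions against the three supported formats; the real IsValidAudioFile may perform additional checks.

```csharp
using System;
using System.IO;

// Hypothetical stand-in: compares only the extension; the real IsValidAudioFile may do more.
static bool HasSupportedExtension(string path)
{
    string ext = Path.GetExtension(path).ToLowerInvariant();
    return ext == ".wav" || ext == ".mp3" || ext == ".flac";
}

Console.WriteLine(HasSupportedExtension("song.MP3"));
Console.WriteLine(HasSupportedExtension("notes.txt"));
```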

Usage Examples

Basic Vocal Removal

Simple Vocal Removal
using OwnaudioNET.Features.Vocalremover;

// Create service with default settings
var options = new SimpleSeparationOptions
{
    Model = InternalModel.Default,
    OutputDirectory = "separated"
};

using var service = new SimpleAudioSeparationService(options);
service.Initialize();

// Separate audio file
var result = service.Separate(@"C:\Music\song.mp3");

Console.WriteLine($"Vocals: {result.VocalsPath}");
Console.WriteLine($"Instrumental: {result.InstrumentalPath}");
Console.WriteLine($"Time: {result.ProcessingTime.TotalSeconds:F1}s");

High-Quality Separation with Progress

Best Quality with Progress Tracking
var options = new SimpleSeparationOptions
{
    Model = InternalModel.Best,
    OutputDirectory = "output_best",
    ChunkSizeSeconds = 20,
    DisableNoiseReduction = false  // Enable noise reduction
};

using var service = new SimpleAudioSeparationService(options);

// Progress tracking
service.ProgressChanged += (s, p) =>
{
    Console.Write($"\r[{p.OverallProgress:F1}%] {p.Status}");
    if (p.TotalChunks > 0)
        Console.Write($" - Chunk {p.ProcessedChunks}/{p.TotalChunks}");
};

service.ProcessingCompleted += (s, r) =>
{
    Console.WriteLine($"\n\nCompleted in {r.ProcessingTime}");
    Console.WriteLine($"Output files:\n  {r.VocalsPath}\n  {r.InstrumentalPath}");
};

service.Initialize();
service.Separate("input_song.flac");

Karaoke Track Creation

Creating Karaoke Tracks
// Use Karaoke model to preserve background vocals
var options = new SimpleSeparationOptions
{
    Model = InternalModel.Karaoke,
    OutputDirectory = "karaoke_tracks"
};

using var service = new SimpleAudioSeparationService(options);
service.Initialize();

// Process multiple songs
string[] songs = Directory.GetFiles(@"C:\Music\Album", "*.mp3");

foreach (var song in songs)
{
    Console.WriteLine($"\nProcessing: {Path.GetFileName(song)}");
    var result = service.Separate(song);
    Console.WriteLine($"Karaoke track created: {result.InstrumentalPath}");
}

Batch Processing with Factory Method

Batch Vocal Removal
// Create service using factory
var service = AudioSeparationExtensions.CreatetService(
    InternalModel.Default,
    @"C:\Output\Vocals_Removed"
);

service.Initialize();

// Process all WAV files in directory
var files = Directory.GetFiles(@"C:\Music", "*.wav");
int count = 0;

foreach (var file in files)
{
    try
    {
        Console.WriteLine($"\n[{++count}/{files.Length}] {Path.GetFileName(file)}");
        var result = service.Separate(file);
        Console.WriteLine($"✓ Completed in {result.ProcessingTime.TotalSeconds:F1}s");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"✗ Error: {ex.Message}");
    }
}

service.Dispose();

Custom Model Configuration

Using Custom ONNX Models
// Load custom trained model
var options = new SimpleSeparationOptions
{
    Model = InternalModel.None,
    ModelPath = @"C:\Models\my_custom_separator.onnx",
    OutputDirectory = "custom_output",

    // Adjust STFT parameters if needed for custom model
    NFft = 4096,
    DimF = 1024,
    DimT = 9  // 2^9 = 512 time frames
};

using var service = new SimpleAudioSeparationService(options);
service.Initialize();

// Unspecified model parameters are auto-detected from ONNX metadata
var result = service.Separate("test.wav");

Performance Note: Audio separation is computationally intensive. Processing time varies significantly based on:
  • Model choice: Best model is 2-3x slower than Default
  • Hardware: GPU acceleration provides 5-15x speedup over CPU
  • File length: Processing time scales linearly with audio duration
  • Chunk size: Larger chunks are more efficient but use more memory
Typical processing time on modern hardware is 2-10x the audio duration on CPU and 0.2-1x the audio duration on GPU.

Technical Features: The Vocal Remover API includes advanced audio processing techniques:
  • STFT with Hanning window for accurate frequency-domain representation
  • Reflection padding to prevent boundary artifacts
  • Hermitian symmetry for proper inverse FFT reconstruction
  • Overlap-add synthesis with automatic windowing compensation
  • Phase inversion noise reduction for cleaner separation results
  • Automatic normalization to prevent clipping in output files
  • 44.1kHz stereo processing for optimal quality

Supported Audio Formats: The Vocal Remover supports common audio formats through the Ownaudio decoder:
  • WAV (uncompressed PCM)
  • MP3 (MPEG Audio Layer 3)
  • FLAC (Free Lossless Audio Codec)
Output files are always saved as 16-bit WAV at 44.1kHz stereo.
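
The normalization and 16-bit output steps can be pictured with a short sketch. It assumes peak normalization that only scales the signal down when it would clip, followed by rounding to 16-bit integers; the helper names and this exact strategy are assumptions, not the library's documented behavior.

```csharp
using System;

// Hypothetical helper: scale the signal down only if its peak exceeds full scale (1.0).
static void NormalizeIfClipping(float[] samples)
{
    float peak = 0f;
    foreach (float s in samples)
        peak = Math.Max(peak, Math.Abs(s));
    if (peak <= 1f) return;
    for (int i = 0; i < samples.Length; i++)
        samples[i] /= peak;
}

// Hypothetical helper: convert normalized floats to 16-bit PCM values.
static short[] ToPcm16(float[] samples)
{
    var pcm = new short[samples.Length];
    for (int i = 0; i < samples.Length; i++)
        pcm[i] = (short)Math.Round(Math.Clamp(samples[i], -1f, 1f) * short.MaxValue);
    return pcm;
}

float[] audio = { 0.5f, -2.0f, 1.0f };
NormalizeIfClipping(audio);            // peak is 2.0, so every sample is halved
short[] pcm16 = ToPcm16(audio);

Console.WriteLine(string.Join(", ", pcm16));
```

Scaling by the peak keeps the relative levels of all samples intact, whereas hard clipping alone would distort the loudest passages.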
