NET Vocal Remover API
AI-powered audio separation that splits music tracks into vocal and instrumental components. It uses ONNX neural network models with STFT-based processing and optional noise reduction.
Overview
The Vocal Remover system enables professional-grade audio source separation by leveraging advanced AI models:
- AI-Powered Separation: Uses ONNX neural network models for accurate vocal/instrumental separation
- Multiple Model Options: Choose between Default, Best, and Karaoke models for different use cases
- GPU Acceleration: Automatic CUDA support for faster processing when available
- Chunked Processing: Handles long audio files with configurable chunk sizes and overlap margins
- Noise Reduction: Optional advanced denoising for cleaner results
- Progress Tracking: Real-time progress updates with detailed status information
Key Features
AI Model Selection
Three pre-trained models optimized for different scenarios: Default for balanced quality and speed, Best for maximum quality, and Karaoke for preserving background vocals.
STFT Processing
Short-Time Fourier Transform based processing with configurable FFT size, Hanning window, and frequency/time dimensions for precise spectral analysis.
Smart Chunking
Audio is split into chunks with overlapping margins, so files of any length can be processed with bounded memory use and without audible seams at chunk boundaries.
Hardware Acceleration
Automatic detection and utilization of CUDA-enabled GPUs for significantly faster processing, with CPU fallback for compatibility.
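To make the chunking idea concrete, here is a minimal sketch of how a file could be cut into chunks that are extended by a margin on both sides. `PlanChunks` is a hypothetical helper written for illustration, not part of the library, and the exact boundaries the library uses may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: split `totalSamples` into chunks of `chunkSamples`,
// extending each chunk by `margin` samples on both sides (clamped to the file).
static List<(long Start, long End)> PlanChunks(long totalSamples, long chunkSamples, long margin)
{
    var chunks = new List<(long Start, long End)>();
    if (chunkSamples <= 0 || chunkSamples >= totalSamples)
    {
        // A chunk size of 0 means "process the entire file at once".
        chunks.Add((0, totalSamples));
        return chunks;
    }
    for (long pos = 0; pos < totalSamples; pos += chunkSamples)
    {
        long start = Math.Max(0, pos - margin);
        long end = Math.Min(totalSamples, pos + chunkSamples + margin);
        chunks.Add((start, end));
    }
    return chunks;
}

const int sampleRate = 44100;
// A 3 minute file, 15 second chunks, 1 second margin (the documented defaults).
var plan = PlanChunks(totalSamples: 180L * sampleRate, chunkSamples: 15L * sampleRate, margin: 44100);
Console.WriteLine($"Chunks: {plan.Count}");
Console.WriteLine($"First chunk: {plan[0].Start}..{plan[0].End}");
Console.WriteLine($"Second chunk: {plan[1].Start}..{plan[1].End}");
```

With these values the file is covered by 12 chunks, and neighbouring chunks share two margins' worth of samples, which is what allows boundary artifacts to be trimmed away when the chunks are stitched back together.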
SimpleAudioSeparationService Class
The SimpleAudioSeparationService class provides the core functionality for audio separation.
using OwnaudioNET.Features.Vocalremover;
// Create service with custom options
var options = new SimpleSeparationOptions
{
Model = InternalModel.Best,
OutputDirectory = "output",
ChunkSizeSeconds = 15,
DisableNoiseReduction = false
};
// "using" releases the ONNX session via Dispose() when the variable goes out of scope
using var service = new SimpleAudioSeparationService(options);
service.Initialize();
// Separate audio file
SimpleSeparationResult result = service.Separate("input.mp3");
Console.WriteLine($"Vocals: {result.VocalsPath}");
Console.WriteLine($"Instrumental: {result.InstrumentalPath}");
Console.WriteLine($"Processing time: {result.ProcessingTime}");
Public API Methods
| Method | Return | Description |
|---|---|---|
| Initialize() | void | Initializes the ONNX model session with GPU or CPU execution provider |
| Separate(string) | SimpleSeparationResult | Separates audio file into vocals and instrumental tracks |
| Dispose() | void | Releases all resources including ONNX session |
Events
| Event | Type | Description |
|---|---|---|
| ProgressChanged | EventHandler<SimpleSeparationProgress> | Raised when processing progress updates |
| ProcessingCompleted | EventHandler<SimpleSeparationResult> | Raised when separation completes successfully |
Data Classes
SimpleSeparationOptions
Configuration parameters for the audio separation process.
public class SimpleSeparationOptions
{
// ONNX model file path (optional if using internal models)
public string? ModelPath { get; set; }
// Internal model selection (Default, Best, Karaoke)
public InternalModel Model { get; set; } = InternalModel.Best;
// Output directory path
public string OutputDirectory { get; set; } = "separated";
// Disable noise reduction (enabled by default)
public bool DisableNoiseReduction { get; set; } = false;
// Margin size for overlapping chunks (in samples, default: 44100 = 1 second)
public int Margin { get; set; } = 44100;
// Chunk size in seconds (0 = process entire file at once)
public int ChunkSizeSeconds { get; set; } = 15;
// FFT size for STFT processing
public int NFft { get; set; } = 6144;
// Temporal dimension parameter (as power of 2)
public int DimT { get; set; } = 8;
// Frequency dimension parameter
public int DimF { get; set; } = 2048;
}
SimpleSeparationProgress
Progress information for the separation process.
public class SimpleSeparationProgress
{
// Current file being processed
public string CurrentFile { get; set; }
// Overall progress percentage (0-100)
public double OverallProgress { get; set; }
// Current processing step description
public string Status { get; set; }
// Number of chunks processed
public int ProcessedChunks { get; set; }
// Total number of chunks
public int TotalChunks { get; set; }
}
SimpleSeparationResult
Result of the audio separation operation.
public class SimpleSeparationResult
{
// Path to the vocals output file
public string VocalsPath { get; set; }
// Path to the instrumental output file
public string InstrumentalPath { get; set; }
// Processing duration
public TimeSpan ProcessingTime { get; set; }
}
Separation Models
Available Models
Three built-in models optimized for different use cases, each embedded in the library for seamless usage:
| Model | Quality | Speed | Best For |
|---|---|---|---|
| Default | Good | Fast | General purpose separation with balanced quality and processing time. Ideal for quick previews and batch processing. |
| Best | Excellent | Slower | Maximum quality separation for professional applications. Produces the cleanest vocal and instrumental tracks with minimal artifacts. |
| Karaoke | Specialized | Medium | Removes lead vocals while preserving background vocals. Perfect for creating karaoke tracks with choir or backing vocal presence. |
Model Selection
public enum InternalModel
{
None, // Use custom model file via ModelPath
Default, // Balanced quality and speed
Best, // Highest quality, longer processing time
Karaoke // Preserve background vocals, remove lead vocal
}
// Usage examples
var options1 = new SimpleSeparationOptions { Model = InternalModel.Default };
var options2 = new SimpleSeparationOptions { Model = InternalModel.Best };
var options3 = new SimpleSeparationOptions { Model = InternalModel.Karaoke };
// Custom model
var options4 = new SimpleSeparationOptions
{
Model = InternalModel.None,
ModelPath = @"C:\Models\custom_model.onnx"
};
Model Characteristics
- Default Model: The most balanced option, providing good quality separation with relatively fast processing times. Suitable for most use cases where a reasonable quality-speed tradeoff is acceptable.
- Best Model: Utilizes a larger and more complex neural network architecture, resulting in superior separation quality with cleaner vocals and instrumental tracks. Processing time is approximately 2-3x longer than the Default model.
- Karaoke Model: Specifically trained to remove only the lead vocal while preserving backing vocals, harmonies, and vocal effects. This creates a more authentic karaoke experience compared to complete vocal removal.
HTDemucs Model - Advanced Stem Separation
For more advanced audio separation needs, the HTDemucs (Hybrid Transformer Demucs) model provides state-of-the-art multi-stem separation capabilities. Unlike the basic vocal/instrumental separation, HTDemucs can separate audio into four distinct stems:
| Stem | Description | Use Cases |
|---|---|---|
| Vocals | Singing and speech | Karaoke, vocal analysis, remixing |
| Drums | Percussion instruments | Rhythm analysis, drum replacement |
| Bass | Bass guitar and low-frequency instruments | Bass isolation, mixing, mastering |
| Other | All other instruments (guitars, keyboards, strings, etc.) | Instrumental remixing, analysis |
- NuGet package: The HTDemucs model is embedded as a resource in the OwnaudioNET NuGet package. You do not need to download or manage the model file separately when using the package.
- Source code: Because of GitHub file size limits, the source code repository does NOT include the HTDemucs model file. If you clone the repository, you must download the model separately from HuggingFace.
HTDemucs Usage Example
using OwnaudioNET.Features.HTDemucs;
// Create HTDemucs separator with embedded model
var options = new HTDemucsSeparationOptions
{
Model = InternalModel.HTDemucs, // Use embedded resource
OutputDirectory = "output_htdemucs",
ChunkSizeSeconds = 10, // Chunk size (10-30s recommended)
OverlapFactor = 0.25f, // Overlap between chunks (0.25 = 25%)
EnableGPU = true, // Use GPU acceleration
TargetStems = HTDemucsStem.All // Extract all stems
};
using var separator = new HTDemucsAudioSeparator(options);
separator.Initialize();
// Progress tracking
separator.ProgressChanged += (s, progress) =>
{
Console.WriteLine($"{progress.Status}: {progress.OverallProgress:F1}%");
Console.WriteLine($"Chunks: {progress.ProcessedChunks}/{progress.TotalChunks}");
};
// Separate audio into stems
var result = separator.Separate("music.mp3");
Console.WriteLine($"Vocals: {result.VocalsPath}");
Console.WriteLine($"Drums: {result.DrumsPath}");
Console.WriteLine($"Bass: {result.BassPath}");
Console.WriteLine($"Other: {result.OtherPath}");
Console.WriteLine($"Processing time: {result.ProcessingTime}");
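The relationship between ChunkSizeSeconds and OverlapFactor can be worked out with a little arithmetic. The sketch below assumes that consecutive chunks advance by chunk length times (1 - overlap); `CountChunks` is a hypothetical helper, and the library's actual scheduling may differ.

```csharp
using System;

// Hypothetical helper: number of chunks needed to cover a file when each
// chunk starts `chunkSeconds * (1 - overlapFactor)` after the previous one.
static int CountChunks(double durationSeconds, double chunkSeconds, double overlapFactor)
{
    if (durationSeconds <= chunkSeconds) return 1;
    double hop = chunkSeconds * (1.0 - overlapFactor);
    return 1 + (int)Math.Ceiling((durationSeconds - chunkSeconds) / hop);
}

// Values from the example above: 10 second chunks with 25% overlap.
double hopSeconds = 10 * (1.0 - 0.25);       // 7.5 seconds between chunk starts
int chunkCount = CountChunks(180, 10, 0.25); // 3 minute song
Console.WriteLine($"Hop: {hopSeconds} s, chunks: {chunkCount}");
```

Under this assumption a 3 minute song needs 24 chunks, so raising the overlap improves blending at the seams but also increases the amount of inference work.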
Selective Stem Extraction
// Extract only vocals and other instruments
var options = new HTDemucsSeparationOptions
{
Model = InternalModel.HTDemucs,
OutputDirectory = "output",
TargetStems = HTDemucsStem.Vocals | HTDemucsStem.Other
};
// Extract only drums
var drumsOnly = new HTDemucsSeparationOptions
{
Model = InternalModel.HTDemucs,
OutputDirectory = "drums_output",
TargetStems = HTDemucsStem.Drums
};
// Using helper methods
using var separator = HTDemucsExtensions.CreateDefaultSeparator("output_directory");
separator.Initialize();
var result = separator.Separate("music.mp3");
HTDemucs Performance
HTDemucs provides superior separation quality but requires more computational resources:
| Hardware | Processing Speed | Example (3 min song) |
|---|---|---|
| CPU (16 cores) | 10-15x realtime | ~12-18 seconds |
| GPU (NVIDIA RTX 3060) | 50-100x realtime | ~2-4 seconds |
| GPU (NVIDIA RTX 4090) | 100-150x realtime | ~1-2 seconds |
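The "x realtime" figures translate to wall-clock time by dividing the audio length by the speed factor. A small sketch of that arithmetic (the helper name `EstimateTime` is made up for this example):

```csharp
using System;

// Processing time = audio length / realtime factor.
static TimeSpan EstimateTime(TimeSpan audioLength, double realtimeFactor) =>
    TimeSpan.FromSeconds(audioLength.TotalSeconds / realtimeFactor);

var song = TimeSpan.FromMinutes(3);
var slowest = EstimateTime(song, 10); // lower bound of the CPU row
var fastest = EstimateTime(song, 15); // upper bound of the CPU row
Console.WriteLine($"CPU estimate: {fastest.TotalSeconds:F0}-{slowest.TotalSeconds:F0} seconds");
```

For the CPU row this gives 12 to 18 seconds, matching the table; real numbers will vary with hardware, drivers, and file content.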
Multi-Model Averaging
The MultiModelAudioSeparator class runs multiple ONNX models side by side on the same input and averages their outputs. Each model independently processes the original audio, and the resulting vocals and instrumentals are averaged across all models, which typically yields cleaner separation with fewer artifacts than a single model.
Key Features
Parallel Averaging
Every model processes the original audio independently: the models run one after another on each chunk, but none consumes another model's output. Vocals and instrumentals are then averaged as (V₁+V₂+…+Vₙ)/n and (I₁+I₂+…+Iₙ)/n.
Auto OutputType Detection
The system automatically detects whether a model outputs vocals or instrumentals by inspecting ONNX metadata, model name, or file path. Explicit configuration is also supported.
Intermediate Results
Optionally save every model's individual output to disk for debugging, comparison, or further post-processing.
Mixed Model Types
Combine vocal-focused and instrumental-focused models in the same pipeline — each contributes its strengths, and averaging smooths out the differences.
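The averaging step itself is a per-sample mean across the model outputs. The following is a minimal sketch on plain float arrays; `Average` is a hypothetical helper and does not reflect the library's internal buffer layout.

```csharp
using System;

// Per-sample mean of n equally long tracks: (T1 + T2 + ... + Tn) / n.
static float[] Average(params float[][] tracks)
{
    if (tracks.Length == 0) throw new ArgumentException("At least one track is required.");
    int length = tracks[0].Length;
    var result = new float[length];
    foreach (var track in tracks)
    {
        if (track.Length != length) throw new ArgumentException("All tracks must have the same length.");
        for (int i = 0; i < length; i++) result[i] += track[i];
    }
    for (int i = 0; i < length; i++) result[i] /= tracks.Length;
    return result;
}

float[] v1 = { 0.5f, -0.25f, 0.0f };   // vocals estimated by model 1
float[] v2 = { 0.25f, -0.75f, 0.5f };  // vocals estimated by model 2
float[] vocals = Average(v1, v2);      // 0.375, -0.5, 0.25
Console.WriteLine(string.Join(", ", vocals));
```

Because each model makes different errors, the mean tends to keep what the models agree on and attenuate artifacts that only one of them produces.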
MultiModelAudioSeparator Class
using OwnaudioNET.Features.Vocalremover;
Public API Methods
| Method | Return | Description |
|---|---|---|
| Initialize() | void | Loads and initializes all ONNX model sessions, auto-detects dimensions and output types |
| Separate(string) | MultiModelSeparationResult | Processes the audio file through all models and returns the averaged separation result |
| Dispose() | void | Releases all ONNX sessions and managed resources |
Events
| Event | Type | Description |
|---|---|---|
| ProgressChanged | EventHandler<MultiModelSeparationProgress> | Raised on every chunk, with per-model index, name, chunk count, and overall percentage |
| ProcessingCompleted | EventHandler<MultiModelSeparationResult> | Raised when all models have finished and the averaged result is ready |
Data Classes
MultiModelSeparationOptions
public class MultiModelSeparationOptions
{
// List of models to include in the averaging pipeline
public List<MultiModelInfo> Models { get; set; } = new();
// Output directory for final and intermediate files
public string OutputDirectory { get; set; } = "separated_multimodel";
// Enable GPU acceleration (CUDA on Windows/Linux, CoreML on macOS)
public bool EnableGPU { get; set; } = true;
// Overlap margin in samples between chunks (default: 44100 = 1 s)
public int Margin { get; set; } = 44100;
// Chunk size in seconds (0 = process entire file at once)
public int ChunkSizeSeconds { get; set; } = 15;
// Save individual model outputs to disk (useful for debugging)
public bool SaveAllIntermediateResults { get; set; } = false;
}
MultiModelInfo
public class MultiModelInfo
{
// Human-readable name shown in progress events and output filenames
public string Name { get; set; } = "Model";
// ONNX model file path (leave null/empty to use an embedded InternalModel)
public string? ModelPath { get; set; }
// Embedded model selection (Default, Best, Karaoke, …)
public InternalModel Model { get; set; } = InternalModel.None;
// FFT size for STFT (0 = auto-detect from ONNX metadata)
public int NFft { get; set; } = 6144;
// Time dimension as power of 2 (2^DimT frames)
public int DimT { get; set; } = 8;
// Frequency dimension
public int DimF { get; set; } = 2048;
// Disable noise reduction for this specific model
public bool DisableNoiseReduction { get; set; } = false;
// Save this model's individual output (independent of SaveAllIntermediateResults)
public bool SaveIntermediateOutput { get; set; } = false;
// Explicit output type — leave null for auto-detection
public ModelOutputType? OutputType { get; set; } = null;
}
ModelOutputType Enum
public enum ModelOutputType
{
// Model directly outputs the instrumental track.
// Vocals = Original − Instrumental
Instrumental,
// Model directly outputs the vocal track.
// Instrumental = Original − Vocals
Vocals
}
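The comments in the enum describe how the missing track is derived: whatever the model does not output is obtained by subtracting its output from the original mix. A minimal sketch of that subtraction on plain float arrays (`Complement` is a hypothetical helper name):

```csharp
using System;

// Complement = Original - Separated, sample by sample.
static float[] Complement(float[] original, float[] separated)
{
    if (original.Length != separated.Length)
        throw new ArgumentException("Both tracks must have the same length.");
    var rest = new float[original.Length];
    for (int i = 0; i < rest.Length; i++) rest[i] = original[i] - separated[i];
    return rest;
}

float[] mix = { 0.75f, -0.5f, 0.25f };
float[] instrumental = { 0.5f, -0.25f, 0.25f };      // output of an Instrumental-type model
float[] derivedVocals = Complement(mix, instrumental); // 0.25, -0.25, 0
Console.WriteLine(string.Join(", ", derivedVocals));
```

The same function covers the Vocals case: pass the model's vocal output as `separated` and the result is the instrumental track.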
MultiModelSeparationProgress
public class MultiModelSeparationProgress
{
public string CurrentFile { get; set; } // File being processed
public double OverallProgress { get; set; } // 0–100 %
public string Status { get; set; } // Human-readable step description
public int CurrentModelIndex { get; set; } // 1-based model index
public int TotalModels { get; set; } // Total number of models
public string CurrentModelName { get; set; } // Name of the active model
public int ProcessedChunks { get; set; } // Chunks done for current model
public int TotalChunks { get; set; } // Total chunks for current model
}
MultiModelSeparationResult
public class MultiModelSeparationResult
{
public string OutputPath { get; set; } // Same as InstrumentalPath
public string VocalsPath { get; set; } // Averaged vocals WAV file
public string InstrumentalPath { get; set; } // Averaged instrumental WAV file
public Dictionary<string, string> IntermediatePaths { get; set; } // Per-model outputs
public TimeSpan ProcessingTime { get; set; } // Wall-clock processing duration
public int ModelsProcessed { get; set; } // Number of models that contributed
}
Usage Examples
Simple 2-Model Averaging
using OwnaudioNET.Features.Vocalremover;
// Convenience factory: creates a 2-model averaging pipeline
using var separator = MultiModelExtensions.CreateSimplePipeline(
model1: InternalModel.Best,
model2: InternalModel.Karaoke,
outputDirectory: "output"
);
separator.ProgressChanged += (s, p) =>
Console.WriteLine($"[Model {p.CurrentModelIndex}/{p.TotalModels}] {p.OverallProgress:F1}%");
separator.Initialize();
var result = separator.Separate("song.mp3");
Console.WriteLine($"Vocals: {result.VocalsPath}");
Console.WriteLine($"Instrumental: {result.InstrumentalPath}");
Console.WriteLine($"Time: {result.ProcessingTime}");
Triple-Model Averaging
using var separator = MultiModelExtensions.CreateTriplePipeline(
model1: InternalModel.Best,
model2: InternalModel.Default,
model3: InternalModel.Karaoke,
outputDirectory: "output_triple"
);
separator.Initialize();
var result = separator.Separate("song.flac");
// Final averaged outputs
Console.WriteLine($"Vocals: {result.VocalsPath}");
Console.WriteLine($"Instrumental: {result.InstrumentalPath}");
// Intermediate per-model outputs (saved because the helper enables it)
foreach (var kv in result.IntermediatePaths)
Console.WriteLine($" {kv.Key}: {kv.Value}");
Full Control with Custom Options
var options = new MultiModelSeparationOptions
{
    Models = new List<MultiModelInfo>
    {
        new MultiModelInfo
        {
            Name = "Step1_BestQuality",
            Model = InternalModel.Best,
            NFft = 6144, DimT = 8, DimF = 2048,
            DisableNoiseReduction = false,
            SaveIntermediateOutput = true
        },
        new MultiModelInfo
        {
            Name = "Step2_Karaoke",
            Model = InternalModel.Karaoke,
            NFft = 6144, DimT = 8, DimF = 2048,
            DisableNoiseReduction = true
        }
    },
    OutputDirectory = "output_custom",
    EnableGPU = true,
    ChunkSizeSeconds = 15,
    Margin = 44100,
    SaveAllIntermediateResults = true
};
using var separator = new MultiModelAudioSeparator(options);
separator.ProgressChanged += (s, p) =>
{
Console.Write($"\r[{p.CurrentModelName}] chunk {p.ProcessedChunks}/{p.TotalChunks} ({p.OverallProgress:F1}%)");
};
separator.Initialize();
var result = separator.Separate("input.wav");
Mixed OutputType — Vocal + Instrumental Models
var options = new MultiModelSeparationOptions
{
    Models = new List<MultiModelInfo>
    {
        new MultiModelInfo
        {
            Name = "VocalModel",
            ModelPath = @"models/vocal_model.onnx",
            OutputType = ModelOutputType.Vocals // outputs vocal track directly
        },
        new MultiModelInfo
        {
            Name = "InstrumentalModel",
            ModelPath = @"models/instrumental_model.onnx",
            OutputType = ModelOutputType.Instrumental // outputs instrumental track directly
        }
    },
    OutputDirectory = "output_mixed",
    EnableGPU = true
};
// The averaging pipeline:
// - Vocal complement from model 2: original − instrumental
// - Instrumental complement from model 1: original − vocals
// - Final vocals: (V₁ + V₂) / 2
// - Final instrumental: (I₁ + I₂) / 2
using var separator = new MultiModelAudioSeparator(options);
separator.Initialize();
var result = separator.Separate("song.mp3");
Custom ONNX Model Files with Auto-Detection
// OutputType is inferred from model filename:
// "Voc_FT.onnx" → contains "Voc" → Vocals
// "Inst_HQ.onnx" → contains "Inst" → Instrumental
var options = new MultiModelSeparationOptions
{
    Models = new List<MultiModelInfo>
    {
        new MultiModelInfo { Name = "Voc_FT", ModelPath = @"models/Voc_FT.onnx" },
        new MultiModelInfo { Name = "Inst_HQ", ModelPath = @"models/Inst_HQ.onnx" }
    },
    OutputDirectory = "output_custom_files",
    EnableGPU = true
};
using var separator = new MultiModelAudioSeparator(options);
separator.Initialize();
var result = separator.Separate("song.wav");
Auto-Detection of OutputType
When MultiModelInfo.OutputType is left as null, the system uses a three-strategy detection approach:
- ONNX output metadata: Inspects the model's output node names for keywords such as vocal, voice, instrumental, karaoke, etc.
- Model name / file path: Scans MultiModelInfo.Name and ModelPath for the same keywords.
- InternalModel enum name: Falls back to the enum value name.
If none of the strategies yields a conclusive result, Instrumental is used as the safe default.
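The detection order can be pictured as a keyword scan over candidate strings in priority order. The sketch below is an illustration only: `DetectOutputType`, its keyword list ("voc", "voice", "inst"), and the use of strings in place of ModelOutputType are assumptions, not the library's actual rules.

```csharp
using System;

// Hypothetical sketch: candidates are checked in priority order
// (output node name, model name, file path, enum name).
// Returns "Vocals" or "Instrumental" as a stand-in for ModelOutputType.
static string DetectOutputType(params string[] candidates)
{
    foreach (var candidate in candidates)
    {
        if (string.IsNullOrEmpty(candidate)) continue;
        if (candidate.IndexOf("voc", StringComparison.OrdinalIgnoreCase) >= 0 ||
            candidate.IndexOf("voice", StringComparison.OrdinalIgnoreCase) >= 0)
            return "Vocals";
        if (candidate.IndexOf("inst", StringComparison.OrdinalIgnoreCase) >= 0)
            return "Instrumental";
    }
    return "Instrumental"; // documented fallback when nothing is conclusive
}

Console.WriteLine(DetectOutputType("output", "Voc_FT", "models/Voc_FT.onnx"));
Console.WriteLine(DetectOutputType("output", "Inst_HQ", "models/Inst_HQ.onnx"));
Console.WriteLine(DetectOutputType("output", "Model", "models/x.onnx"));
```

When a file name is ambiguous or contains neither keyword, setting OutputType explicitly avoids relying on the fallback.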
Advanced Configuration
Processing Parameters
Fine-tune the separation process with advanced STFT and chunking parameters:
| Parameter | Default | Description |
|---|---|---|
| ChunkSizeSeconds | 15 | Length of each processing chunk in seconds. Smaller values use less memory but increase processing overhead. |
| Margin | 44100 | Overlap margin in samples between chunks. Prevents artifacts at chunk boundaries. Should be at least 0.5 seconds. |
| NFft | 6144 | FFT size for spectral analysis. Higher values provide better frequency resolution but increase computation time. |
| DimF | 2048 | Frequency dimension for model input. Auto-detected from model metadata if not specified. |
| DimT | 8 | Time dimension as power of 2 (2^8 = 256 time frames). Auto-detected from model metadata if not specified. |
| DisableNoiseReduction | false | When false, applies advanced noise reduction using phase inversion technique for cleaner results. |
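A quick calculation shows what the default STFT parameters mean at the 44.1 kHz sample rate used by the pipeline. The interpretation of DimF as "the lowest DimF frequency bins are fed to the model" is an assumption made for this sketch.

```csharp
using System;

const int sampleRate = 44100; // processing sample rate stated in this document
const int nFft = 6144;        // default NFft
const int dimF = 2048;        // default DimF
const int dimT = 8;           // default DimT

double binWidthHz = (double)sampleRate / nFft; // frequency resolution per FFT bin
int totalBins = nFft / 2 + 1;                  // non-redundant bins of a real-input FFT
int timeFrames = 1 << dimT;                    // 2^DimT frames per model input
// Assumption: the model sees the lowest dimF bins only.
double coveredHz = dimF * binWidthHz;

Console.WriteLine($"Bin width: {binWidthHz:F2} Hz, bins: {totalBins}");
Console.WriteLine($"Frames per input: {timeFrames}, covered band: 0-{coveredHz:F0} Hz");
```

With the defaults, each bin is roughly 7.18 Hz wide, the FFT yields 3073 bins, and 2048 bins span 0 to 14700 Hz. Doubling NFft would halve the bin width at the cost of more computation per frame.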
Hardware Acceleration
The service automatically detects and utilizes available hardware acceleration:
// Automatic GPU detection during initialization
service.Initialize();
// Console output will show:
// "CUDA execution provider enabled." (if GPU available)
// OR
// "Using CPU execution provider." (fallback)
// Processing speed comparison:
// CPU: ~2-5x real-time (depends on CPU cores)
// GPU (CUDA): ~10-30x real-time (depends on GPU model)
Memory Management
For large files or systems with limited memory, adjust chunk size and margin:
// Low memory configuration (suitable for 4GB RAM)
var lowMemOptions = new SimpleSeparationOptions
{
Model = InternalModel.Default,
ChunkSizeSeconds = 10, // Smaller chunks
Margin = 22050 // 0.5 second margin
};
// High quality configuration (requires 8GB+ RAM)
var highQualityOptions = new SimpleSeparationOptions
{
Model = InternalModel.Best,
ChunkSizeSeconds = 30, // Larger chunks for better quality
Margin = 88200 // 2 second margin for smoother transitions
};
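To compare the two configurations, it helps to estimate the raw waveform buffer each chunk needs. This sketch assumes 32-bit float samples, stereo audio, and a margin added on both sides of the chunk; `ChunkBufferBytes` is a hypothetical helper. Spectrograms and model activations will add considerably more on top, so treat the result as a relative measure rather than a total.

```csharp
using System;

// Bytes for one chunk of raw audio: (chunk + 2 * margin) samples * channels * 4 bytes.
static long ChunkBufferBytes(int chunkSeconds, int marginSamples, int sampleRate = 44100, int channels = 2)
{
    long samples = (long)chunkSeconds * sampleRate + 2L * marginSamples;
    return samples * channels * sizeof(float);
}

long lowMem = ChunkBufferBytes(10, 22050);      // low memory configuration
long highQuality = ChunkBufferBytes(30, 88200); // high quality configuration
Console.WriteLine($"Low memory:   {lowMem / 1024.0 / 1024.0:F1} MiB per chunk");
Console.WriteLine($"High quality: {highQuality / 1024.0 / 1024.0:F1} MiB per chunk");
Console.WriteLine($"Ratio: {(double)highQuality / lowMem:F2}x");
```

The high quality configuration needs about three times the per-chunk buffer of the low memory one, and the downstream buffers scale in a similar way.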
Progress Tracking
Monitoring Progress
Track separation progress in real-time using event handlers:
var service = new SimpleAudioSeparationService(options);
// Subscribe to progress events
service.ProgressChanged += (sender, progress) =>
{
Console.WriteLine($"[{progress.OverallProgress:F1}%] {progress.Status}");
if (progress.TotalChunks > 0)
{
Console.WriteLine($"Chunk {progress.ProcessedChunks}/{progress.TotalChunks}");
}
};
// Subscribe to completion event
service.ProcessingCompleted += (sender, result) =>
{
Console.WriteLine($"\nSeparation completed in {result.ProcessingTime}");
Console.WriteLine($"Vocals saved to: {result.VocalsPath}");
Console.WriteLine($"Instrumental saved to: {result.InstrumentalPath}");
};
service.Initialize();
service.Separate("song.mp3");
Progress Stages
- Loading audio file (0%): Decoding input audio to 44.1kHz stereo format
- Processing audio separation (10%): Creating and processing audio chunks
- Processing chunks (20-80%): Running neural network inference on each chunk
- Calculating results (90%): Reconstructing full-length separated tracks
- Completed (100%): Saving output files to disk
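During the chunk stage, overall progress moves from 20% to 80%. One plausible way to map chunk counts onto that range is linear interpolation, sketched below; `ChunkProgress` is a hypothetical helper, and the library may compute the value differently.

```csharp
using System;

// Maps processed/total chunks linearly onto the 20-80% band of the chunk stage.
static double ChunkProgress(int processedChunks, int totalChunks)
{
    if (totalChunks <= 0) return 20.0; // nothing to process yet
    double fraction = Math.Clamp((double)processedChunks / totalChunks, 0.0, 1.0);
    return 20.0 + 60.0 * fraction;
}

Console.WriteLine($"{ChunkProgress(0, 12):F1}%");  // start of the chunk stage
Console.WriteLine($"{ChunkProgress(6, 12):F1}%");  // halfway through
Console.WriteLine($"{ChunkProgress(12, 12):F1}%"); // all chunks done
```

A mapping like this is handy in a UI when you want to drive a progress bar from ProcessedChunks and TotalChunks rather than from OverallProgress directly.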
Factory Methods
AudioSeparationExtensions
Convenient factory methods for quick service creation:
using OwnaudioNET.Features.Vocalremover;
// Create service with internal model
var service1 = AudioSeparationExtensions.CreateDefaultService(InternalModel.Best);
// Create service with custom model file
var service2 = AudioSeparationExtensions.CreateDefaultService("path/to/model.onnx");
// Create service with custom output directory
var service3 = AudioSeparationExtensions.CreatetService(
InternalModel.Karaoke,
@"C:\Output\Karaoke"
);
// Create service with custom model and output
var service4 = AudioSeparationExtensions.CreatetService(
"custom_model.onnx",
@"C:\Output\Custom"
);
SimpleSeparator Factory
Simplified factory for quick one-line initialization:
// Quick initialization and separation
var (service, _, _) = SimpleSeparator.Separator(
InternalModel.Default,
"output_folder"
);
var result = service.Separate("input.wav");
service.Dispose();
Helper Methods
// Validate audio file format
bool isValid = AudioSeparationExtensions.IsValidAudioFile("song.mp3");
// Supports: .wav, .mp3, .flac
// Estimate processing time
TimeSpan estimate = AudioSeparationExtensions.EstimateProcessingTime("song.wav");
Console.WriteLine($"Estimated processing time: {estimate}");
Usage Examples
Basic Vocal Removal
using OwnaudioNET.Features.Vocalremover;
// Create service with default settings
var options = new SimpleSeparationOptions
{
Model = InternalModel.Default,
OutputDirectory = "separated"
};
using var service = new SimpleAudioSeparationService(options);
service.Initialize();
// Separate audio file
var result = service.Separate(@"C:\Music\song.mp3");
Console.WriteLine($"Vocals: {result.VocalsPath}");
Console.WriteLine($"Instrumental: {result.InstrumentalPath}");
Console.WriteLine($"Time: {result.ProcessingTime.TotalSeconds:F1}s");
High-Quality Separation with Progress
var options = new SimpleSeparationOptions
{
Model = InternalModel.Best,
OutputDirectory = "output_best",
ChunkSizeSeconds = 20,
DisableNoiseReduction = false // Enable noise reduction
};
using var service = new SimpleAudioSeparationService(options);
// Progress tracking
service.ProgressChanged += (s, p) =>
{
Console.Write($"\r[{p.OverallProgress:F1}%] {p.Status}");
if (p.TotalChunks > 0)
Console.Write($" - Chunk {p.ProcessedChunks}/{p.TotalChunks}");
};
service.ProcessingCompleted += (s, r) =>
{
Console.WriteLine($"\n\nCompleted in {r.ProcessingTime}");
Console.WriteLine($"Output files:\n {r.VocalsPath}\n {r.InstrumentalPath}");
};
service.Initialize();
service.Separate("input_song.flac");
Karaoke Track Creation
// Use Karaoke model to preserve background vocals
var options = new SimpleSeparationOptions
{
Model = InternalModel.Karaoke,
OutputDirectory = "karaoke_tracks"
};
using var service = new SimpleAudioSeparationService(options);
service.Initialize();
// Process multiple songs
string[] songs = Directory.GetFiles(@"C:\Music\Album", "*.mp3");
foreach (var song in songs)
{
Console.WriteLine($"\nProcessing: {Path.GetFileName(song)}");
var result = service.Separate(song);
Console.WriteLine($"Karaoke track created: {result.InstrumentalPath}");
}
Batch Processing with Factory Method
// Create service using factory
var service = AudioSeparationExtensions.CreatetService(
InternalModel.Default,
@"C:\Output\Vocals_Removed"
);
service.Initialize();
// Process all WAV files in directory
var files = Directory.GetFiles(@"C:\Music", "*.wav");
int count = 0;
foreach (var file in files)
{
try
{
Console.WriteLine($"\n[{++count}/{files.Length}] {Path.GetFileName(file)}");
var result = service.Separate(file);
Console.WriteLine($"✓ Completed in {result.ProcessingTime.TotalSeconds:F1}s");
}
catch (Exception ex)
{
Console.WriteLine($"✗ Error: {ex.Message}");
}
}
service.Dispose();
Custom Model Configuration
// Load custom trained model
var options = new SimpleSeparationOptions
{
Model = InternalModel.None,
ModelPath = @"C:\Models\my_custom_separator.onnx",
OutputDirectory = "custom_output",
// Adjust STFT parameters if needed for custom model
NFft = 4096,
DimF = 1024,
DimT = 9 // 2^9 = 512 time frames
};
using var service = new SimpleAudioSeparationService(options);
service.Initialize();
// Model parameters are auto-detected from ONNX metadata
var result = service.Separate("test.wav");
Processing time depends on several factors:
- Model choice: The Best model is 2-3x slower than Default
- Hardware: GPU acceleration provides a 5-15x speedup over CPU
- File length: Processing time scales linearly with audio duration
- Chunk size: Larger chunks are more efficient but use more memory
The processing pipeline uses the following techniques:
- STFT with Hanning window for accurate frequency-domain representation
- Reflection padding to prevent boundary artifacts
- Hermitian symmetry for proper inverse FFT reconstruction
- Overlap-add synthesis with automatic windowing compensation
- Phase inversion noise reduction for cleaner separation results
- Automatic normalization to prevent clipping in output files
- 44.1kHz stereo processing for optimal quality
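As an illustration of normalization that prevents clipping, the sketch below scales a track down only when its peak exceeds full scale. This is a generic peak-normalization example with a hypothetical helper name, not the library's exact algorithm.

```csharp
using System;

// Scales the samples so that the largest absolute value does not exceed `ceiling`.
// Tracks that are already within range are left untouched.
static void NormalizeInPlace(float[] samples, float ceiling = 1.0f)
{
    float peak = 0f;
    foreach (var s in samples) peak = Math.Max(peak, Math.Abs(s));
    if (peak <= ceiling) return;
    float gain = ceiling / peak;
    for (int i = 0; i < samples.Length; i++) samples[i] *= gain;
}

float[] track = { 0.5f, -2.0f, 1.0f }; // peak of 2.0 would clip in a WAV file
NormalizeInPlace(track);               // becomes 0.25, -1.0, 0.5
Console.WriteLine(string.Join(", ", track));
```

Applying the same gain to every sample keeps the relative levels within the track intact.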
Supported input formats:
- WAV (uncompressed PCM)
- MP3 (MPEG Audio Layer 3)
- FLAC (Free Lossless Audio Codec)
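When collecting files for batch processing, a simple extension filter can skip unsupported inputs before they reach the separator. The helper below is a hypothetical stand-alone check written for this example; it is not the implementation of AudioSeparationExtensions.IsValidAudioFile, which may perform additional validation.

```csharp
using System;
using System.IO;

// True when the path ends in one of the three supported extensions (case-insensitive).
static bool HasSupportedExtension(string path)
{
    string ext = Path.GetExtension(path) ?? string.Empty;
    return ext.Equals(".wav", StringComparison.OrdinalIgnoreCase)
        || ext.Equals(".mp3", StringComparison.OrdinalIgnoreCase)
        || ext.Equals(".flac", StringComparison.OrdinalIgnoreCase);
}

Console.WriteLine(HasSupportedExtension("song.MP3"));  // supported, case is ignored
Console.WriteLine(HasSupportedExtension("song.ogg"));  // not in the supported list
Console.WriteLine(HasSupportedExtension("song"));      // no extension at all
```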