Role Overview
You will implement and productionize modern CV/ML pipelines across two tracks: (1) video understanding with personalization and synthetic data generation, and (2) real-time multi-camera detection/tracking with dataset auto-labeling. The focus is practical: training and evaluation, data-quality loops, and inference performance. You take a module from prototype to stable pipeline with clear metrics.
Key Projects
Track A – Video understanding, keypoints, personalization, synthetic data
Core stack centers on video transformers and/or 3D CNN baselines plus pose/hand keypoint pipelines (TimeSformer, MViT, VideoMAE; MediaPipe Hands; MMPose/RTMPose; ViTPose).
Personalization is implemented via parameter-efficient adaptation (e.g., LoRA/adapters) and signer-specific calibration layers on top of a shared backbone.
Synthetic data pipeline leverages diffusion-based video generation frameworks (Stable Video Diffusion, AnimateDiff, VideoCrafter2, CogVideoX) to expand edge cases and improve generalization.
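To make the personalization approach above concrete, here is a minimal sketch of the LoRA update rule: a frozen base weight W is augmented with a trainable low-rank product B·A scaled by alpha/r, so only a small number of per-user parameters are learned. The matrix sizes, values, and function names here are illustrative, not from any specific codebase.

```python
# Minimal illustration of the LoRA update rule: y = W x + (alpha / r) * B (A x).
# W stays frozen; only the low-rank factors A (r x d_in) and B (d_out x r)
# would be trained per user. Sizes and values are illustrative.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha=1.0):
    """Frozen base projection plus scaled low-rank correction."""
    r = len(A)  # rank = number of rows in A
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * lr for b, lr in zip(base, low_rank)]

# 2x2 frozen identity weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]    # r=1, d_in=2
B = [[0.5], [0.5]]  # d_out=2, r=1
x = [2.0, 4.0]

print(lora_forward(W, A, B, x))  # base [2, 4] + correction [3, 3] -> [5.0, 7.0]
```

Because W is never modified, the shared backbone stays intact and per-user adapters can be swapped in and out, which is what makes this style of calibration resistant to catastrophic forgetting.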
Track B – Real-time detection/tracking, optimization, auto-labeling
Primary detection families target real-time accuracy/latency tradeoffs (YOLOv10, RT-DETR).
Tracking and temporal association use modern MOT baselines (ByteTrack, BoT-SORT).
Auto-labeling/bootstrapping relies on open-vocabulary detection + promptable segmentation pipelines.
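The core of the tracking step above is associating new detections with existing tracks by spatial overlap. Production trackers such as ByteTrack and BoT-SORT add Kalman motion prediction, Hungarian matching, and a second low-confidence pass; the sketch below shows only the simplified greedy IoU-matching idea, with made-up boxes and thresholds.

```python
# Simplified detection-to-track association via greedy IoU matching.
# Real MOT pipelines (ByteTrack, BoT-SORT) add motion models and optimal
# assignment; this shows only the overlap step. Boxes are (x1, y1, x2, y2).

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def associate(tracks, detections, iou_thresh=0.3):
    """Greedily match each track to its best-overlapping unused detection."""
    pairs = sorted(
        ((iou(t, d), ti, di) for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    matches, used_t, used_d = [], set(), set()
    for score, ti, di in pairs:
        if score < iou_thresh:
            break  # remaining pairs overlap too little to match
        if ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    return matches

tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(21, 21, 31, 31), (1, 1, 11, 11)]
print(associate(tracks, dets))  # -> [(1, 0), (0, 1)]
```

Tuning the threshold and the unmatched-track handling around this step is exactly where occlusion and crowded-scene behavior is won or lost.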
Responsibilities
- Build training/inference pipelines for video understanding using transformer backbones (TimeSformer, MViT, VideoMAE) and/or 3D CNN baselines as needed.
- Implement keypoint-driven features and models using Google MediaPipe Hands and OpenMMLab MMPose (RTMPose, ViTPose) to improve robustness under viewpoint/lighting variation.
- Implement personalization layers (LoRA/adapters, calibration heads) and evaluation protocols for user-specific performance without catastrophic forgetting.
- Design and run synthetic data generation loops using diffusion video models with strict QA gates (artifact detection, distribution checks).
- Train, evaluate, and optimize real-time detectors (YOLO, RT-DETR), including ablations and latency profiling.
- Implement multi-object tracking pipelines and tune association logic for occlusions and crowded scenes.
- Build auto-labeling workflows using Grounding DINO + SAM, including human-in-the-loop review and active learning sampling.
- Ship reproducible experiments (configs, seeds, dataset versions), write tests for data/model logic, and document failure modes and monitoring signals.
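The human-in-the-loop review and active-learning sampling mentioned above can be sketched as a triage step: auto-accept high-confidence pseudo-labels and route the most uncertain frames to human review, since those are the most informative to label. Thresholds, budgets, and field names below are assumptions for illustration.

```python
# Illustrative active-learning triage for an auto-labeling loop:
# accept confident pseudo-labels, send the least confident frames to review.
# Thresholds and data shapes are assumptions, not a specific tool's API.

def triage(frames, accept_thresh=0.9, review_budget=2):
    """Split pseudo-labeled frames into auto-accepted and human-review sets.

    `frames` is a list of (frame_id, min_detection_confidence) pairs;
    lowest-confidence frames are sampled first for human review.
    """
    accepted = [fid for fid, conf in frames if conf >= accept_thresh]
    uncertain = sorted(
        (f for f in frames if f[1] < accept_thresh), key=lambda f: f[1]
    )
    review = [fid for fid, _ in uncertain[:review_budget]]
    return accepted, review

frames = [("f1", 0.97), ("f2", 0.42), ("f3", 0.88), ("f4", 0.15)]
print(triage(frames))  # -> (['f1'], ['f4', 'f2'])
```

Frames that fit neither bucket (here, "f3") simply wait for the next round, which keeps the human review budget focused on the hardest examples each iteration.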
Required Qualifications
- 3-5 years of practical ML/CV engineering experience with at least one production or production-like pipeline shipped.
- Strong Python + PyTorch (preferred) or TensorFlow; ability to write and debug custom training/eval loops.
- Hands-on experience with video modeling families (at least one of TimeSformer/MViT/VideoMAE-style pipelines).
- Experience with real-time detection architectures (YOLO and/or RT-DETR class systems).
- Familiarity with pose/hand keypoints tooling (MediaPipe Hands and/or MMPose/RTMPose/ViTPose).
- Solid understanding of training mechanics: augmentation, optimization, mixed precision (FP16), metric-driven iteration.
- Inference and deployment fundamentals: exporting (ONNX), hardware acceleration (TensorRT/CUDA/OpenVINO), and profiling.
- Comfort with Linux + Docker-based workflows.
- Solid data engineering hygiene: dataset QC, train/val split discipline, reproducibility, and experiment tracking.
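As a small illustration of the reproducibility and experiment-tracking hygiene listed above, the sketch below seeds the RNG from config and derives a stable fingerprint of the experiment config so runs can be compared exactly. It uses only the standard library; a real PyTorch pipeline would also seed torch and numpy, and the helper names are assumptions.

```python
# Illustrative reproducibility helpers: deterministic seeding plus a short,
# key-order-independent fingerprint of the experiment config.
# Stdlib only; a real pipeline would also seed torch/numpy.

import hashlib
import json
import random

def config_fingerprint(config: dict) -> str:
    """Stable short hash of a config dict (key order does not matter)."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

def seeded_draw(config: dict) -> float:
    """Seed stdlib RNG from config and draw once to demonstrate determinism."""
    random.seed(config["seed"])
    return random.random()

cfg = {"seed": 42, "lr": 3e-4, "dataset_version": "v1.2"}
print(config_fingerprint(cfg))            # same hash for any key ordering
print(seeded_draw(cfg) == seeded_draw(cfg))  # True: same seed, same draw
```

Logging the fingerprint alongside metrics makes it trivial to detect when two "identical" runs actually differed in config or dataset version.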
Nice to Have
- Experience with open-vocabulary detection + segmentation for labeling/bootstrapping (Grounding DINO, SAM).
- Experience with video diffusion models for synthetic data generation (Stable Video Diffusion, AnimateDiff, VideoCrafter2, CogVideoX).
- OCR/label-reading components when needed (e.g., PaddleOCR).
Working Conditions
- We provide an inspiring working environment where our employees feel rewarded and engaged.
- We expect a lot from our employees and are ready to give a lot in return. You’ll be faced with challenging, varied, non-standard projects and tasks. But at the end of the day, you’ll be proud of what you’ve done.
- We strongly encourage the growth and development of our team. It is in your best interest to learn new languages and technologies and to implement them in existing and new projects. It won't go unnoticed, and we will definitely reward you.
- We pay a lot of attention to the health of our employees, so we offer comprehensive health insurance that also covers dental services. And do drink tea with ginger and lemon; we keep it year-round in the office kitchen.
- Softarex Technologies treats each employee individually. Our HR team helps newcomers at every stage of adaptation in the company. No less attention is paid to employees who feel at home here (they literally have their own slippers).
Of course, that’s not all. Check out the full benefits package here and let’s get started!