Face Recognition | Soulmate

Principles

insightface==0.7.3

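At its core, face recognition converts a face image into a vector that can be compared in a high-dimensional space, then uses a geometric measure (usually cosine similarity) to decide whether two faces belong to the same identity.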

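In an engineering implementation, face recognition has three layers: detection and alignment, feature representation learning (embedding), and similarity measurement and decision.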

Detection and Alignment

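The detector's job is to locate the bounding box (bbox) of every face in an arbitrary image.
Detection models are usually either general-purpose single-stage detectors (such as the YOLO family) or face-specific detectors (such as SCRFD and RetinaFace). These models predict face boxes and classification scores from dense or sparse anchors, or use an anchor-free, center-point scheme. det_size sets the detector's input resolution and therefore trades recall against speed.
Landmarks and alignment: mature face recognition systems use 5 landmarks (two eye centers, the nose tip, and two mouth corners) or more to align the face with a similarity (restricted affine) transform, normalizing it to a standard pose and scale (for example 112×112). The InsightFace pipeline performs this alignment internally, which reduces sensitivity to pose and scale variation and improves embedding stability and cross-domain generalization.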
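The alignment step described above can be sketched with plain NumPy. This is a minimal illustration, not InsightFace's own implementation: the function name `estimate_similarity` is hypothetical, and the template coordinates are assumed to follow the 5-point layout commonly used with ArcFace-style 112×112 models, so they should be checked against the model actually in use.

```python
import numpy as np

# Assumed reference layout for a 112x112 crop (verify against your model).
ARCFACE_TEMPLATE = np.array([
    [38.2946, 51.6963],   # left eye center
    [73.5318, 51.5014],   # right eye center
    [56.0252, 71.7366],   # nose tip
    [41.5493, 92.3655],   # left mouth corner
    [70.7299, 92.2041],   # right mouth corner
], dtype=np.float64)


def estimate_similarity(src, dst):
    """Least-squares similarity transform (Umeyama method) mapping src to dst.

    Returns a 2x3 matrix M such that dst ~= src @ M[:, :2].T + M[:, 2].
    """
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)          # 2x2 cross-covariance
    u, sing, vt = np.linalg.svd(cov)
    sign = np.ones(2)
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        sign[-1] = -1.0                       # rule out reflections
    rot = u @ np.diag(sign) @ vt
    var_src = (src_c ** 2).sum() / len(src)
    scale = (sing * sign).sum() / var_src
    trans = mu_dst - scale * rot @ mu_src
    return np.hstack([scale * rot, trans[:, None]])
```

The resulting 2×3 matrix has the shape that `cv2.warpAffine(img, M, (112, 112))` expects, so the detected landmarks can be passed as `src` and the template as `dst` to produce the aligned crop.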

Image Preprocessing

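OpenCV loads images in BGR channel order by default; InsightFace handles the required color-space conversion and normalization internally. If you feed aligned face crops to the recognition network yourself, pixel values are typically scaled to [−1, 1] or normalized with ImageNet statistics, so that the input matches the distribution the network was trained on.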
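For the case where an aligned crop is fed to a recognition network by hand, the [−1, 1] scaling might look like the sketch below. The helper name `to_network_input` is hypothetical, and the RGB channel order, NCHW layout, and 127.5 mean/scale are assumptions that depend on how the specific model was exported.

```python
import numpy as np


def to_network_input(face_bgr):
    """Turn an aligned HxWx3 uint8 BGR crop into a 1x3xHxW float32 blob in [-1, 1]."""
    rgb = face_bgr[:, :, ::-1]                          # BGR -> RGB
    scaled = (rgb.astype(np.float32) - 127.5) / 127.5   # [0, 255] -> [-1, 1]
    chw = np.transpose(scaled, (2, 0, 1))               # HWC -> CHW
    return np.ascontiguousarray(chw[None, ...])         # add the batch dimension
```

When the full InsightFace pipeline is used, this step is unnecessary because the library performs it internally.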

Feature Representation (Embedding)

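The core of face recognition is learning a vector space in which identities are separable. The recognition networks used by InsightFace are commonly from the ArcFace family (for example iresnet50/100), trained so that embeddings of the same identity lie close together in angular space while those of different identities lie far apart.
The key idea in ArcFace is the additive angular margin (AAM-Softmax). Let \(x\) be the L2-normalized feature vector, \(W\) the L2-normalized class-center vector, and \(\theta\) the angle between them. ArcFace changes the logit of the ground-truth class from \(\cos\theta\) to \(\cos(\theta + m)\), where \(m\) is the angular margin, and scales all logits by \(s\):
- Ground-truth class logit: \(s \cdot \cos(\theta + m)\)
- Other class logits: \(s \cdot \cos\theta_j\), where \(\theta_j\) is the angle to the center of class \(j\)
Applying the margin directly in angular space makes the boundaries between classes clearer and training more stable. After training, the network's output embedding (typically 512-dimensional) lies on the unit hypersphere once L2-normalized, so geometric measures, cosine similarity in particular, agree closely with identity similarity.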
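To make the margin concrete, here is a minimal NumPy sketch of how the logits could be computed for a batch. It is an illustration, not a training implementation: the function name is hypothetical, the defaults `s=64.0` and `m=0.5` are assumed values often quoted for ArcFace, and the special handling real implementations apply when \(\theta + m\) exceeds \(\pi\) is omitted.

```python
import numpy as np


def arcface_logits(features, weights, labels, s=64.0, m=0.5):
    """features: (N, D), weights: (C, D) class centers, labels: (N,) class ids."""
    features = np.asarray(features, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    # L2-normalize both sides so the dot product equals cos(theta).
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(x @ w.T, -1.0, 1.0)           # (N, C) cosines
    theta = np.arccos(cos)
    rows = np.arange(len(labels))
    logits = cos.copy()
    # Add the angular margin to the ground-truth class only.
    logits[rows, labels] = np.cos(theta[rows, labels] + m)
    return s * logits
```

These logits would then go through an ordinary softmax cross-entropy loss; the margin only matters during training and plays no role at inference time.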
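InsightFace usually exposes normed_embedding (already L2-normalized). If you take the raw embedding instead, the similarity function in the calling code normalizes by the vector norms again, which keeps the measure consistent.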

Similarity Measurement and Decision

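Cosine similarity is the mainstream choice of similarity measure. It is computed as follows:
- cast both vectors to float32;
- compute the L2 norm of each vector;
- divide the dot product by the product of the two norms to obtain \(\cos\theta\).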
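Properties of cosine similarity:
- When the embeddings are L2-normalized, cosine similarity and Euclidean distance are monotonically related: \(\lVert a - b \rVert^2 = 2 - 2\cos\theta\), so a higher similarity always means a smaller distance;
- Its range is \([-1, 1]\), but scores for face embeddings mostly fall within \([0, 1]\) in practice (unrelated pairs can dip slightly below 0); the closer to 1, the more similar.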
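The relation between cosine similarity and Euclidean distance for unit vectors can be checked numerically. The sketch below uses random 512-dimensional vectors as stand-ins for real embeddings; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=512)
b = rng.normal(size=512)
a /= np.linalg.norm(a)   # project onto the unit sphere
b /= np.linalg.norm(b)

cos_sim = float(a @ b)
sq_dist = float(np.sum((a - b) ** 2))

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(theta)
gap = abs(sq_dist - (2.0 - 2.0 * cos_sim))
```

Here `gap` is zero up to floating-point rounding, which is why ranking by cosine similarity and ranking by Euclidean distance give the same order on normalized embeddings.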
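Decision threshold: set the threshold from statistics on validation data and the business's risk tolerance:
- Raising the threshold lowers the false acceptance rate (FAR) but may raise the false rejection rate (FRR).
- Lowering the threshold improves recall and retrieval coverage but increases the risk of false matches.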
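The trade-off in the two bullets above can be measured directly once labeled score pairs are available. The following is a minimal sketch: `far_frr` is a hypothetical helper, and the score lists are made-up placeholders, not measurements from any model.

```python
import numpy as np


def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) at a threshold.

    genuine:  similarities of same-identity pairs
    impostor: similarities of different-identity pairs
    """
    genuine = np.asarray(genuine, dtype=np.float64)
    impostor = np.asarray(impostor, dtype=np.float64)
    far = float(np.mean(impostor >= threshold))  # impostors wrongly accepted
    frr = float(np.mean(genuine < threshold))    # genuine pairs wrongly rejected
    return far, frr


# Placeholder scores for illustration only.
genuine_scores = [0.82, 0.65, 0.71, 0.38, 0.90]
impostor_scores = [0.05, 0.12, 0.45, 0.30, -0.02]

for t in (0.3, 0.4, 0.5):
    far, frr = far_frr(genuine_scores, impostor_scores, t)
    print(f"threshold={t:.1f}  FAR={far:.2f}  FRR={frr:.2f}")
```

Sweeping the threshold over a validation set in this way yields the points of a ROC or DET curve, from which an operating point that matches the application's risk tolerance can be chosen.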
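In production, ROC/DET curves or the EER (Equal Error Rate) are commonly used as the basis for calibration. A multi-frame consistency strategy can also be added: declare a match only when the similarity exceeds the threshold for N consecutive frames, which significantly reduces transient false alarms.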
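The multi-frame consistency strategy can be sketched as a small stateful gate. The class name `ConsecutiveMatchGate` and its parameters are hypothetical; a real system would likely keep one gate per tracked face rather than one per video.

```python
class ConsecutiveMatchGate:
    """Report a match only after n_frames consecutive scores reach the threshold."""

    def __init__(self, threshold: float, n_frames: int):
        if n_frames < 1:
            raise ValueError("n_frames must be at least 1")
        self.threshold = threshold
        self.n_frames = n_frames
        self._streak = 0

    def update(self, similarity: float) -> bool:
        # A single frame below the threshold resets the streak.
        if similarity >= self.threshold:
            self._streak += 1
        else:
            self._streak = 0
        return self._streak >= self.n_frames


gate = ConsecutiveMatchGate(threshold=0.4, n_frames=3)
decisions = [gate.update(s) for s in (0.5, 0.6, 0.2, 0.5, 0.45, 0.41, 0.9)]
```

In this run the first two high scores are not enough, the 0.2 resets the count, and the gate opens only on the third consecutive score at or above 0.4. A larger N reduces false alarms further at the cost of a slower response.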

InsightFace Calling Code

import insightface
from .base import BaseModel
from typing import Union
import numpy as np
import cv2


class InsightFaceModel(BaseModel):
    def __init__(self, model_path, classes):
        super().__init__(model_path, classes)
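        # FaceAnalysis loads its model pack from <root>/models/.
        model = insightface.app.FaceAnalysis(root=model_path)
        # ctx_id >= 0 selects a GPU device (a negative value forces CPU);
        # det_size is the detector's input resolution.
        model.prepare(ctx_id=0, det_size=(640, 640))
        self.model = model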

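    def _preprocess(self, data: Union[str, bytes, np.ndarray]):
        # Accept an already decoded BGR image, encoded bytes, or a file path.
        if isinstance(data, np.ndarray):
            return data
        if isinstance(data, bytes):
            np_arr = np.frombuffer(data, np.uint8)
            img = cv2.imdecode(np_arr, cv2.IMREAD_COLOR)
        elif isinstance(data, str):
            img = cv2.imread(data, cv2.IMREAD_COLOR)
        else:
            raise TypeError(f"Unsupported input type: {type(data).__name__}")
        # imdecode/imread return None instead of raising when decoding fails.
        if img is None:
            raise ValueError("Failed to decode image data")
        return img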

    def inference(self, data):
        img_data = self._preprocess(data)
        detections = self.model.get(img_data)
        res = self._postprocess(detections)
        return res

    def _postprocess(self, detections):
        res = []
        for detection in detections:
            item = {
                "class": self.classes[0]['name'],
                "age": detection['age'],
                "bbox": detection['bbox'].tolist(),
                "sex": detection.sex,
                "confidence": float(detection['det_score']),
                "embedding": detection['embedding'].tolist(),
            }
            res.append(item)
        return res

Test Function

import json
import os
import cv2
import numpy as np
import app.cv as cv_mod
from app.services.frame import _read_video_interval_frames
import sys

def _draw(frame, dets):
    for d in dets:
        if d.get("class") == "head":
            continue
        x1, y1, x2, y2 = [int(v) for v in d["bbox"]]
        color = (255, 255, 0)
        text_color = (0, 0, 255)
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
        label = f"match {d.get('match_score', d.get('confidence', 0)):.2f}"
        cv2.putText(frame, label, (x1, max(0, y1 - 5)), cv2.FONT_HERSHEY_SIMPLEX, 0.6, text_color, 2)
    return frame

def _cosine(a, b):
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    if a.size == 0 or b.size == 0:
        return 0.0
    na = np.linalg.norm(a)
    nb = np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(np.dot(a, b) / (na * nb))

def main():
    video_path = r"E:\镜像测试\公交车1.mp4"
    model_root = r"C:\Users\xxx\AppData\Roaming\FirmamentAIEngine\models\face-detection"
    model_name = "face-detection"
    track=False
    start_ms = 0
    end_ms = sys.maxsize
    display_delay_ms = 1
    if not os.path.exists(video_path):
        print(f"Video file not found: {video_path}")
        return

    cv_mod.init_model(model_root, model_name)
    from app.cv import CV_MODEL
    from app.cv.model_pool import ModelPool
    has_model_pool = isinstance(CV_MODEL, ModelPool)

    if has_model_pool:
        try:
            model_instance = cv_mod.CV_MODEL.get_model()
        except TimeoutError:
            print("Insufficient system resources, please try again later.")
            return
    else:
        # Not pooled: CV_MODEL is the model instance itself.
        model_instance = CV_MODEL


    target_path=r"E:\镜像测试\face_target2.png"
    target=model_instance.inference(target_path)
    target_vec = None
    best_conf = -1.0
    for d in target:
        emb = d.get('embedding')
        conf = float(d.get('confidence', 0.0))
        if isinstance(emb, list) and len(emb) > 0 and conf > best_conf:
            target_vec = np.asarray(emb, dtype=np.float32)
            best_conf = conf
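    if target_vec is None:
        print("No valid face embedding found in the target image")
        # Hand the pooled instance back before bailing out.
        if has_model_pool:
            cv_mod.CV_MODEL.return_model(model_instance)
        return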

    thresh = 0.4
    for frame, idx, pts in _read_video_interval_frames(video_path, 0,start_ms, end_ms):
        det = model_instance.inference(frame)
        matches = []
        for d in det:
            emb = d.get('embedding')
            if isinstance(emb, list) and len(emb) > 0:
                sim = _cosine(target_vec, emb)
                if sim >= thresh:
                    matches.append({
                        "class": d.get("class", "face"),
                        "bbox": d["bbox"],
                        "confidence": d.get("confidence", 0.0),
                        "match_score": sim
                    })
        out = {"frame_index": idx, "timestamp_ms": pts, "detections": matches}
        print(json.dumps(out, ensure_ascii=False))
        # vis = _draw(frame.copy(), det)
        vis = _draw(frame.copy(), matches)
        cv2.imshow("YOLO-Track", vis)
        # imshow only renders once waitKey pumps the GUI event loop.
        cv2.waitKey(display_delay_ms)
    cv2.destroyAllWindows()
    # Hand the instance back to the pool so other callers can use it.
    if has_model_pool:
        cv_mod.CV_MODEL.return_model(model_instance)

if __name__ == "__main__":
    main()