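Research

We build practical, theory-grounded, interaction-driven explanation methods for understanding what deep neural networks learn and how they learn it. Our research treats feature interactions as a unifying language for interpretability, connecting attribution analysis, learning dynamics, generalization, and robustness. By combining rigorous interaction theory with scalable online monitoring and real-world validation, we help teams explain, audit, and even debug their models.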

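Attribution

We develop attribution methods with theoretical guarantees that explain model predictions by assigning contribution scores to input features and their combinations, unify a range of post-hoc explainers, and improve the efficiency and faithfulness of Shapley-style methods.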
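As a rough illustration of what a Shapley-style contribution score means, the sketch below computes exact Shapley values by enumerating every feature subset. The names `shapley_values` and `toy_model` are hypothetical, the toy scoring function stands in for a real network evaluated on masked inputs, and this brute-force enumeration is not one of the efficient methods from the publications listed on this page.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n):
    """Exact Shapley values for a set function over features 0..n-1.

    value_fn takes a frozenset of "present" feature indices and returns a score.
    Cost grows as 2^n, so this is only practical for very small n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_model(present):
    """Hypothetical model output when only the features in `present` are unmasked."""
    score = 0.0
    if 0 in present:
        score += 1.0  # feature 0 contributes on its own
    if 1 in present and 2 in present:
        score += 2.0  # features 1 and 2 only matter together
    return score


phi = shapley_values(toy_model, 3)
print(phi)
```

In this toy case the joint effect of features 1 and 2 is split evenly between them, and the scores add up to the difference between the full-input output and the empty-input output.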

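Interaction Theory

We formalize and extract interaction primitives: sparse, reusable, compositional building blocks of knowledge that show, in a more compact and interpretable form, how a network represents concepts and knowledge.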
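One common game-theoretic way to make "interaction" concrete is the Harsanyi dividend, which assigns each feature subset the part of the output that cannot be explained by its smaller subsets. The sketch below assumes that definition; whether it matches the exact formulation in each listed paper is an assumption, and `harsanyi_interactions` and `toy_model` are hypothetical names used only for illustration.

```python
from itertools import chain, combinations


def powerset(items):
    """All subsets of `items`, as tuples, from the empty set upward."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def harsanyi_interactions(value_fn, n):
    """Harsanyi dividend I(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T)."""
    result = {}
    for S in powerset(range(n)):
        total = 0.0
        for T in powerset(S):
            total += (-1) ** (len(S) - len(T)) * value_fn(frozenset(T))
        result[frozenset(S)] = total
    return result


def toy_model(present):
    """Hypothetical model output when only the features in `present` are unmasked."""
    score = 0.0
    if 0 in present:
        score += 1.0  # feature 0 contributes on its own
    if 1 in present and 2 in present:
        score += 2.0  # features 1 and 2 only matter together
    return score


interactions = harsanyi_interactions(toy_model, 3)
# Keep only the subsets with a non-negligible effect.
sparse = {tuple(sorted(S)): v for S, v in interactions.items() if abs(v) > 1e-9}
print(sparse)
```

Of the eight possible subsets in this toy case, only two carry a nonzero effect, which is the kind of sparsity the paragraph above refers to. The effects of all subsets also sum back to the full-input output.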

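Robustness

We analyze why models fail under adversarial perturbations, distribution shift, or fine-tuning, and we propose evaluation and improvement tools that strengthen the stability and transferability of learned concepts and model "fingerprints".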

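Applications

We apply these methods to computer vision and 3D understanding, privacy protection and attribute obfuscation, and auditing the behavior of large models (for example, legal reasoning), validating their effectiveness and value in real deployments.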

Publications

All

Attribution

Interaction Theory

Learning Dynamics

Robustness

Applications

2026

Attribution Explanations for Deep Neural Networks: A Theoretical Perspective

2026

IEEE TPAMI

Attribution

Can LLMs Reason Soundly in Law? Auditing Inference Patterns for Legal Judgment

2026

IEEE TPAMI

Application

2025

Interpretable Rotation-Equivariant Multiary-Valued Network for Attribute Obfuscation

2025

IEEE TPAMI

Application

Towards the Resistance of Neural Network Fingerprinting to Fine-tuning

2025

NeurIPS

Robustness

2024

Unifying Fourteen Post-Hoc Attribution Methods With Taylor Interactions

2024

IEEE TPAMI

Attribution

Where We Have Arrived in Proving the Emergence of Sparse Interaction Primitives in DNNs

2024

ICLR

Interaction Theory

Defining and extracting generalizable interaction primitives from DNNs

2024

ICLR

Interaction Theory

Identifying Semantic Induction Heads to Understand In-Context Learning

2024

ACL Findings

Application

Interpretability of Neural Networks Based on Game-theoretic Interactions

2024

Machine Intelligence Research

Interaction Theory

Clarifying the behavior and the difficulty of adversarial training

2024

AAAI

Robustness

Explaining generalization power of a DNN using interactive concepts

2024

AAAI

Interaction Theory

Batch normalization is blind to the first and second derivatives of the loss

2024

AAAI

Interaction Theory

2023

Towards the difficulty for a deep neural network to learn concepts of different complexities

2023

NeurIPS

Learning Dynamics

Does a neural network really encode symbolic concepts?

2023

ICML

Interaction Theory

HarsanyiNet: Computing accurate Shapley values in a single forward propagation

2023

ICML

Attribution

Can We Faithfully Represent Absence States to Compute Shapley Values on a DNN?

2023

ICLR

Attribution

Bayesian Neural Networks Avoid Encoding Complex and Perturbation-Sensitive Concepts

2023

ICML

Robustness

Defining and Quantifying the Emergence of Sparse Concepts in DNNs

2023

CVPR

Interaction Theory
