Research
We build practical, theory-grounded, interaction-based explanation methods to understand what deep neural networks learn and how they learn it. Our work treats feature interactions as a unifying language for interpretability, connecting attribution, learning dynamics, generalization, and robustness. By combining rigorous interaction theory with scalable monitoring and real-world validation, we help teams explain, audit, and debug models.

Attribution
We develop principled attribution methods that explain predictions by allocating credit to input features and their combinations, unifying the majority of post-hoc explainers and improving the efficiency and faithfulness of Shapley-style attributions.
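As an illustration of Shapley-style credit allocation, here is a minimal sketch that enumerates all coalitions exactly. The value function `v` and the three feature indices are invented for this example; they stand in for a model's output on masked inputs and are not part of our methods.

```python
from itertools import combinations
from math import factorial

def v(S):
    """Hypothetical value function: model output with only features in S present."""
    S = set(S)
    out = 0.0
    if 0 in S:
        out += 1.0          # feature 0 contributes on its own
    if 1 in S and 2 in S:
        out += 2.0          # features 1 and 2 only matter together
    return out

def shapley_values(players, v):
    """Exact Shapley values by enumerating all coalitions (exponential in n)."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[p] += weight * (v(set(S) | {p}) - v(S))
    return phi

phi = shapley_values([0, 1, 2], v)
# Efficiency holds: the values sum to v({0,1,2}) - v(set()) = 3.0,
# and the pairwise interaction between features 1 and 2 is split evenly.
```

The exact computation above costs 2^n coalition evaluations, which is precisely the bottleneck that efficiency-oriented approximations aim to remove.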

Interaction Theory
We formalize and extract interaction primitives, which are sparse, reusable building blocks that capture compositional concepts, providing a compact, interpretable view of how networks represent knowledge.
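The "sparse building blocks" view can be made concrete with Harsanyi dividends, a standard interaction measure. The toy value function `v` below is a hypothetical stand-in for a network's output on masked inputs, not an extract of our pipeline.

```python
from itertools import combinations

def v(S):
    """Hypothetical value function standing in for a network's masked output."""
    S = set(S)
    out = 0.0
    if 0 in S:
        out += 1.0          # a singleton concept
    if 1 in S and 2 in S:
        out += 2.0          # a pairwise compositional concept
    return out

def harsanyi_dividends(players, v):
    """I(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T): the effect unique to S."""
    dividends = {}
    for k in range(1, len(players) + 1):
        for S in combinations(players, k):
            effect = sum(
                (-1) ** (len(S) - len(T)) * v(T)
                for j in range(len(S) + 1)
                for T in combinations(S, j)
            )
            dividends[S] = effect
    return dividends

dividends = harsanyi_dividends([0, 1, 2], v)
nonzero = {S for S, effect in dividends.items() if abs(effect) > 1e-9}
# Only two primitives survive: the singleton (0,) and the pair (1, 2) --
# a sparse decomposition of everything v computes.
```

Sparsity is the point: although there are 2^n - 1 candidate interactions, only a handful carry nonzero effect, and those few act as the reusable primitives.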

Robustness
We analyze why models fail under adversarial pressure, distribution shift, or fine-tuning, and propose tools to assess and improve stability of learned concepts and fingerprints.

Applications
We translate these ideas into practice across vision and 3D understanding, privacy/attribute obfuscation, and LLM behavior auditing (e.g., legal reasoning), demonstrating impact in real deployment scenarios.
