Seeing Touch from Motion

Tactile Motion Correlation (TMC). Fine-grained contact states such as making contact and releasing contact look nearly identical in raw tactile images and even in cumulative motion. By correlating transient and cumulative motion (same vs. opposite direction), TMC explicitly resolves this ambiguity, providing highly discriminative cues for contact-rich manipulation.

Abstract

Visuo-tactile policies leveraging optical tactile sensors have shown great promise in contact-rich manipulation. These sensors achieve high spatial resolution and multi-dimensional force sensing by using an internal camera to monitor the deformation of their elastic gel surface, thereby indirectly inferring tactile cues. Despite these advantages, extracting the fine-grained contact states necessary for contact-rich manipulation remains an open challenge. Existing methods typically use either raw images or cumulative motion fields, both of which are prone to perceptual ambiguity: raw images mainly capture appearance changes, while cumulative motion only reflects aggregate gel deformation. As a result, distinct contact states can exhibit highly similar patterns.

To address this, we explore the dynamic priors of tactile motion and discover that the correlation between transient and cumulative motion can explicitly distinguish fine-grained contact states. Based on this insight, we propose a motion-aware tactile representation, Tactile Motion Correlation (TMC). Beyond representation, effective fusion of tactile and visual modalities is also critical. We take advantage of the Mixture-of-Transformers (MoT) architecture and propose ViTacMotor, a unified yet modality-aware visuo-tactile policy that captures cross-modal complementarity while preserving modality-specific properties. Experiments on four challenging contact-rich manipulation tasks show that our method matches or exceeds the strongest baseline in whole-task success rate on every task.

Method

Overview of ViTacMotor. (a) TMC models the correlation between transient and cumulative tactile motion through their dot product to explicitly distinguish fine-grained contact states. (b) A unified yet modality-aware visuo-tactile fusion framework built on the Mixture-of-Transformers architecture captures cross-modal complementarity while preserving the unique properties of each modality.
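The "unified yet modality-aware" idea in (b) can be illustrated with a minimal NumPy sketch of one Mixture-of-Transformers style layer. This is not the ViTacMotor implementation, whose details are not given on this page. It assumes the general MoT pattern: each modality has its own projection and feed-forward weights, while self-attention runs jointly over all tokens. The class name `MoTLayer`, the single attention head, the omission of layer normalization, and all sizes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class MoTLayer:
    """Hypothetical single-head MoT-style layer (no layer norm, for brevity)."""

    def __init__(self, dim, hidden, n_modalities, seed=0):
        rng = np.random.default_rng(seed)

        def w(*shape):
            # One weight matrix per modality, stacked along axis 0.
            return rng.normal(0.0, 0.1, size=(n_modalities, *shape))

        self.wq, self.wk, self.wv, self.wo = (w(dim, dim) for _ in range(4))
        self.w1, self.w2 = w(dim, hidden), w(hidden, dim)
        self.dim = dim

    @staticmethod
    def _route(x, weights, modality):
        # Send every token through the weight matrix of its own modality.
        out = np.empty((x.shape[0], weights.shape[-1]))
        for m in range(weights.shape[0]):
            mask = modality == m
            out[mask] = x[mask] @ weights[m]
        return out

    def __call__(self, x, modality):
        q = self._route(x, self.wq, modality)
        k = self._route(x, self.wk, modality)
        v = self._route(x, self.wv, modality)
        # Shared (global) attention: visual and tactile tokens attend to each other.
        attn = softmax(q @ k.T / np.sqrt(self.dim))
        x = x + self._route(attn @ v, self.wo, modality)
        # Modality-specific feed-forward block with ReLU.
        h = np.maximum(self._route(x, self.w1, modality), 0.0)
        return x + self._route(h, self.w2, modality)

# Usage: three visual tokens (id 0) followed by three tactile tokens (id 1).
tokens = np.random.default_rng(1).normal(size=(6, 8))
modality_ids = np.array([0, 0, 0, 1, 1, 1])
fused = MoTLayer(dim=8, hidden=16, n_modalities=2)(tokens, modality_ids)
```

In this sketch the joint attention step is what lets tactile tokens influence visual ones, and the per-modality weights are what keep each modality's processing separate.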

Why Tactile Motion Correlation Works

Analysis of TMC properties (Daimon sensor). (a) Raw images and cumulative motion are ambiguous across contact states, while the transient–cumulative correlation is highly discriminative. (b) The dot product cleanly separates contact states when plotted as a 3D distribution. (c) The dot-product magnitude is positively correlated with contact force.

Making contact: transient and cumulative motion align → positive dot product.
Releasing contact: the two motions oppose → negative dot product.
Stable contact / no contact: transient motion vanishes → dot product near zero.
Sliding: clear positive–negative spatial separation reveals the sliding direction.
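The four cases above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation. It assumes markers are tracked as (N, 2) pixel positions, that cumulative motion is measured against an undeformed reference frame, and that transient motion is the frame-to-frame displacement. The function names, the thresholds `eps` and `mixed_frac`, and the rule used to flag sliding are hypothetical.

```python
import numpy as np

def tmc_dot(ref, prev, curr):
    """Per-marker dot product of transient and cumulative motion.

    ref, prev, curr: (N, 2) marker positions in the undeformed reference
    frame, the previous frame, and the current frame (assumed input format).
    """
    cumulative = curr - ref   # total displacement since the undeformed state
    transient = curr - prev   # displacement since the previous frame
    return np.sum(transient * cumulative, axis=1)

def classify_contact(dots, eps=1e-3, mixed_frac=0.2):
    """Map per-marker dot products to a coarse contact state.

    eps and mixed_frac are illustrative thresholds, not values from the paper.
    """
    pos = np.mean(dots > eps)    # fraction of markers with aligned motion
    neg = np.mean(dots < -eps)   # fraction of markers with opposing motion
    if pos >= mixed_frac and neg >= mixed_frac:
        return "sliding"              # both signs present across the gel
    if pos >= mixed_frac:
        return "making contact"
    if neg >= mixed_frac:
        return "releasing contact"
    return "stable or no contact"

# Usage: four markers pushed radially outward by a press that is still deepening.
ref = np.zeros((4, 2))
radial = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
state = classify_contact(tmc_dot(ref, 0.5 * radial, radial))
```

Because the sign comes from the direction of the transient motion relative to the accumulated deformation, pressing and releasing separate even when their cumulative fields look the same.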

Hardware Setup & Tasks

Real-world setup. (a) A 6-DoF Agilex arm with a 1-DoF gripper, two cameras (wrist & third-view), and two optical tactile sensors. (b) Two sensors, Daimon and Xense, with distinct markers and gels. (c) Four contact-rich tasks: tube collection, whiteboard erasing, lightbulb insertion, and pencil sharpening.

Video Comparisons

Side-by-side rollouts of the Baseline and ViTacMotor (Ours) on each contact-rich task:

Task 1: Tube Collection
Task 2: Whiteboard Erasing
Task 3: Lightbulb Insertion
Task 4: Pencil Sharpening

Qualitative Results

Policy execution. ViTacMotor with two tactile sensors (Daimon and Xense) on four contact-rich manipulation tasks: tube collection, whiteboard erasing, lightbulb insertion, and pencil sharpening.

Quantitative Comparison

Success rate (%) over 15 trials. Numbers in parentheses after each task name are the number of training demonstrations.

Tasks Requiring In-Hand State Information
Method	Tube Collection (35 demos)				Lightbulb Insertion (60 demos)
	Grasp	1st Hole	2nd Hole	Whole	Align	Insert	Whole
ACT*	66.7	60.0	53.3	53.3	60.0	13.3	13.3
DP*	73.3	53.3	40.0	40.0	53.3	6.7	6.7
ACT + T	86.7	60.0	60.0	60.0	60.0	26.7	26.7
Policy Consensus	80.0	53.3	53.3	53.3	53.3	26.7	26.7
TactileACT	93.3	73.3	60.0	60.0	60.0	40.0	40.0
ViTacMotor	93.3	80.0	73.3	73.3	60.0	40.0	40.0
Tasks Requiring Fine-Grained Force Control
Method	Whiteboard Erasing (50 demos)				Pencil Sharpening (40 demos)
	Grasp	1st Erase	2nd Erase	Whole	Sharpen	Holder	Whole
ACT*	100.0	60.0	46.7	46.7	40.0	33.3	33.3
DP*	100.0	66.7	40.0	40.0	33.3	33.3	33.3
ACT + T	100.0	80.0	60.0	60.0	40.0	40.0	40.0
Policy Consensus	93.3	73.3	66.7	66.7	33.3	33.3	33.3
TactileACT	100.0	86.7	73.3	73.3	46.7	46.7	46.7
ViTacMotor	100.0	86.7	86.7	86.7	60.0	60.0	60.0
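Because every rate is measured over 15 trials, each table entry corresponds to a whole number of successes, and one trial is worth about 6.7 percentage points. The short sketch below recovers those counts from the reported percentages. The helper name `successes` is ours, and the only inputs are the figures in the tables above.

```python
def successes(rate_percent, trials=15):
    """Recover the success count behind a percentage reported to one decimal."""
    count = round(rate_percent / 100 * trials)
    # Sanity check: the count must round back to the reported percentage.
    assert abs(100 * count / trials - rate_percent) < 0.05 + 1e-9
    return count

# Whole-task results for whiteboard erasing, taken from the table above.
vitacmotor = successes(86.7)   # 13 of 15 trials
tactile_act = successes(73.3)  # 11 of 15 trials
gap_in_trials = vitacmotor - tactile_act
```

Read this way, the 13.4-point gap on whiteboard erasing is a difference of two trials, and the tie at 40.0 on lightbulb insertion is six successes each.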

Ablation Study

Base	TMC	MoT	Whiteboard Erasing	Tube Collection
✓			46.7	53.3
✓	✓		73.3	66.7
✓		✓	66.7	60.0
✓	✓	✓	86.7	73.3
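To read the ablation as per-component gains, the sketch below subtracts the base row from each of the other rows. The numbers are copied from the table above; the dictionary layout and the function name `gains` are illustrative choices.

```python
# Keys are (uses TMC, uses MoT); values are (whiteboard erasing, tube collection).
ablation = {
    (False, False): (46.7, 53.3),
    (True, False): (73.3, 66.7),
    (False, True): (66.7, 60.0),
    (True, True): (86.7, 73.3),
}

def gains(task_index):
    """Gain over the base row, in percentage points, for one task column."""
    base = ablation[(False, False)][task_index]
    return {
        "TMC only": round(ablation[(True, False)][task_index] - base, 1),
        "MoT only": round(ablation[(False, True)][task_index] - base, 1),
        "TMC + MoT": round(ablation[(True, True)][task_index] - base, 1),
    }

whiteboard_gains = gains(0)
tube_gains = gains(1)
```

On both tasks the TMC-only gain is larger than the MoT-only gain, and the combined configuration gives the largest gain.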

TMC in Existing Policies

Tactile Representation	ACT	DP
Raw Image	60.0	53.3
Cumulative Motion	53.3	40.0
TMC (Ours)	73.3	66.7

Robustness

Robustness to environment and object variations. Under lighting that varies over time and may be spatially uniform or non-uniform, and under object changes (different erasers and inks, a cylinder in place of the tube), ViTacMotor still completes the tasks, thanks to TMC's motion-based, appearance-agnostic design.

BibTeX

@inproceedings{vitacmotor2026,
  title     = {Seeing Touch from Motion: A Unified Modality-Aware Visuo-Tactile
               Policy with Tactile Motion Correlation},
  author    = {Xu, Shengqi and Zhong, Guojin and Liu, Yang and Wang, Fanjie and
               Luo, Hu and Zhou, Hanyu and Zhang, Weiyao and Ye, Ziyi and
               Wu, Zuxuan and Jiang, Yugang},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2026}
}
