AI Robots Learn Touch and Vision to Handle Objects Like Humans

Summary: A new breakthrough shows how robots can integrate vision and touch to manipulate objects with human-like precision. Researchers have developed TactileAloha, a system that combines visual and tactile information, allowing robotic arms to adapt more flexibly to real-world tasks.

Unlike purely vision-based systems, this approach allowed robots to manipulate complex objects like Velcro and cable ties, demonstrating human-like sensory judgment. The results represent a significant step toward physical AI that could help robots assist with everyday tasks like cooking, cleaning, and caring for others.

Important facts:

  • Touch integration: Combines visual and tactile sensing for better object handling.
  • Adaptive performance: Outperforms vision-only systems on complex tasks like Velcro manipulation.
  • Everyday potential: Brings robotics closer to practical use in homes and workplaces.

Source: Tohoku University

In everyday life, picking up a cup of coffee from a table is a piece of cake. We seamlessly integrate multiple sensory inputs, such as sight (seeing how far away the cup is from us) and touch (feeling it make contact with our hand), in real time, without even thinking about it. However, it is not so easy to emulate this with artificial intelligence (AI).

An international group of researchers has developed a new approach that combines visual and tactile information to manipulate robotic arms while simultaneously responding adaptively to the environment. Compared to traditional vision-based methods, the approach achieved a higher success rate on tasks. These promising results represent a significant advance in the field of multimodal physical AI.

Details of their progress were published in the journal IEEE Robotics and Automation Letters on July 2, 2025.

Using machine learning, AI can learn human movement patterns, helping robots autonomously perform everyday tasks like cooking and cleaning.

For example, ALOHA (A Low-cost Open-source Hardware System for Bimanual Teleoperation), developed at Stanford University, enables versatile, low-cost teleoperation and training of two-armed robots. Both the hardware and software are open source, allowing the research team to build on this foundation.

However, these systems rely primarily on visual information, so they lack human-like tactile abilities, such as distinguishing the texture of a material or the front and back of an object. For example, it is often easier to tell the front of Velcro from the back by touching it than by judging its appearance. Relying solely on vision, without any other input, is a significant limitation.

“To overcome these limitations, we developed a system that also enables operational decisions based on the structure of target objects, which are difficult to judge based on visual information alone,” explains Professor Mitsuhiro Hayashibi of Tohoku University’s Graduate School of Engineering.

This achievement marks a major milestone in the development of multimodal physical AI. It represents progress toward systems that can integrate and interpret information from various sensory inputs.

By combining vision, hearing, and touch, these advanced AI models begin to process the world in a way that closely mirrors human perception. This opens the door to more intuitive and responsive interactions between humans and machines.

By leveraging vision-tactile Transformer technology, the physical AI robot achieved greater flexibility and adaptability in its control. Credit: StackZone Neuro

The newly developed system, named “TactileAloha,” represents a breakthrough in physical AI by enabling a robot to perform a variety of complex, coordinated actions using both hands. Researchers observed that the robot was capable of executing tasks that demand a high level of dexterity and control, particularly those involving subtle back-and-forth motions and a firm, adaptive grip. Examples of such tasks include manipulating hook-and-loop fasteners (commonly known as Velcro) and handling cable ties, which require careful pressure application and sustained control during the securing process. These tasks are especially challenging for robots because they call for fine motor skills and the ability to respond in real time to resistance, texture changes, and unpredictable movements. The fact that TactileAloha could accomplish them with a high degree of precision highlights its advanced motor coordination and potential for real-world utility in environments that demand skilled manipulation.

This advancement was made possible through the integration of Transformer-based vision and tactile sensing technologies, which together enabled the robot to interpret and respond to its surroundings in a much more nuanced way than traditional robotic systems. By combining visual data with tactile feedback, the system could adjust its grip strength, hand positioning, and motion paths dynamically, based on the physical properties of the objects it interacted with. This multimodal approach significantly enhanced the robot’s ability to adapt to different materials, shapes, and resistances in real time, resulting in far more flexible and reliable control. The success of this system underscores the growing importance of sensory integration in robotics and moves the field closer to achieving physical AI that can interact with the world as intuitively and fluidly as humans do. It also opens new possibilities for robotics applications in manufacturing, healthcare, and service industries where delicate and adaptive manipulation is crucial.

The improved physical AI method was able to manipulate objects accurately, combining multiple sensory inputs to produce adaptive, responsive movements. The practical applications for this type of helper robot are virtually endless.

Research advances like TactileAloha bring us one step closer to integrating these robotic helpers into our daily lives.

The research group included members from Tohoku University’s Graduate School of Engineering, the Transformative Garment Production Center, the Hong Kong Science Park, and the University of Hong Kong.

About this AI and robotics research news

Author: Public Relations
Source: Tohoku University
Contact: Public Relations – Tohoku University
Image: The image is credited to StackZone Neuro

Original Research: Open access.
“TactileAloha: Learning Bimanual Manipulation with Tactile Sensing” by Mitsuhiro Hayashibe et al. IEEE Robotics and Automation Letters

Abstract

TactileAloha: Learning Bimanual Manipulation with Tactile Sensing

Tactile texture is essential for robotic manipulation but is difficult to observe with cameras.

To address this issue, we propose TactileAloha, an integrated vision-tactile robotic system built on ALOHA, which uses gripper-mounted tactile sensors to capture detailed texture information and supports real-time visualization during teleoperation, enabling efficient data collection and manipulation.

Using data collected with our system, we encode tactile signals with a pre-trained ResNet and combine them with visual and proprioceptive features.

The combined observations are processed by a Transformer-based policy that predicts future actions.
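To make this pipeline concrete, the sketch below shows one plausible way to wire up such a vision-tactile Transformer policy in PyTorch. The class name, feature widths, layer counts, and action-chunk length are illustrative assumptions, not the authors' released implementation; the only constraints taken from the abstract are a pre-trained ResNet tactile encoder, fusion with visual and proprioceptive features, and a Transformer that predicts future actions.

```python
# Minimal sketch of a vision-tactile Transformer policy (assumed
# architecture; not the TactileAloha authors' released code).
import torch
import torch.nn as nn
import torchvision.models as models


class VisionTactilePolicy(nn.Module):
    def __init__(self, d_model=256, chunk_size=20, action_dim=14):
        super().__init__()
        # Pre-trained ResNet backbones encode camera and tactile images.
        self.vision_enc = models.resnet18(weights="IMAGENET1K_V1")
        self.vision_enc.fc = nn.Linear(512, d_model)
        self.tactile_enc = models.resnet18(weights="IMAGENET1K_V1")
        self.tactile_enc.fc = nn.Linear(512, d_model)
        # Proprioception (e.g. joint positions) is projected to the same width.
        self.proprio_enc = nn.Linear(action_dim, d_model)
        # A Transformer encoder fuses the three token streams.
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=4)
        # The policy head predicts a short chunk of future actions.
        self.head = nn.Linear(d_model, chunk_size * action_dim)
        self.chunk_size, self.action_dim = chunk_size, action_dim

    def forward(self, cam_img, tactile_img, proprio):
        tokens = torch.stack(
            [
                self.vision_enc(cam_img),       # (batch, d_model)
                self.tactile_enc(tactile_img),  # (batch, d_model)
                self.proprio_enc(proprio),      # (batch, d_model)
            ],
            dim=1,
        )                                       # (batch, 3, d_model)
        fused = self.fusion(tokens).mean(dim=1)
        actions = self.head(fused)
        return actions.view(-1, self.chunk_size, self.action_dim)
```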

During training, we use a weighted loss function that emphasizes actions in the near future. During execution, we apply an improved temporal aggregation scheme to increase action accuracy.
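As a rough illustration of those two ideas, the snippet below sketches a loss that down-weights distant future steps and a scheme that blends the overlapping predictions made for the current timestep. The decay rates, and the choice to weight newer predictions more heavily during aggregation, are assumptions for illustration rather than the paper's exact formulation.

```python
# Hedged sketch of a near-future-weighted training loss and temporal
# aggregation at execution time (hyperparameters are assumptions).
import torch


def weighted_chunk_loss(pred, target, decay=0.1):
    """L1 loss over a predicted action chunk (batch, horizon, action_dim),
    giving exponentially more weight to near-future steps."""
    horizon = pred.shape[1]
    steps = torch.arange(horizon, dtype=pred.dtype, device=pred.device)
    w = torch.exp(-decay * steps)                  # step 0 gets the largest weight
    per_step = (pred - target).abs().mean(dim=-1)  # (batch, horizon)
    return (per_step * w).sum(dim=1).div(w.sum()).mean()


def aggregate_current_action(overlapping_preds, decay=0.25):
    """Blend the actions that several previously predicted chunks assign to
    the current timestep; newer predictions get larger weights here."""
    preds = torch.stack(overlapping_preds)         # (k, action_dim), oldest first
    ages = torch.arange(len(overlapping_preds) - 1, -1, -1,
                        dtype=preds.dtype, device=preds.device)
    w = torch.exp(-decay * ages)                   # newest prediction -> weight 1
    return (w.unsqueeze(1) * preds).sum(dim=0) / w.sum()
```

In a scheme like this, only the blended action for the current timestep would be sent to the robot at each control step, while the rest of the newly predicted chunk is kept for later aggregation.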

We evaluate our system on two bimanual tasks: fastening cable ties and attaching Velcro. Both tasks require tactile perception to understand an object's structure and to align the orientations of two objects with both hands.

Our proposed method systematically adjusts the generated manipulation sequence based on tactile sensing.

The results show that our system, which uses tactile information, can perform texture-related tasks that camera-based methods cannot.

Furthermore, our method achieves an average relative improvement of about 11.0 percent over a state-of-the-art method with tactile input, demonstrating its effectiveness.
