Technical Issues of Voice Recognition for Interactive Robots
Robots that perform tasks in place of humans or assist humans in their work are broadly categorized into industrial robots and business/service robots. The business/service robot segment has been growing quickly, and its world market is expected to reach 52 billion USD by 2025 (Source: “Reality and Future Outlook of Worldwide Robot Related Market 2018” by Fuji Keizai Co., Ltd.). Among business/service robots, interactive robots in particular are expected to grow.
Interactive robots use AI to interact with humans automatically. They can move and/or speak in response to the interaction, and are expected to help address global social issues such as aging societies and labor shortages.
Voice recognition technology provides the user interface for interactive robots. An interactive robot needs to detect the direction of a speaker even when the speaker is some distance away, and to recognize speech against background noise. As a solution, multiple microphones are installed, and the direction of the speaker is found from the differences in arrival time and signal strength of the voice at each microphone. This technology is called beamforming. Noise cancelling technology using multiple microphones is also required, so that only the noise components are removed from background noise arriving from various directions.
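The direction-finding idea described above can be sketched for the simplest case of two microphones. This is a minimal illustration, not TDK's implementation: it uses only the time difference of arrival (not signal strength), assumes a far-field source, and the function names, the 5 cm spacing, and the 48 kHz sample rate are all hypothetical values chosen for the example.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at about 20 degrees C


def arrival_delay_s(angle_deg, mic_spacing_m):
    """Far-field model: delay of mic B relative to mic A for a source
    at angle_deg from broadside (0 degrees = straight ahead)."""
    return mic_spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND_M_S


def estimate_angle_deg(delay_s, mic_spacing_m):
    """Invert the far-field model to recover the angle from a measured delay."""
    ratio = delay_s * SPEED_OF_SOUND_M_S / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp in case measurement noise exceeds the model
    return math.degrees(math.asin(ratio))


def best_lag(sig_a, sig_b, max_lag):
    """Lag in samples at which sig_b best matches sig_a,
    found by brute-force cross-correlation."""
    def score(lag):
        return sum(
            sig_a[i] * sig_b[i + lag]
            for i in range(len(sig_a))
            if 0 <= i + lag < len(sig_b)
        )
    return max(range(-max_lag, max_lag + 1), key=score)


# Hypothetical setup: two microphones 5 cm apart, sampled at 48 kHz.
SAMPLE_RATE_HZ = 48_000
MIC_SPACING_M = 0.05

# A short pulse at mic A, and the same pulse reaching mic B three samples later.
pulse_a = [0.0] * 20 + [1.0, 2.0, 3.0, 2.0, 1.0] + [0.0] * 20
pulse_b = [0.0] * 3 + pulse_a[:-3]

lag_samples = best_lag(pulse_a, pulse_b, max_lag=7)
angle = estimate_angle_deg(lag_samples / SAMPLE_RATE_HZ, MIC_SPACING_M)
print(f"lag = {lag_samples} samples, estimated angle = {angle:.1f} degrees")
```

With only two microphones the estimate cannot distinguish front from back, which is one reason practical arrays use more elements.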
To accomplish beamforming and noise cancelling, the multiple microphones need to be as compact and light as possible. At the same time, accurately recognizing the speaker’s voice requires a high S/N ratio. In general, however, the higher the S/N ratio is, the narrower the dynamic range becomes. Microphones for interactive robots therefore face the challenge of maintaining a high S/N ratio and suppressing distortion while remaining compact and light.
Voice Recognition Technology Solutions Using Compact MEMS Microphones
TDK’s MEMS microphones address these beamforming and noise cancelling challenges. The ICS-40730, which TDK provides under the InvenSense brand, is a compact, high-performance MEMS microphone. Measuring 4.72 x 3.76 x 3.50 mm, it features a high S/N ratio of 74 dBA and is well suited to microphone arrays for beamforming. Even at a sound level of 105 dB, the ICS-40730 keeps distortion to a maximum of 0.6%, maintaining a high S/N ratio. (For comparison, a sound level of 100 dB, such as the noise under an elevated railway track when a train is passing, is considered noisy.) It also contributes significantly to the noise-cancelling function.
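The figures quoted above can be related with a little arithmetic. The sketch below assumes the common microphone convention that S/N ratio is specified against a 94 dB SPL (1 Pa) reference tone; that convention, the function names, and the use of 105 dB as the upper level are assumptions for illustration. The 105 dB figure is simply the level at which the article quotes distortion, not necessarily the acoustic overload point, so the datasheet value should be used for a real dynamic-range calculation.

```python
REFERENCE_SPL_DB = 94.0  # assumed convention: S/N ratio quoted against a 94 dB SPL (1 Pa) tone


def noise_floor_db_spl(snr_dba, reference_spl_db=REFERENCE_SPL_DB):
    """Equivalent input noise of the microphone, expressed as a sound level."""
    return reference_spl_db - snr_dba


def usable_span_db(upper_spl_db, snr_dba):
    """Span between the noise floor and a chosen upper sound level."""
    return upper_spl_db - noise_floor_db_spl(snr_dba)


noise_floor = noise_floor_db_spl(74.0)  # S/N ratio of 74 dBA quoted for the ICS-40730
span = usable_span_db(105.0, 74.0)      # up to the 105 dB level quoted in the article

print(f"noise floor = {noise_floor:.0f} dBA SPL, span up to 105 dB = {span:.0f} dB")
```

Under these assumptions the noise floor works out to 20 dBA SPL, leaving a span of 85 dB between the noise floor and the 105 dB level.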
In addition to the ICS-40730, TDK provides a lineup of MEMS microphones for various applications. TDK’s microphones are used in smartphones, notebook computers, tablets, wearable devices, and hearing aids, as well as in the voice recognition interfaces of interactive robots and luxury cars. Each microphone combines a MEMS component, which is an acoustic sensor built with MEMS technology (a TDK core technology), and an ASIC (application-specific integrated circuit) in a single package.
Robots have evolved to become more human-like. Humanoid robots are equipped with many sensors that control the joints and posture so that human-like movement can be achieved. In addition to MEMS microphones and various other MEMS sensors, TDK provides angle/position sensors using semiconductor, thin film, and magnetic technologies, and temperature/pressure sensors using electronic ceramics technology, together with advanced software to support the development of robotics.
- MEMS is a micromachining technology that creates fine sensors, movable mechanisms, and similar structures on silicon substrates, using manufacturing processes similar to those for semiconductor integrated circuits.
- Noise-cancelling technology removes only the noise components by adding antiphase acoustic waves to the noise waves. It is used in headsets, smartphones, and similar devices.
- The S/N (signal-to-noise) ratio is the ratio of signal power to noise power, expressed on a logarithmic scale. The higher the ratio, the lower the noise and the higher the signal quality.
- Dynamic range is the span between the noise floor and the level at which non-linear distortion starts. It defines the usable range of the microphone in SPL (sound pressure level).
- An array is an arrangement of multiple elements (here, microphones), mainly for densification.
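The noise-cancelling and S/N ratio definitions above can be tried out numerically. The following is a simplified sketch under stated assumptions: the "voice" and "noise" are plain sine tones, the noise is known exactly, and the anti-noise is an inverted copy with a hypothetical 90% gain to mimic an imperfect estimate. Real systems have to estimate the noise from the microphone signals, typically with adaptive filtering.

```python
import math


def power(samples):
    """Mean power of a sampled signal."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(signal, noise):
    """S/N ratio: signal power over noise power, in decibels."""
    return 10.0 * math.log10(power(signal) / power(noise))


N = 480  # number of samples; both tones complete whole cycles in this window
voice = [math.sin(2 * math.pi * 5 * i / N) for i in range(N)]         # wanted component
noise = [0.5 * math.sin(2 * math.pi * 37 * i / N) for i in range(N)]  # unwanted component

# Antiphase wave: the noise inverted, at 90% of its amplitude (imperfect estimate).
anti_noise = [-0.9 * x for x in noise]
residual_noise = [x + a for x, a in zip(noise, anti_noise)]

snr_before = snr_db(voice, noise)
snr_after = snr_db(voice, residual_noise)
print(f"S/N before: {snr_before:.1f} dB, after: {snr_after:.1f} dB")
```

Because the residual noise amplitude drops to one tenth of the original, its power drops by a factor of 100, which corresponds to a 20 dB improvement in S/N ratio.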