Speech-to-Text in React.js

In today’s fast-paced digital world, ease of access and user experience are paramount. One way to improve how users interact with a web application is to add speech-to-text. Recently, I implemented this feature in a search bar using the react-speech-recognition library in a Next.js application. It worked well, and the same approach carries over to many other parts of an application. In this article, I walk through the integration and highlight its benefits.

Introduction to Speech-to-Text:

Speech-to-text technology converts spoken language into written text. This technology has become increasingly popular due to its applications in virtual assistants, transcription services, and now, web interfaces. By allowing users to search via voice commands, we can improve accessibility and streamline user interactions.

Why React-Speech-Recognition?

react-speech-recognition is a lightweight library that provides a simple API, the useSpeechRecognition hook and a SpeechRecognition object, for integrating speech recognition into React applications. It is built on the browser's Web Speech API, so it ships no speech engine of its own. Browser support for that API varies, which is why the hook exposes a browserSupportsSpeechRecognition flag.
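
To make the dependency on the Web Speech API concrete, here is a minimal sketch of the kind of feature detection involved. It is not the library's actual source: the helper names findSpeechRecognition and supportsSpeechRecognition and the SpeechGlobals type are hypothetical. Only the two property names, SpeechRecognition and the prefixed webkitSpeechRecognition, are what browsers expose.

```typescript
// Hypothetical shape of the globals we care about; in a browser this is `window`.
type SpeechGlobals = {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
};

// Return the recognition constructor if the environment has one, else null.
// The standard name is checked first, then the vendor-prefixed one.
function findSpeechRecognition(globals: SpeechGlobals): unknown {
  return globals.SpeechRecognition ?? globals.webkitSpeechRecognition ?? null;
}

// True when speech recognition can be offered to the user.
function supportsSpeechRecognition(globals: SpeechGlobals): boolean {
  return findSpeechRecognition(globals) !== null;
}
```

In a component you would normally rely on the browserSupportsSpeechRecognition flag from the hook instead of a helper like this; the sketch only shows what such a check boils down to.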

Benefits of Speech-to-Text Integration:

Accessibility: Enhances accessibility for users with disabilities, making your application more inclusive.
Convenience: Provides a hands-free search option, which is particularly useful on mobile devices.
Efficiency: Speeds up the search process, especially for users who are on the go or prefer voice commands.

Challenges Faced:

Implementing this feature was not without its challenges. The main one was that I needed to install the regenerator-runtime package: without it, the speech-to-text functionality did not produce the expected results. I also had to import that package dynamically to avoid build errors. Loading it dynamically means regenerator-runtime is only pulled in when it is needed, so it cannot interfere with the build.
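
One way to read "import dynamically" is to load the package lazily, at most once, and only in the browser. The sketch below illustrates that idea under those assumptions; it is not necessarily what the linked repo does. The helper name createClientOnlyLoader and its isBrowser parameter are hypothetical.

```typescript
// Hypothetical helper: wrap an async loader so it runs at most once,
// and never outside the browser (for example during a server-side build).
function createClientOnlyLoader<T>(
  loader: () => Promise<T>,
  // Assumption: the presence of a global `window` means we are in a browser.
  isBrowser: () => boolean = () => "window" in globalThis,
): () => Promise<T | undefined> {
  let pending: Promise<T> | undefined;
  return () => {
    if (!isBrowser()) {
      // Skip loading entirely on the server.
      return Promise.resolve(undefined);
    }
    if (!pending) {
      // First call in the browser: start loading and remember the promise.
      pending = loader();
    }
    return pending;
  };
}

// Usage sketch (assumes the regenerator-runtime package is installed):
// const loadRuntime = createClientOnlyLoader(
//   () => import("regenerator-runtime/runtime"),
// );
// useEffect(() => { void loadRuntime(); }, []);
```

Because the loader is passed in as a function, the dynamic import() call only executes when the returned function is first called in the browser.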

Potential Applications:

The integration of speech-to-text in a search bar is just the beginning. This technology can be extended to other areas of your application, such as:

Voice Commands: Allow users to navigate your application or trigger specific actions using voice commands.
Form Inputs: Enable speech-to-text for form inputs to improve data entry speed and accuracy.
Customer Support: Implement speech-to-text in chatbots or virtual assistants to provide a more interactive support experience.
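
As a rough illustration of the voice-command idea, the sketch below maps a transcript (such as the transcript value returned by useSpeechRecognition) to an action. The VoiceCommand type, the matchCommand function, the example phrases, and the action strings are all hypothetical, and the normalization is deliberately simple and only handles ASCII letters and digits.

```typescript
// A hypothetical command: a phrase to listen for and the action it triggers.
type VoiceCommand = { phrase: string; action: string };

// Lowercase, strip punctuation, and collapse whitespace so matching is forgiving.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Return the action of the first command whose phrase occurs in the transcript,
// or null when nothing matches.
function matchCommand(transcript: string, commands: VoiceCommand[]): string | null {
  const spoken = normalize(transcript);
  const hit = commands.find((c) => spoken.includes(normalize(c.phrase)));
  return hit ? hit.action : null;
}

// Example command table (made up for illustration).
const commands: VoiceCommand[] = [
  { phrase: "go home", action: "navigate:/" },
  { phrase: "open settings", action: "navigate:/settings" },
];
```

A component could call matchCommand whenever the transcript changes and dispatch on the returned action, falling back to a plain search when the result is null.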

Conclusion:

Integrating speech-to-text functionality using react-speech-recognition in a Next.js project can significantly enhance the user experience. With the approach outlined above, you can add this feature to your own application and extend it well beyond search. Whether it’s improving accessibility, providing convenience, or driving efficiency, speech-to-text is a powerful tool that can change how users interact with your web application.

Repo Link: https://github.com/RahulSM2002/SpeechRecognition

Web Link: https://speech-recognition-zeta.vercel.app

Code Example:


import { IconButton } from "@mui/material";
import Image from "next/image";
import "./style.css";
import React, { useEffect, useRef, useState } from "react";
import SpeechRecognition, {
  useSpeechRecognition,
} from "react-speech-recognition";
import "regenerator-runtime";
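const SpeechToTextField = ({
  setText,
  setIsRecording,
}: {
  // Callbacks supplied by the parent (for example, the search bar)
  setText: (text: string) => void;
  setIsRecording: (isRecording: boolean) => void;
}) => {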
  const {
    transcript,
    listening,
    resetTranscript,
    browserSupportsSpeechRecognition,
  } = useSpeechRecognition();
  const [volume, setVolume] = useState(0);
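  // Typed refs so that assigning to .current type-checks
  const audioContextRef = useRef<AudioContext | null>(null);
  const analyserRef = useRef<AnalyserNode | null>(null);
  const dataArrayRef = useRef<Uint8Array | null>(null);
  const sourceRef = useRef<MediaStreamAudioSourceNode | null>(null);
  const [lastTranscript, setLastTranscript] = useState("");
  const timeoutRef = useRef<ReturnType<typeof setTimeout> | null>(null);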
  // Stop recording after 3 seconds without any new words
  useEffect(() => {
    setText(transcript);
    if (transcript !== lastTranscript) {
      setLastTranscript(transcript);
      if (timeoutRef.current) {
        clearTimeout(timeoutRef.current);
      }
      timeoutRef.current = setTimeout(() => {
        SpeechRecognition.stopListening();
        stopAnalyzingAudio();
      }, 3000);
    }
  }, [transcript]);
  useEffect(() => {
    setIsRecording(listening);
  }, [listening]);
  useEffect(() => {
    return () => {
      if (timeoutRef.current) {
        clearTimeout(timeoutRef.current);
      }
    };
  }, []);
  const analyze = () => {
    const analyser = analyserRef.current;
    const dataArray = dataArrayRef.current;
    if (analyser && dataArray) {
      // Average the frequency bins (0-255 each) as a rough volume level
      analyser.getByteFrequencyData(dataArray);
      const sum = dataArray.reduce((a, b) => a + b, 0);
      setVolume(sum / dataArray.length);
      requestAnimationFrame(analyze);
    }
  };
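  // Assumed implementation: the listing calls startAnalyzingAudio but does not
  // define it. This sketch feeds the microphone into an AnalyserNode.
  const startAnalyzingAudio = async () => {
    try {
      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
      const audioContext = new AudioContext();
      const analyser = audioContext.createAnalyser();
      const source = audioContext.createMediaStreamSource(stream);
      source.connect(analyser);
      audioContextRef.current = audioContext;
      analyserRef.current = analyser;
      sourceRef.current = source;
      dataArrayRef.current = new Uint8Array(analyser.frequencyBinCount);
      analyze();
    } catch (error) {
      console.error("Could not access the microphone:", error);
    }
  };
  const stopAnalyzingAudio = () => {
    // Release the microphone and clear the refs so the analyze loop stops
    sourceRef.current?.mediaStream.getTracks().forEach((track) => track.stop());
    if (audioContextRef.current && audioContextRef.current.state !== "closed") {
      audioContextRef.current.close();
    }
    audioContextRef.current = null;
    analyserRef.current = null;
    dataArrayRef.current = null;
    sourceRef.current = null;
    setVolume(0);
  };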
  useEffect(() => {
    if (listening) {
      startAnalyzingAudio();
    } else {
      stopAnalyzingAudio();
    }
  }, [listening]);
  if (!browserSupportsSpeechRecognition) {
    return <span>Browser doesn't support speech recognition.</span>;
  }
  const OnPress = () => {
    if (listening) {
      SpeechRecognition.stopListening();
    } else {
      resetTranscript();
      SpeechRecognition.startListening();
    }
  };
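  // The article omits the original markup. This button is a minimal stand-in
  // (an assumption) that calls OnPress to start or stop listening.
  return (
    <IconButton onClick={OnPress} aria-label="Toggle voice search">
      {listening ? "Stop" : "Speak"}
    </IconButton>
  );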
};
export default SpeechToTextField;