By Vinod Barela

Embrace the Future of Interaction with MediaPipe Hand Gesture Recognition

Hello, fellow tech enthusiasts! Today, we embark on a thrilling journey into the captivating realm of hand gesture recognition using the remarkable power of MediaPipe. As our digital world becomes increasingly immersive, human-computer interaction is evolving rapidly. Hand gestures have emerged as a natural and intuitive way to communicate with our devices and virtual environments. With MediaPipe's cutting-edge hand tracking module, we are about to unlock the true potential of hand gesture recognition. Join me on this exciting adventure as we explore how MediaPipe empowers developers to create seamless, accurate, and real-time hand gesture interactions.




What is MediaPipe?

MediaPipe is a lightweight, open-source, cross-platform framework created by Google for building adaptable machine learning solutions. It comes with pre-trained ML solutions for face detection, pose estimation, hand tracking, object detection, and other tasks.


To start, we'll use MediaPipe to detect the hand and its key landmarks. For each hand that is spotted, MediaPipe returns a total of 21 keypoints.



To identify the hand pose, these keypoints will be fed into a pre-trained network of gesture recognizers.
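
For reference, MediaPipe exposes these 21 points as the mp.solutions.hands.HandLandmark enum. A minimal sketch that lists their indices and names (assuming only that mediapipe is installed):

import mediapipe as mp

mp_hands = mp.solutions.hands

# Print the index and name of each of the 21 landmarks,
# e.g. 0 WRIST, 4 THUMB_TIP, 8 INDEX_FINGER_TIP, ..., 20 PINKY_TIP
for landmark in mp_hands.HandLandmark:
    print(landmark.value, landmark.name)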


How to complete the project:



1. Install and Import Dependencies:

!pip install mediapipe opencv-python
!pip install tensorflow

2. Import necessary packages:

We will need five packages to develop this hand gesture recognition project. So import these first.

import mediapipe as mp  # hand detection and landmark tracking
import cv2              # webcam capture, image conversion, and display
import numpy as np      # array operations on image frames
import uuid             # unique filenames, if you choose to save frames
import os               # filesystem paths, if you choose to save frames

3. Initialize MediaPipe:

mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands
  • The hand recognition algorithm lives in the mp.solutions.hands module; we store it in mp_hands (a short sketch of its constructor options follows this list).

  • mp.solutions.drawing_utils will draw the detected keypoints for us, saving us the time of rendering them manually.
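
The Hands constructor takes a few options worth knowing. A minimal sketch using MediaPipe's documented defaults, spelled out explicitly:

# These values match MediaPipe's defaults; tune them for your use case
hands = mp_hands.Hands(
    static_image_mode=False,       # False: treat input as a video stream and track across frames
    max_num_hands=2,               # detect at most two hands per frame
    min_detection_confidence=0.5,  # minimum score for the initial palm detection
    min_tracking_confidence=0.5,   # below this, detection is re-run on the next frame
)
hands.close()  # release resources when done, or use a `with` block as in the script below

Raising min_detection_confidence (as the script below does, to 0.8) trades a few missed detections for fewer false positives.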

4. Detect hand keypoints:

This Python script uses the opencv-python and mediapipe libraries for real-time hand tracking. cv2.VideoCapture(0) opens the default webcam (index 0), giving us a live video stream in which to detect and locate hands. As each frame is processed, the mediapipe library extracts the hand landmarks and their connections, which are then drawn back onto the image, turning ordinary footage into a live demonstration of hand movements.



cap = cv2.VideoCapture(0)

with mp_hands.Hands(min_detection_confidence=0.8, min_tracking_confidence=0.5) as hands: 
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            # Stop if a frame could not be read from the webcam
            break
        
        # BGR 2 RGB
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        
        # Flip on horizontal
        image = cv2.flip(image, 1)
        
        # Set flag
        image.flags.writeable = False
        
        # Detections
        results = hands.process(image)
        
        # Set flag to true
        image.flags.writeable = True
        
        # RGB 2 BGR
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        
        # Print detection results
        print(results)
        
        # Rendering results
        if results.multi_hand_landmarks:
            for num, hand in enumerate(results.multi_hand_landmarks):
                mp_drawing.draw_landmarks(image, hand, mp_hands.HAND_CONNECTIONS, 
                                        mp_drawing.DrawingSpec(color=(121, 22, 76), thickness=2, circle_radius=4),
                                        mp_drawing.DrawingSpec(color=(250, 44, 250), thickness=2, circle_radius=2),
                                         )
            
        
        cv2.imshow('Hand Tracking', image)

        if cv2.waitKey(10) & 0xFF == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()

Let's walk through the code step by step:

  • cap = cv2.VideoCapture(0): This line initializes a video capture object using the default webcam (index 0).

  • with mp_hands.Hands(min_detection_confidence=0.8, min_tracking_confidence=0.5) as hands: This sets up the MediaPipe Hands module for hand detection and tracking. It specifies the minimum confidence values required for detection and tracking. If the confidence levels are set too low, it might lead to less accurate results.

  • The while loop starts to capture video frames from the webcam as long as the webcam is opened and valid.

  • ret, frame = cap.read(): This line reads a single frame from the webcam; the boolean ret indicates whether the frame was read successfully, and the loop exits if it was not.

  • image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB): The captured frame is converted from the BGR color space (used by OpenCV) to the RGB color space (used by MediaPipe).

  • image = cv2.flip(image, 1): The image is flipped horizontally. This is often done when working with webcams because the default webcam feed is often mirrored.

  • image.flags.writeable = False: This line sets the "writeable" flag of the NumPy array to False. This step is taken to optimize the data for the hands.process() function, which might improve performance.

  • results = hands.process(image): The MediaPipe Hands module processes the input image to detect and track hands. The results are stored in the results variable.

  • image.flags.writeable = True: The "writeable" flag of the NumPy array is set back to True to modify the image for rendering the landmarks.

  • image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR): The image is converted back to the BGR color space to be compatible with OpenCV functions.

  • print(results): The results variable contains the hand detection and tracking information, which is printed to the console.

  • If there are any detected hands (i.e., results.multi_hand_landmarks is not empty), the script loops through each hand and draws the hand landmarks and connections on the image (see the sketch after this list for reading individual landmark coordinates).

  • cv2.imshow('Hand Tracking', image): The processed frame with hand landmarks is displayed in a window titled "Hand Tracking."

  • The script checks for the "q" key press to exit the loop and terminate the program.

  • After the loop ends, the webcam is released (cap.release()) and all OpenCV windows are closed (cv2.destroyAllWindows()).
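
Each entry in results.multi_hand_landmarks holds the 21 landmarks, with x and y normalized to [0, 1] relative to the image width and height. A minimal sketch of reading one landmark back as pixel coordinates, assuming results and image come from the loop above:

# Inside the loop, after results = hands.process(image):
if results.multi_hand_landmarks:
    for hand in results.multi_hand_landmarks:
        # Landmark coordinates are normalized; scale by the image size
        tip = hand.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
        h, w, _ = image.shape
        px, py = int(tip.x * w), int(tip.y * h)
        print(f'Index fingertip at pixel ({px}, {py}), relative depth z={tip.z:.3f}')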

With this script, you can watch real-time hand tracking from your webcam, with landmarks and connections drawn on the video feed. It is a solid starting point for hand-based apps such as gesture detection, sign language interpretation, or virtual hand interactions.
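
As a first taste of gesture detection, here is a rough, hypothetical heuristic that counts raised fingers straight from the landmarks; the pre-trained gesture recognizers mentioned earlier would be far more robust:

def count_raised_fingers(hand_landmarks):
    """Rough finger count for an upright hand facing the camera.

    Heuristic only: a fingertip that sits higher in the image (smaller y)
    than its PIP joint counts as raised. The thumb is skipped because it
    needs an x-axis comparison that depends on handedness.
    """
    lm = hand_landmarks.landmark
    tips = [mp_hands.HandLandmark.INDEX_FINGER_TIP,
            mp_hands.HandLandmark.MIDDLE_FINGER_TIP,
            mp_hands.HandLandmark.RING_FINGER_TIP,
            mp_hands.HandLandmark.PINKY_TIP]
    # Each PIP joint index is exactly two below its fingertip index
    return sum(1 for tip in tips if lm[tip].y < lm[tip - 2].y)

Calling count_raised_fingers(hand) inside the rendering loop above gives a live finger count for each detected hand.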


Results:

[Screenshots: the real-time webcam feed with the 21 hand landmarks and their connections drawn on each detected hand.]