Turns out, the reason the MediaPipe model couldn't detect hands properly is that I chose the wrong model for the recognition! I was using the holistic model to detect the hand landmarks, but the holistic model detects hands only when the body is visible. We need a model that detects just the hands, and luckily there's a model dedicated to hand detection. Everything worked well once I switched to it!
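For reference, here is a minimal sketch of what using the dedicated hand model can look like. It assumes the Python `mediapipe` package with its legacy `mp.solutions.hands` API plus OpenCV, which may not match the binding or version this project actually uses. The helper names `extract_hand_points` and `detect_hands` are made up for illustration.

```python
def extract_hand_points(results, width, height):
    """Convert normalized hand landmarks to pixel coordinates, one list per hand."""
    hands = []
    # multi_hand_landmarks is None when no hand was detected
    for hand in (results.multi_hand_landmarks or []):
        hands.append(
            [(int(lm.x * width), int(lm.y * height)) for lm in hand.landmark]
        )
    return hands


def detect_hands(image_bgr):
    """Run the hands-only model on a BGR image (hypothetical helper)."""
    # Imported lazily so the helper above works without these packages installed.
    import cv2
    import mediapipe as mp

    # Hands (not Holistic): it does not need the body to be in frame.
    with mp.solutions.hands.Hands(
        static_image_mode=True,
        max_num_hands=2,
        min_detection_confidence=0.5,
    ) as hands:
        # MediaPipe expects RGB input, while OpenCV loads images as BGR.
        results = hands.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))

    height, width = image_bgr.shape[:2]
    return extract_hand_points(results, width, height)
```

Calling `detect_hands(cv2.imread("hand.jpg"))` (with a placeholder file name) should return one list of 21 `(x, y)` points per detected hand, or an empty list when no hand is found.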
Check out this commit related to this journal