Making My Toddler’s Dream of Flying Come True with AI Tech



  • 3D Game Engine — to bring the 3D fantasy world with hills, mountains, blue skies and Griffin to life using a flight simulator written in OpenGL.
  • Body Posture Estimation — to constantly detect the player’s body posture as an input to the system to control Griffin using the OpenPose Pose Estimation Model and an SSD Object Detection Model.
  • Action Mapping and Gesture Recognition — to transform the body posture into a meaningful action and gesture, such as lifting the left/right wing, rolling the body side to side, jumping to take off, etc.
  • Communication System — to send the gesture input into the 3D game engine using a socket. I will discuss later why we need this.
  • NVIDIA Jetson AGX Xavier — a small GPU-powered, embedded device that will run all the modules above. This is the perfect device for the job because it can support video and audio output via a simple HDMI port, and it has an ethernet port for easy internet access. You can even plug in a mouse and a keyboard to develop and debug right on the device, as it has a fully functional Ubuntu 18.04 OS.
  • TV (with an HDMI input and a built-in speaker) — as a display to the game engine.
  • Camera — Sony IMX327. It’s an awesome, tiny, Full HD ultra-low light camera. Honestly, I could have gone for a much lower end camera, as I only need 224x224 image resolution. However, since I already had this camera for another project, then why not use it?
  • Blu-Tack — to glue everything together and to make sure everything stays in place. :)


Building the 3D Game Engine

  • I replaced the keypress-based flight control system with a target-based system. This way, I could constantly set a roll target angle for Griffin’s body to slowly roll into. This roll target later will be automatically set by a gesture recognition module by mapping the orientation of Dexie’s arms.
  • I enhanced the static 3D model management to support a hierarchical structure. The original airplane model moved as one rigid body and no moving body parts. However, Griffin has two wings that need to be able to move independently of his body. For this, I added the two wings as separate 3D models attached to the body. I can still rotate each wing independently, but I can also move Griffin’s body that will indirectly move the two wings accordingly. A proper way to do this is to build a skeletal animation system and organise the body parts into a tree structure. However, since I only have three body parts to deal with (body and two wings), hacking them up does the trick. To edit the eagle and tree 3D models, I used Blender, a free and easy-to-use 3D authoring tool.
  • I added a tree model for Griffin to take off from and a game state that enables restarting the game without restarting the application. There are two states for Griffin: standing, which is where Griffin is standing on a tree branch and flying.
  • I added sound playback using libSFML: a screaming eagle and a looping wind sound that starts playing as soon as Griffin takes off.

Building the Body Posture Estimation

from jetcam.csi_camera import CSICameracamera = CSICamera(width=224, height=224, capture_width=224, capture_height=224, capture_fps=30)
image =

Building an Object Detection Model with Amazon SageMaker JumpStart

Building the Action Mapping and Gesture Recognition

  • Body roll while flying — to control the forward direction where Griffin is flying. Body roll is calculated from the angle between the horizontal axis and the right-to-left elbow vector (top photo). When flying, both wings are moving in sync using this roll angle. The elbow is chosen as opposed to the wrist to maximise visibility, as the wrist often moves outside the camera’s view or gets obstructed by other body parts.
  • Wing rotation while standing — is purely a cosmetic action that serves no other purpose than to make the game more fun and give more impression of control over each wing independently while standing. It is calculated from the angle between the horizontal axis and the shoulder-to-elbow vector (bottom photo) for each wing respectively. 15 degrees are added to the final roll angle to exaggerate the wing movement as it is quite tiring to lift your arms up high for a period of time.
  • Crouching is another cosmetic action to give an impression of control over Griffin’s crouching posture before the flight take off. It is calculated from the ratio between the length of the neck-to-nose vector and the shoulder vector. The further you crouch, the smaller the distance between your neck and nose whilst the distance between your left and right shoulders remains the same, which then yields a smaller ratio value. You probably wonder why I don’t simply use the neck vertical coordinate directly as a crouch offset. It is never a good idea to use the raw coordinates directly as the magnitude of the neck vertical movement depends on the distance between the person and the camera. We don’t want to wrongly trigger crouching animation when the person is moving closer to and further from the camera. Crouching animation is simply implemented by moving Griffin downwards. Ideally, I could make both legs bend; however, this would require a lot more work for little added value. I would need to detach the legs as body parts and animate them separately as I did with the wings.
  • Taking off gesture — is recognised when the centre point between the left and right shoulders moves up and down greater than a threshold within less than a second. The threshold is chosen to be the length between the shoulders. As the name implies, Griffin will jump off the tree branch and start flying when this gesture is triggered.
  • Game reset gesture — is recognised when the horizontal position of the left and right shoulders is inverted. E.g., the actor is back facing the camera. The game will reset, and Griffin will be back standing on the tree, ready for the next flight, when this gesture is triggered.

Communication System

class Payload(Structure):
_fields_ = [(“roll_target”, c_int32),
(“lwing_target”, c_int32),
("rwing_target", c_int32),
("body_height", c_int32),
("game_state", c_int32)]
typedef struct payload_t 
int32_t roll_target;
int32_t lwing_target;
int32_t rwing_target;
int32_t body_height;
int32_t game_state;
} payload;

Calibration and Testing



  • Torch2trt is an awesome tool to automatically convert a PyTorch model into TensorRT to optimise your AI model running in Jetson AGX Xavier. Many cutting-edge AI models are built in PyTorch, and porting them manually to TensorFlow is a pain in the bum.
  • NVIDIA Jetson AGX Xavier is a real beast! Many people have said that you can run a computer vision model processing 30 live 1080p video streams simultaneously. Now, I really have no doubt in this.
  • Amazon SageMaker JumpStart offers you a large collection of popular AI models at your disposal by making it super easy to deploy them.
  • Building the 3D game engine took me back in time to my past life as a game and movie SFX developer and allowed me to refresh my rusty skills in OpenGL, C++ and trigonometry.
  • I could have built Griffin in Xbox using a Unity Engine and a Kinect sensor. However, what’s the fun in that? Sometimes building from scratch is where the fun is.
  • Being an eagle is quite a tiring job, especially lifting your arms for a period of time. However, a real eagle might get a lot of help from the air drag to maintain their wingspread throughout flight.



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store