May 7, 2009

Facial Animation using 2D Tracking

For the better part of the semester, I have been working on techniques for facial animation using two-dimensional motion capture. Working out the primary technical obstacles took only about two weeks. Shane and I spent the bulk of our time figuring out how to get the most out of Maya Live, so that we could get the results we wanted in as little time as possible and put the facial performance in the context of the short we set out to make.

For tracking software, we chose Maya Live. Boujou was more user-friendly and did more automatically, but Maya Live was better suited to the two-dimensional tracking we wanted to do. Maya Live puts your video, in the form of an image sequence, onto an image plane. On the first frame, you create tracking points and move them onto whatever marks in the video you want the program to follow. What you end up with is a group of locators that move along the X and Y axes with the tracked marks in the video.

Our first attempt at using Maya Live was very rough, but it yielded the solutions to our most fundamental problems. I started by sticking pieces of Scotch tape, which I had darkened with permanent marker, to the parts of my face I felt would be important to animate for a full facial performance, such as the eyebrows, lips, cheeks, and eyelids. I shot myself on a relatively low-quality DV camera with no light source other than my window and the light in my room. I said the line “just one last thing, and then it’s done” several times, and also made some exaggerated faces to test the limits of the software. The limits of the software ended up being tested elsewhere. Both the markers I put on my face and the lighting I used proved too poor for the software to track reliably. The Scotch tape was reflective, so the pieces appeared white at points, which caused the tracker to fail to recognize them and lose them altogether. Because the lighting was poor and the marks were dark, many of them, particularly those on my lower lip and under my eyes, got lost in the shadows on my face. It became clear that recording footage for motion tracking is not just a matter of putting the points in the right place and getting a good performance, but of making sure the points are clearly distinguishable no matter what position your face is in.

These shortcomings didn’t make tracking the points impossible, just more time-consuming. Instead of placing the points on Frame 1 and clicking “Start Track,” tracking became a routine of finding where Maya lost a tracking point, manually placing the point where we knew it should be, and continuing to track it until it was lost again. Repeat. We soon realized that we could make this process at least a bit faster by setting the tracking to “Bidirectional” instead of just “Forward.” Ultimately, 80 percent of the time we spent tracking footage went to filling in the holes Maya left, to compensate for the flaws in my source footage.
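The find-and-fix routine above amounts to locating every stretch of frames where a point has no position. Here is a minimal sketch of that bookkeeping in plain Python. It assumes the track has already been exported as a dictionary of frame numbers to (x, y) positions, with `None` or a missing entry where the tracker lost the point. That data layout and the name `find_gaps` are my own assumptions for illustration, not anything Maya Live provides.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def find_gaps(track: Dict[int, Optional[Point]],
              first: int, last: int) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame ranges where the point was lost."""
    gaps = []
    start = None  # first frame of the gap we are currently inside, if any
    for frame in range(first, last + 1):
        lost = track.get(frame) is None  # missing key or explicit None
        if lost and start is None:
            start = frame
        elif not lost and start is not None:
            gaps.append((start, frame - 1))
            start = None
    if start is not None:  # the track ended while the point was still lost
        gaps.append((start, last))
    return gaps


# Hypothetical track: lost on frames 3-4, 6, and 8.
track = {1: (0.10, 0.20), 2: (0.11, 0.21), 3: None, 4: None,
         5: (0.14, 0.22), 7: (0.15, 0.25)}
gaps = find_gaps(track, 1, 8)
print(gaps)  # [(3, 4), (6, 6), (8, 8)]
```

Each range in the result is a place where the point would need to be set by hand before tracking resumes.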

Once we had the points completely tracked, it became apparent that we couldn’t directly use the locators Maya gave us for the facial animation, because polygons or joints parented to them didn’t follow their movement. This was because the locators were keyed not through a translation channel, but through a “location” channel. So we created as many polygon primitives as there were locators and used the Connection Editor to connect each locator’s “Location X” and “Location Y” channels to the “Translate X” and “Translate Y” channels, respectively, of a polygon primitive. One mistake we made in our first attempt was placing the polygons approximately where the locators were. The polygons still moved, but their movement was nothing like that of the locators. It turned out that the polygons needed to start out at 0 for the connection to work properly, which meant starting them at the origin; making the connection would then automatically put them in the right place.
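With many locators, making these connections by hand in the Connection Editor gets repetitive, so the pairing could be scripted. The sketch below only builds the list of source and destination attribute names in plain Python. The node names are hypothetical, and the attribute names `locationX` and `locationY` are my guess from the channel labels. Check the real names in your scene (for example with `cmds.listAttr`) before relying on them.

```python
from typing import List, Tuple


def connection_pairs(locators: List[str],
                     primitives: List[str]) -> List[Tuple[str, str]]:
    """Pair each locator's location channels with one primitive's translate channels."""
    if len(locators) != len(primitives):
        raise ValueError("need exactly one primitive per locator")
    pairs = []
    for loc, prim in zip(locators, primitives):
        # "locationX"/"locationY" are assumed attribute names, not verified.
        pairs.append((f"{loc}.locationX", f"{prim}.translateX"))
        pairs.append((f"{loc}.locationY", f"{prim}.translateY"))
    return pairs


# Hypothetical node names; the primitives are assumed to sit at the origin.
pairs = connection_pairs(["trackPoint1", "trackPoint2"],
                         ["pSphere1", "pSphere2"])
for src, dst in pairs:
    print(src, "->", dst)

# Inside Maya's script editor, each pair could then be wired up with:
#   import maya.cmds as cmds
#   for src, dst in pairs:
#       cmds.connectAttr(src, dst, force=True)
```

Keeping the name pairing separate from the Maya call makes it easy to inspect the list before anything in the scene is changed.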

To turn these floating spheres into a facial animation, we placed the same number of joints on a rough model of Woody’s face, in about the same places on his face as the points were on mine. Each joint was then made a child of its respective polygon and smooth bound to the face. Even though no additional weight painting was done, the resulting animation resembled my own facial performance more than we expected.

Knowing what we now knew, the next step was to figure out the best places to put tracking points on my face in order to get the kind of facial performance Woody gave in the Toy Story movies, and the best method of putting those points on my face, to say nothing of the lighting.

I was able to improve the lighting, and thus the visibility of the points, simply by repositioning my face in relation to my light sources and boosting the exposure on the camera I was using. If I needed to, I also adjusted the brightness and contrast in Final Cut.

The two main problems with the pieces of Scotch tape were that they were shiny and that they were irregularly shaped and sized; they were often too big to be useful as tracking marks. So from then on, I used a Sharpie pen to put dots on my face. Once Shane made a “map” of where the tracking points should go on the Woody face, I was able to put dots on my face to correspond with them.

This was the first of what became three retakes of facial footage, each one trying to solve a different problem. The changes, and the new problems each take presented, can be summarized as follows:

  1. Though the lighting was better and the dots were more evenly sized and shaped, I put too many extra dots on my face. Some of them appeared to mash together, which sometimes confused the Maya tracker. Some dots were also still too large or too small, which caused the same problem. Because I used a black Sharpie, points still got lost in the shadows on my face, though there were fewer shadows thanks to the improved lighting.
  2. At Shane’s suggestion, I used a red marker this time to make the dots more visible. However, because the lighting and exposure were substantially better in this take than in previous ones, the need for non-black points was not nearly as great. Ironically, some points were still lost, this time because their color was hard to distinguish from my skin rather than from the shadows. By this point, it had become apparent that we needed a “fixed” point on my head to track, one that moved only with my whole head, without influence from the movement of my eyes or mouth.
  3. I returned to using a black Sharpie, and in marking my face tried to make the dots as “medium” as possible, erring on the small side. My primary goals in marking myself were to keep any dots from being too close together and to make the placement as symmetrical as possible. My solution for a “control” point was to take my headphones and tie 4 aluminum armature wires around the band, with the last 2 or 3 inches of each standing straight up and ending in a small loop. I then put a small ball of red clay on the tip of each wire to give the ends distinct points that could be tracked. While this last take was far from perfect (the main problem was that some lower lip dots got obscured), this was the footage we wound up using for the remainder of the semester.
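One way a head-only control point can be put to use is to subtract its frame-by-frame drift from every facial track, so that what remains is the movement of the face itself. The post does not spell out the exact procedure we followed, so treat the following as a minimal sketch under that assumption. The function name and the sample numbers are made up, and it only cancels sideways and vertical head movement, not rotation or changes in distance from the camera.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def remove_head_motion(face_track: List[Point],
                       control_track: List[Point]) -> List[Point]:
    """Subtract the control point's drift since frame 0 from a facial track."""
    if len(face_track) != len(control_track):
        raise ValueError("tracks must cover the same frames")
    cx0, cy0 = control_track[0]  # control position on the first frame
    stabilized = []
    for (fx, fy), (cx, cy) in zip(face_track, control_track):
        # How far the whole head has moved since the first frame.
        dx, dy = cx - cx0, cy - cy0
        stabilized.append((fx - dx, fy - dy))
    return stabilized


# Made-up tracks: the head drifts right and up while the lip dot also moves.
control = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]
lip = [(5.0, 5.0), (6.0, 4.0), (7.0, 6.0)]
print(remove_head_motion(lip, control))
# [(5.0, 5.0), (5.0, 4.0), (5.0, 5.0)]
```

In the sample, the lip dot's horizontal travel turns out to be entirely head movement, while its one-unit dip on the second frame survives as real facial motion.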

Now that we had worked out most of the problems with tracking points and getting a CG face to animate the way we wanted, the final challenge was to have the face animate both on the full body rig and in the context of Woody’s body animation, mainly Woody lifting his head and the body animation that follows his line.

The problem with animating Woody’s face along with the rest of his body was that, in its current state, if I moved the head at all, the joints would stay in place and, being skin-bound, keep parts of the head with them, stretching the mesh in undesirable ways. Before I touched the full body rig, I tried to solve this with just the head. First, I put all of the joints into a group and set the group’s pivot point to the same place as the head pivot. I then parent constrained the translation of each joint to its respective polygon, and parent constrained the rotation of the group to the head control. This allowed me to animate Woody’s head, and the joints not only stayed firmly on his face, but animated his face the same way with his head facing down as in any other direction.
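The reason the group's pivot has to match the head pivot can be shown with a little arithmetic: a rotation swings every point around whatever pivot it is given, so a wrong pivot sends the joints along a different arc than the head. This is a simplified 2D sketch with made-up coordinates, not a description of what Maya does internally.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def rotate_about_pivot(point: Point, pivot: Point, degrees: float) -> Point:
    """Rotate a 2D point counterclockwise around a pivot."""
    rad = math.radians(degrees)
    dx, dy = point[0] - pivot[0], point[1] - pivot[1]  # offset from the pivot
    return (pivot[0] + dx * math.cos(rad) - dy * math.sin(rad),
            pivot[1] + dx * math.sin(rad) + dy * math.cos(rad))


# Made-up positions: a brow joint and the pivot the head rotates around.
joint = (1.0, 2.0)
head_pivot = (0.0, 1.0)

# Group pivot matches the head pivot: the joint follows the head's arc.
with_head_pivot = rotate_about_pivot(joint, head_pivot, 90.0)
# Group pivot left at the origin: same angle, different destination.
with_origin_pivot = rotate_about_pivot(joint, (0.0, 0.0), 90.0)

print(with_head_pivot)    # approximately (-1.0, 2.0)
print(with_origin_pivot)  # approximately (-2.0, 1.0)
```

The two results differ by a full unit in each axis, which is the kind of mismatch that would pull the joints off the face as the head turns.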

The last problem was that the full body rig already had a skin cluster, so binding more joints to the face was impossible. The solution was a simple one: instead of smooth binding the joints to the face, I made them influence objects. This way, they could behave the same way they always did, except they didn’t interfere with the pre-existing skin cluster. Other than this difference, the rest of the procedure was the same.

All things considered, this is without a doubt a much faster method of facial animation and lip sync than the traditional keyframe-based method. Creating the actual performance is done more or less in real time, and making the dots trackable is just a matter of placement and contrast. Once you understand what you are doing, the most time-consuming parts of the in-computer process are tedious and repetitive rather than difficult. While this technique is obviously limited to the expressive abilities of the real human face, it achieves that level of performative quality and nuance in far less time than the traditional way.


  1. hey thanks fr dis cool article... i am also working on the similar technique for facial animation for my dissertation.. dis article will be more nice if u insert images while explaining...once thnx alot for this article

    Sudhir Verma

  2. Hi Stephen,

    I just wanted to say thank you for sharing your work. I'm doing the same thing, looking to link Maya Live tracking data to drive the facial animation.

    I was playing with the linking of the tracking nodes and had limited success. Your information on linking the location to the translation helps out a lot.

    One thing I did discover doing tracking data on the projects is that it helps to have different color tracking dots. I usually see all white tracking balls, but did come across some setups where the tracking balls were different colors. This helps a lot when dealing with areas where the tracking points cross or come close to each other. The software can instantly switch from one point to another if the two tracking points are the same color and close together.

    I solved this by going and buying a package of stickers used to highlight different things: simply small circles about 1/4 inch in diameter that come in 4 different colors (red, blue, green, and yellow). I also took a black marker, like you, to a sheet of white sticker paper and colored half of it, then used a hole punch to make a bunch of small circular stickers of the same size as the colored ones. It allowed me to put more tracking points closer together without confusing the software about which one to track.

    Thanks again for the post. Really helpful. If at all possible, I'd love to see some of your test footage if it's available on YouTube or something.