YOLO (You Only Look Once)

As mentioned in my previous post, face detection is the main bottleneck in my auto-face alignment program.

Cascade classification falls apart when lighting and image quality are poor. I have always wanted my final program to work without lighting constraints or a carefully controlled environment: you should be able to stick the camera (just about) anywhere and have it work, not just in the room where I am programming this device. Cascade classifiers are very fast and useful for other applications, but when lighting and image quality are bad, other methods are needed. An alternative to cascade classifiers for face detection is convolutional neural networks. The problem with conv nets is that they are usually computationally expensive. However, I believe one convolutional network architecture, known as You Only Look Once (YOLO), may fit the bill for both speed and accuracy. YOLO does have shortcomings, including being spotty at recognizing smaller objects, but in this application the program will be recognizing nearby faces, which should not be small.

I have downloaded the necessary data files and example code and walked myself through the YOLOv3 tutorial, and I recommend you do the same. I have also run the OpenCV4 example code for interfacing with YOLO networks. Both examples were successful. Interestingly, the darknet command line utility for running YOLOv3 was much slower than the OpenCV example, even though both loaded the same network and the same weights on the same images. The darknet utility took around 2 minutes to run, while OpenCV took around 10 seconds, which is promising since I will be interfacing with the YOLO network through OpenCV.
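For reference, loading a YOLOv3 network through the OpenCV DNN module looks roughly like the sketch below. This is a minimal example rather than my final program; the yolov3.cfg / yolov3.weights file names are just the defaults that ship with darknet, and the 416x416 input size comes from that config.

```cpp
// Minimal sketch: run a YOLOv3 forward pass with the OpenCV 4 DNN module.
// Assumes the stock yolov3.cfg / yolov3.weights files and a test image.
#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>
#include <iostream>
#include <vector>

int main() {
    cv::dnn::Net net = cv::dnn::readNetFromDarknet("yolov3.cfg", "yolov3.weights");

    cv::Mat frame = cv::imread("test.jpg");

    // YOLOv3 expects a 416x416 RGB blob with pixel values scaled to [0, 1].
    cv::Mat blob = cv::dnn::blobFromImage(frame, 1.0 / 255.0, cv::Size(416, 416),
                                          cv::Scalar(), true, false);
    net.setInput(blob);

    // Forward pass through the unconnected (YOLO) output layers.
    std::vector<cv::Mat> outs;
    net.forward(outs, net.getUnconnectedOutLayersNames());

    // Each row of each output is [cx, cy, w, h, objectness, class scores...].
    std::cout << "Got " << outs.size() << " output layers" << std::endl;
    return 0;
}
```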

A demo detection I ran using the YOLOv3 network. The “bottle” is actually a can of WD-40, but I think it looks like a bottle too.

Thanks to the promising results so far, I will continue developing facial detection with the YOLO network. I hope to train a much smaller network with a single class of labels (the face label) in order to speed up detection time.

Automatic Facial Alignment with OpenCV

As mentioned in my last post, one of the first steps in facial identification is preparing a dataset and ensuring that all input images/training data are aligned the same way. This could be done manually, but it would be a very time-consuming task and could not be done in real time. Because of this, I have done some research on the OpenCV Facemark API (I am using the LBF model) and have created a program that automatically aligns the faces it detects. The tool works in the following steps:

  1. Start up and command line parsing, loading Facemark LBF model, loading Cascade Classifier model, general initialization and error handling
  2. Grab a frame from the input source
  3. Run the Cascade Classifier on the input frame to detect where faces are
  4. Run the Facemark LBF model on the regions that the Cascade Classifier returns
  5. Select the points that outline each eye
  6. Average the points of the left eye together, average the points of the right eye together
  7. Average all of the points that outline the face
  8. Create an Affine Transform using the average positions of both eyes and the average of the face points as the source triangle, and a preset triangle as the destination triangle
  9. Apply the Affine Transform
  10. Display/output the rotated, scaled, and cropped faces!

This is a very general step-by-step process, so I will elaborate a bit more on each step.


The 3rd step, running the Cascade Classifier on the input frame, is needed to find the areas where faces are. It is also the main downfall of the program. If this step fails to detect a face, that face will never be marked by the Facemark model and will simply be passed over. Likewise, if this step falsely identifies an object as a face, the false face will be marked by the Facemark model and can lead to bad data. Sadly, this step is also the most inaccurate: there are plenty of false positives, and sometimes the classifier detects the false positives more consistently than my own face. The biggest way to improve this program would be to improve this step; possible alternatives to a Cascade Classifier include convolutional and other deep neural networks.
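For illustration, detecting the face regions with a cascade classifier looks roughly like this. It is only a sketch: haarcascade_frontalface_alt2.xml is one of the stock cascades that ships with OpenCV, not necessarily the one I use, and the detectMultiScale parameters are typical starting values rather than tuned ones.

```cpp
// Sketch of step 3: find candidate face rectangles in a frame.
// Assumes a stock OpenCV cascade file such as haarcascade_frontalface_alt2.xml.
#include <opencv2/opencv.hpp>
#include <vector>

std::vector<cv::Rect> detectFaces(const cv::Mat& frame, cv::CascadeClassifier& cascade) {
    cv::Mat gray;
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
    cv::equalizeHist(gray, gray);  // helps a little with uneven lighting

    std::vector<cv::Rect> faces;
    // scaleFactor 1.1 and minNeighbors 3 are common starting values.
    cascade.detectMultiScale(gray, faces, 1.1, 3, 0, cv::Size(60, 60));
    return faces;
}
```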

The 4th step, running the Facemark model on the detected regions, is when OpenCV places points outlining the contour of the face as well as points surrounding the left and right eyes. The next step, selecting the points that outline the eyes, is done by grabbing the default FacemarkLBF parameters through FacemarkLBF::Params() and accessing its pupils array: pupils[0] holds the indexes of the points for the left eye, and pupils[1] holds the indexes of the points for the right eye. Using these indexes, the points for each eye can be selected and averaged. This gives us two points: the center of the left eye and the center of the right eye. We later use these points to align the face with the Affine Transform. The third point used in the Affine Transform is the average of all of the points produced by the Facemark API, including the points surrounding the face and the points surrounding the eyes. Averaging them together gives a point that usually lands around/on the person’s nose, which makes a good “midpoint” of the face, and that is why it is used as the third point of the triangle for the Affine Transform.
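Roughly, the landmark fitting and eye-averaging steps look like the sketch below. This is a simplified version of what my program does, assuming the landmarks come back in the standard 68-point layout and that the trained model file is named lbfmodel.yaml (the file name you use may differ).

```cpp
// Sketch of steps 4-7: fit landmarks on the detected faces, then average
// the eye points and all points to get the three alignment points.
#include <opencv2/opencv.hpp>
#include <opencv2/face.hpp>
#include <vector>

static cv::Point2f averageOf(const std::vector<cv::Point2f>& pts, const std::vector<int>& idx) {
    cv::Point2f sum(0.f, 0.f);
    for (int i : idx) sum += pts[i];
    return sum * (1.0f / idx.size());
}

int main() {
    cv::face::FacemarkLBF::Params params;  // default params hold the eye point indexes
    cv::Ptr<cv::face::FacemarkLBF> facemark = cv::face::FacemarkLBF::create(params);
    facemark->loadModel("lbfmodel.yaml");  // placeholder name for the trained model file

    cv::Mat frame = cv::imread("frame.jpg");
    std::vector<cv::Rect> faces;  // filled in by the cascade classifier (step 3)
    std::vector<std::vector<cv::Point2f>> landmarks;

    if (facemark->fit(frame, faces, landmarks)) {
        for (const auto& pts : landmarks) {
            cv::Point2f leftEye  = averageOf(pts, params.pupils[0]);
            cv::Point2f rightEye = averageOf(pts, params.pupils[1]);

            // Average every landmark to get the "midpoint" near the nose.
            cv::Point2f faceCenter(0.f, 0.f);
            for (const auto& p : pts) faceCenter += p;
            faceCenter *= 1.0f / pts.size();
            // leftEye, rightEye, faceCenter form the source triangle.
        }
    }
    return 0;
}
```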

An Affine Transform takes two triangles: a source triangle and a destination triangle. Using some math I don’t fully understand yet, it calculates the rotation, scaling, and translation needed to turn the source triangle into the destination triangle. These rotations, scalings, and translations are stored in a “warp matrix”, and this warp matrix is applied to the source image in order to align it. The destination triangle is the same across all images so that every output image is aligned the same way. Google “affine transformation” and you will probably find much better explanations than I have to offer.
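In OpenCV terms, the last few steps boil down to getAffineTransform and warpAffine. The sketch below shows the idea; the destination triangle coordinates and the 256x256 output size are made-up values for illustration, not the ones my program actually uses.

```cpp
// Sketch of steps 8-10: warp the face so the eye centers and face midpoint land
// on a preset destination triangle.
#include <opencv2/opencv.hpp>

cv::Mat alignFace(const cv::Mat& frame, cv::Point2f leftEye, cv::Point2f rightEye,
                  cv::Point2f faceCenter) {
    const cv::Size outputSize(256, 256);  // illustrative output size

    cv::Point2f src[3] = { leftEye, rightEye, faceCenter };
    cv::Point2f dst[3] = {
        cv::Point2f( 88.f, 100.f),   // where the left eye should end up
        cv::Point2f(168.f, 100.f),   // where the right eye should end up
        cv::Point2f(128.f, 140.f)    // where the face midpoint should end up
    };

    // The 2x3 warp matrix encodes the rotation, scale, and translation.
    cv::Mat warp = cv::getAffineTransform(src, dst);

    cv::Mat aligned;
    cv::warpAffine(frame, aligned, warp, outputSize);
    return aligned;  // rotated, scaled, and cropped to outputSize
}
```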

Kinetic Sand Art: Decisions… Decisions… Decisions…

This past week Anthony and I have been at somewhat of a standstill. We have been trying to decide whether or not to buy a specific pre-made part we found for our project. We made a PowerPoint on this problem and presented it to the class. Today we decided that we would find the parts online and build it ourselves in class.

Presentation: https://docs.google.com/presentation/d/1aGyVe19d4ezW4J1zq53dXzdaGQytKflFxPTR7Q4ivKc/edit#slide=id.g4cda35ee10_0_56

While we have been figuring things out from a financial standpoint, I have been working on the math for calculating the speed at which our steppers need to spin. We want a rate of approximately 1 cm/s for our ball, so I have been using some physics to figure out the angular speed needed to keep that linear speed at each radius (for the same angular speed, the ball moves faster the farther out it is). I am still working on this problem, but I expect some results within the next week.
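The core relationship is just v = ω·r: for a fixed linear speed v, the required angular speed is ω = v / r, so the arm has to rotate more slowly as the ball moves outward. A rough sketch of that calculation is below; the 20 cm maximum radius and 200 steps/rev are placeholder numbers, and it ignores any belt reduction or gearing.

```cpp
// Sketch: angular speed needed to keep the ball moving at ~1 cm/s at a given
// radius. Placeholder numbers; ignores belt reduction and gearing.
#include <cstdio>

int main() {
    const double kPi = 3.14159265358979;
    const double linearSpeed = 1.0;      // desired ball speed, cm/s
    const double stepsPerRev = 200.0;    // typical 1.8-degree stepper, full stepping

    for (double radius = 2.0; radius <= 20.0; radius += 2.0) {  // cm
        double omega = linearSpeed / radius;          // rad/s, from v = omega * r
        double revPerSec = omega / (2.0 * kPi);
        double stepsPerSec = revPerSec * stepsPerRev;
        std::printf("r = %4.1f cm -> %.4f rad/s = %.2f steps/s\n",
                    radius, omega, stepsPerSec);
    }
    return 0;
}
```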

Stepper Motor Progress!

Anthony and I have made significant progress in our knowledge of Arduino coding. We have hooked up two different stepper motors and programmed them to do different things on the same Arduino. This is the core of our project. Next we plan to work out and code a “reset” (home) point so our stepper knows where it is.
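For anyone curious, driving two steppers from one Arduino with the built-in Stepper library looks roughly like this. The pin numbers and step counts are placeholders, not our actual wiring.

```cpp
// Sketch: two stepper motors doing different things on one Arduino.
// Pin numbers and the 200 steps/rev value are placeholders.
#include <Stepper.h>

const int STEPS_PER_REV = 200;

Stepper motorA(STEPS_PER_REV, 8, 9, 10, 11);   // placeholder pins for motor A
Stepper motorB(STEPS_PER_REV, 4, 5, 6, 7);     // placeholder pins for motor B

void setup() {
  motorA.setSpeed(60);   // RPM
  motorB.setSpeed(30);
}

void loop() {
  // Note: Stepper.step() blocks, so these moves happen one after the other.
  motorA.step(STEPS_PER_REV);       // one full revolution forward
  motorB.step(-STEPS_PER_REV / 2);  // half a revolution backward
  delay(500);
}
```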

I hope to start working on the hardware for a prototype soon so we can put our code to good use. The recent school cancellations have slowed our progress, but we are on our way to success!

Kinetic Sand Art : 2/13

Over the previous week, Denver and I figured out how to operate a stepper motor and even make two stepper motors spin at the same time. We also figured out how to make a stepper motor “microstep” for more precise movement.

This week, we will focus more on the hardware aspects of the project and continue developing our understanding of the stepper motor, as it is the key component that drives the kinetic sand art project.

Motor Control

I’ve spent this past week learning how to control motors with Arduino. I completed the DC motor lab, finally figuring out how to make a motor’s speed increase incrementally with a potentiometer, toggle the direction of the motor, and turn the circuit on and off. I learned some new functions along the way, such as analogWrite, which unlike digitalWrite has more than two states: it outputs a PWM signal that applies partial power to a pin instead of turning it fully on or off. Another useful function was Serial.println, which let me troubleshoot my circuit by monitoring the state of the components I was manipulating. The only task I have left before moving on is servo motor control; then I will be able to start on the wind chime project.
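A rough sketch of the potentiometer-to-motor-speed part of that lab, assuming the pot is on A0 and the motor (through a transistor driver) is on a PWM pin; the exact pins and wiring in my lab were likely different.

```cpp
// Sketch: read a potentiometer and use it to set DC motor speed with PWM.
// Pin numbers are placeholders; the motor is assumed to be driven through a
// transistor on a PWM-capable pin.
const int potPin = A0;
const int motorPin = 9;   // PWM pin

void setup() {
  pinMode(motorPin, OUTPUT);
  Serial.begin(9600);
}

void loop() {
  int potValue = analogRead(potPin);            // 0-1023
  int speed = map(potValue, 0, 1023, 0, 255);   // scale to analogWrite range
  analogWrite(motorPin, speed);                 // partial power via PWM

  Serial.println(speed);                        // monitor the value for debugging
  delay(100);
}
```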

“Fishing” for Facial Recognition

I have started work on a program that recognizes and identifies faces in a live video stream: given names and training data, it should output the name of each face it recognizes in the stream. There are a few methods available for facial recognition, including CNNs, Fisher Faces, and Eigenfaces. Convolutional Neural Networks (CNNs) have unprecedented accuracy, but they are computationally expensive, and I would be unable to run one in real time on a video feed on my laptop. This leaves two other methods: Eigenfaces and Fisher Faces. By today’s standards, both are computationally cheap and should be able to run in real time on my laptop.

I predict that the Fisher Faces method will be more successful than the Eigenfaces method because it is less sensitive to changes in lighting and focuses more on actual facial features than on shadows and lighting (see this blog post, section “Eigenfaces”, and also this paper). The Fisher Faces method is actually very similar to the Eigenfaces method. Both require that the recognition input and training data consist of a forward-facing face that is positioned, resized, and cropped to a uniform size.
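OpenCV’s contrib face module already includes a Fisher Faces recognizer, so the recognition part itself is only a few calls once the data is preprocessed. A minimal sketch, assuming the training images are already aligned, grayscale, and all the same size:

```cpp
// Sketch: train and query a Fisher Faces recognizer from the OpenCV face module.
// Assumes the training images are already aligned, grayscale, and uniformly sized.
#include <opencv2/opencv.hpp>
#include <opencv2/face.hpp>
#include <iostream>
#include <vector>

int main() {
    std::vector<cv::Mat> images;  // preprocessed training faces
    std::vector<int> labels;      // one integer label per person
    // ... load images and labels here (Fisher Faces needs at least two people) ...

    cv::Ptr<cv::face::FisherFaceRecognizer> model = cv::face::FisherFaceRecognizer::create();
    model->train(images, labels);

    cv::Mat testFace = cv::imread("test_face.png", cv::IMREAD_GRAYSCALE);
    int predictedLabel = -1;
    double confidence = 0.0;
    model->predict(testFace, predictedLabel, confidence);

    std::cout << "Predicted label " << predictedLabel
              << " (distance " << confidence << ")" << std::endl;
    return 0;
}
```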

I would like to avoid manually cropping, resizing, and positioning each photo of training data, and live video recognition will require automatic input preprocessing anyway (sadly, I can’t hire a human to draw boxes around faces in real time).

Cropping and resizing are fairly straightforward. Cropping can be done with a simple Haar or LBP cascade that returns a bounding box around each detected face, and the faces can then be cropped out of those boxes. Resizing is simple too: each cropped face is scaled to target dimensions that are constant across the data set (all input and training images should be the same size).
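A sketch of that crop-and-resize step, assuming a stock OpenCV frontal-face cascade and a made-up 128x128 target size:

```cpp
// Sketch: detect faces with a cascade, crop each one, and resize to a fixed size.
// The 128x128 target size is an illustrative choice.
#include <opencv2/opencv.hpp>
#include <vector>

std::vector<cv::Mat> cropFaces(const cv::Mat& frame, cv::CascadeClassifier& cascade) {
    const cv::Size targetSize(128, 128);  // must be constant across the whole data set

    cv::Mat gray;
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);

    std::vector<cv::Rect> boxes;
    cascade.detectMultiScale(gray, boxes);

    std::vector<cv::Mat> faces;
    for (const cv::Rect& box : boxes) {
        cv::Mat face;
        cv::resize(gray(box), face, targetSize);  // crop the box, then scale it
        faces.push_back(face);
    }
    return faces;
}
```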

The tricky part is aligning the faces within the image. The training and input data require the images to be aligned the same way, meaning the eyes of the face in Image 1 should be at the same position as the eyes of the face in Image 2. They might not be in exactly the same position, but they should be fairly close (within a few pixels). This would be easy if you could force your subjects to align their faces perfectly while collecting the training and input data, but that is not very applicable in the real world. In reality, it would be better for facial recognition to work on faces that are slightly turned or looking off in some direction.

This puts more strain on the facial recognition software, since input images have to be aligned automatically. That is a daunting task because not only must the face be detected, but also the eyes, the mouth, and a general outline of the face. This type of detection is known as “facial alignment”, “facial feature detection”, or “facial landmark detection” (source). Once the position and rotation of the eyes are detected, we can use simpler rotation and translation operations to center the face.

Thankfully, OpenCV provides tools for facial landmark detection through its “contrib” modules (the “Facemark” API contains the tools we need). To install the contrib modules, you must already have the OpenCV4 source code downloaded, and you also need to download the contrib modules source code from here. Then follow the installation instructions listed in the README for that repository.

This concludes the basic theory, the decision to use the Fisher Faces recognition method instead of Eigenfaces, and the setup needed for the project. I will continue posting updates on this project.

Kinetic Sand Art Project : 2/4

Today, Denver and I will be learning the ins and outs of a stepper motor, as it will be a key component in the sand art project. After learning the basics and successfully operating a stepper motor, we will begin researching what GT2 belt size we want, since the size of our table will be based on the size of the belt. This is only the beginning, and we have a lot more to do before we see the end result of our project.

Learning and applying Arduino

Over the past two weeks I’ve learned how much you can accomplish using microprocessors to control electronic circuits. Being able to program logic into a circuit and have it control and monitor events lets us accomplish a lot more than we could otherwise. However, there is a bit of a learning curve when it comes to coding; finishing the LED traffic light helped me gain a basic understanding of some program functions as well as their limitations. I have yet to complete the DC motor project, partly because of the messed-up schedule from the snow and partly because it takes me a bit longer to fully understand something than it might take most people. I’m looking forward to seeing what is yet to come in this class.

Robotics Tutorial Projects

The first few weeks of Robotics were patchy and often shortened due to weather and school scheduling, but we managed to complete the learning projects nonetheless. The goal of these projects was to prepare us for our own final project; in our case, the tennis ball cannon. In order, we had to make a flashing traffic light, attach a crosswalk button to the traffic light, and make a motor whose speed is controlled by a potentiometer and whose direction can be reversed. In the last of the generic class-wide projects, we controlled the position of a servo with a potentiometer. In the near future, we will learn how to use stepper motors and accelerometers.