Sheldon

Sheldon, my team’s project, won 1st place at DataHack 2018. Here is how we conceived and built it. If you are unfamiliar with it, DataHack is the biggest data hackathon in Israel. It takes place once a year, in Jerusalem.

The Vision

When we thought about how Artificial Social Intelligence could be useful in the real world, we quickly realized that it could help people with Asperger’s syndrome to interpret social nuances. The thought that in just 39 hours we could develop an MVP that could improve someone’s life really motivated us!

Why Sheldon you ask?

Sheldon, a character from The Big Bang Theory, often misses social nuance. So for us he was a comical representation of some of the very serious challenges faced by our intended end-users. After the hackathon, we came across a clip of Sheldon with a product which identifies emotions using AI (his product is surprisingly similar to ours).

If you think about it, Sheldon is just the tip of the iceberg. Full social intelligence would be beneficial in countless other ways.

So who are we?

Team Sheldon

The First Steps

We broke our vision down into five core capabilities:

  1. Face Detection
  2. Facial Emotion Recognition
  3. Identity Detection
  4. High-Level Social Pattern Recognition
  5. Vocal Emotion Recognition

From the outset it was clear to us that we would use a mobile app to showcase our algorithms and models. For this reason our experienced and skilled full stack developers were critical to the project’s success. Working as a joint team of programmers and data scientists was a new experience for all of us.

Data Collection

Next, we searched for facial emotion recognition datasets. While we found some results, none quite suited our needs. We wanted raw data so that we could filter it, expand it or train our model to recognize new emotions.

We ended up using data from a few sources:

Datasets

  • fer2013: ~35,000 grayscale 48x48 pictures of faces, labeled with emotions
  • KDEF: ~5000 pictures of faces, same faces from different angles

Scraping

Onsite manual collection

Data Processing

Next, we applied a grayscale transform to all of the images and resized them to 48x48 pixels, to match the pictures in fer2013.
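For illustration, a minimal version of that preprocessing step might look like the following (the directory layout is an assumption, and OpenCV is just one convenient way to do it):

```python
import os
import cv2

RAW_DIR = "data/raw"          # assumed location of the collected images
OUT_DIR = "data/processed"    # assumed output location

os.makedirs(OUT_DIR, exist_ok=True)

for name in os.listdir(RAW_DIR):
    img = cv2.imread(os.path.join(RAW_DIR, name))
    if img is None:
        continue  # skip unreadable files
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # grayscale transform
    small = cv2.resize(gray, (48, 48))            # match the fer2013 resolution
    cv2.imwrite(os.path.join(OUT_DIR, name), small)
```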

We trained our models using four basic emotions: Happy, Sad, Angry and Neutral. These emotions were present in both academic datasets.

The scraped data significantly reduced our model’s accuracy, so it wasn’t used in our final model. Some manual analysis (at about 02:30 AM…) revealed that many of the scraped images were not representative of the emotions we were training the model to recognize. For example, the search results for “sad man” after about twenty pictures looked like this:

Search Results for “sad man”

Emotion Recognition Model

During training we augmented the data using rotations of up to 10 degrees, horizontal flipping and ±10% zoom. We used a cross-entropy loss, reduced the learning rate when the loss didn’t improve for more than ten epochs and employed early stopping.
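A minimal sketch of that training setup in Keras (the placeholder CNN, batch size, epoch count and early-stopping patience are illustrative assumptions, not our actual architecture or hyperparameters; `x_train`, `y_train`, `x_val` and `y_val` are assumed to hold 48x48 grayscale images and one-hot labels):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

NUM_CLASSES = 4  # Happy, Sad, Angry, Neutral

# Placeholder CNN -- just something to hang the training loop on
model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(48, 48, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Augmentation as described: up to 10-degree rotations, horizontal flips, ±10% zoom
datagen = ImageDataGenerator(rotation_range=10, horizontal_flip=True, zoom_range=0.1)

callbacks = [
    # Reduce the learning rate when the validation loss stalls for ten epochs
    ReduceLROnPlateau(monitor="val_loss", patience=10, factor=0.5),
    # Stop training once the validation loss stops improving
    EarlyStopping(monitor="val_loss", patience=25, restore_best_weights=True),
]

model.fit(
    datagen.flow(x_train, y_train, batch_size=64),
    validation_data=(x_val, y_val),
    epochs=200,
    callbacks=callbacks,
)
```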

Measuring Accuracy

Our final model achieved 75% accuracy and about 0.7 cross-entropy on the validation data. Here are some confusion matrices for the geekier readers:

We noticed that the sad and angry classes were significantly harder for our model to recognize than the happy and neutral classes. This is most pronounced in the test results. We discovered that we often disagreed with the labels assigned to the images representing these classes. We suspect that this is due to subtle physical differences between angry and sad expressions (especially in a single image). Another explanation is that, although the data scientists we photographed at DataHack are very talented individuals, acting is not quite their forte.
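For reference, metrics like these are straightforward to reproduce with scikit-learn (here `y_val` and `val_probs` are assumed to be the one-hot validation labels and the model’s predicted probabilities):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, log_loss

y_true = np.argmax(y_val, axis=1)      # integer class labels
y_pred = np.argmax(val_probs, axis=1)  # model's most likely class per image

print("accuracy:", accuracy_score(y_true, y_pred))
print("cross-entropy:", log_loss(y_true, val_probs))
print(confusion_matrix(y_true, y_pred))  # rows: true class, columns: predicted class
```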

Building the App

Faces were detected on the phone and a rough crop of each face was sent to the server. First, the server ran a more fine-grained face cropping to get a tight crop of the face. The server then attempted to identify the face using eigenfaces. Next, the server ran our custom emotion recognition on each crop. The identifications and emotions were then sent back to the client to be shown on the screen.
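A simplified sketch of that server-side flow, assuming Flask for the API, an OpenCV Haar cascade for the tight crop and OpenCV’s eigenface recognizer for identity (the endpoint name is illustrative, and `emotion_model` and a trained `recognizer` are assumed to be loaded elsewhere):

```python
import cv2
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)

# A Haar cascade for the fine-grained crop (ships with OpenCV)
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

# Assumed to be loaded elsewhere:
#   recognizer    - cv2.face.EigenFaceRecognizer_create(), trained on known faces
#   emotion_model - the emotion CNN described above
EMOTIONS = ["Happy", "Sad", "Angry", "Neutral"]


@app.route("/analyze", methods=["POST"])
def analyze():
    # The phone sends a rough crop of a detected face as an encoded image
    buf = np.frombuffer(request.data, dtype=np.uint8)
    rough = cv2.imdecode(buf, cv2.IMREAD_GRAYSCALE)

    # Fine-grained cropping to get a tight box around the face
    boxes = face_cascade.detectMultiScale(rough, scaleFactor=1.1, minNeighbors=5)
    if len(boxes) == 0:
        return jsonify({"identity": None, "emotion": None})
    x, y, w, h = boxes[0]
    face = cv2.resize(rough[y:y + h, x:x + w], (48, 48))

    # Identity via eigenfaces, emotion via the CNN
    identity, _ = recognizer.predict(face)
    probs = emotion_model.predict(face[None, :, :, None] / 255.0)[0]

    return jsonify({"identity": int(identity), "emotion": EMOTIONS[int(np.argmax(probs))]})
```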

Transitioning from Micro to Macro

To turn per-frame predictions into longer-term insights, we had to answer a few questions:

  • What do we want to save?
  • At what resolution?
  • How to save/index it?

We decided to save aggregated data from each “appearance” of a face (i.e. from the time a face is identified until it has been absent from the frame for a few seconds).

Some examples of aggregate data are “most common emotion”, “first emotion” and “last emotion”. Averaging emotions over time would be the key building block for generating an “emotion distribution” for each face the app recognized. This would enable the app to say “Person x is angry, but he is angry 75% of the time”.
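A rough sketch of what such per-appearance aggregation could look like (the class and method names are illustrative, not taken from our actual code):

```python
from collections import Counter


class Appearance:
    """Aggregates the emotions observed during one continuous appearance of a face."""

    def __init__(self, person_id):
        self.person_id = person_id
        self.emotions = []  # one predicted label per processed frame

    def add(self, emotion):
        self.emotions.append(emotion)

    def first_emotion(self):
        return self.emotions[0] if self.emotions else None

    def last_emotion(self):
        return self.emotions[-1] if self.emotions else None

    def most_common_emotion(self):
        return Counter(self.emotions).most_common(1)[0][0] if self.emotions else None

    def distribution(self):
        # Fraction of frames per emotion, e.g. {"Angry": 0.75, "Neutral": 0.25}
        total = len(self.emotions)
        return {e: count / total for e, count in Counter(self.emotions).items()}
```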

High-Level Social Pattern Recognition

The app was able to identify the “Cheering person x up” pattern by detecting a person with a neutral expression for some time and then detecting the same person with a happy face later.

The app could also detect the “Insulting person x” pattern: a person with a neutral expression followed by an angry expression.
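In spirit, both patterns reduce to simple rules over a person’s chronological emotions. A toy version, building on the `Appearance` sketch above and ignoring timing, might look like this:

```python
def detect_patterns(person_id, appearances):
    """Toy rule-based pass over one person's consecutive appearances.

    `appearances` is assumed to be a chronological list of Appearance objects
    (see the earlier sketch) belonging to the same person.
    """
    patterns = []
    for earlier, later in zip(appearances, appearances[1:]):
        if earlier.most_common_emotion() == "Neutral" and later.most_common_emotion() == "Happy":
            patterns.append(f"Cheering person {person_id} up")
        if earlier.most_common_emotion() == "Neutral" and later.most_common_emotion() == "Angry":
            patterns.append(f"Insulting person {person_id}")
    return patterns
```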

Challenges

During development, we found that misidentified emotions were bothersome. Therefore, to improve the user experience, we tweaked the app to predict an emotion only if one of the classes was assigned a much higher probability than the others, indicating that the model was confident in the classification. If no emotion was predicted, the assigned emotion was “Neutral”.
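A minimal sketch of that confidence gate (the margin value here is an illustrative assumption):

```python
import numpy as np

EMOTIONS = ["Happy", "Sad", "Angry", "Neutral"]


def confident_emotion(probs, margin=0.4):
    """Return a predicted emotion only when the model is clearly confident.

    `probs` is the softmax output for one face; `margin` is how far the top
    class must be above the runner-up before we trust the prediction.
    """
    top, runner_up = np.sort(probs)[::-1][:2]
    if top - runner_up >= margin:
        return EMOTIONS[int(np.argmax(probs))]
    return "Neutral"  # fall back to Neutral when the prediction is uncertain
```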

By the end of the hackathon we had a working app that was visually pleasing, worked in real-time and most importantly provided value!

Lessons learned

Sleep is overrated. On the second night of the hackathon our team slept less than 6 hours (all four of us combined!!!).

Data Science + Programming = Success. We were able to create a common language despite our diverse backgrounds. Each team member gave it their all and we wouldn’t have succeeded otherwise.

Closing Remarks

We would like to thank the entire DataHack 2018 staff for an amazing hackathon. The good vibes you spread throughout the event and the smoothness with which everything was run really made it an event to remember!

I would like to thank Philip Tannor for assisting in writing this and Rochelle Meiseles for editing.