OAuth implicit flow with face detection
Introduction
I always wanted to try to use machine learning for face detection, and had a simple idea the other day. Why not try to use my face to authenticate myself on a site?
In this post I’ll describe a small PoC that shows an OAuth Implicit Grant flow using the FaceNet neural network for identification. Now, my implementation is by no means production ready: the OAuth server implements only the bare minimum I needed for this PoC, and my face detection can be rather easily fooled by just showing it a photo of me. Nevertheless, with some improvements, I intend to use this on some of my side projects, but definitely not to secure sensitive data.
The Concept
For the end user an OAuth Implicit Grant flow usually looks something like:
- the user navigates to a web site that requires authentication
- the site checks whether the user is already authenticated; if not, the user is redirected to a login page
- the user enters their username and password and is redirected back to the web site
- the web site takes the JWT token and uses the claims in it to allow or deny the user’s access to parts of its functionality
Technically there is a lot more going on and, rather than diving into the details, I’ll point you to this tutorial.
My PoC differs from the steps above only in the third step. My OAuth server asks the user to identify themselves with their webcam, so:
- the user navigates to a web site that requires authentication
- the site checks whether the user is already authenticated; if not, the user is redirected to a login page
- the user uses their webcam and, if recognized, is redirected back to the web site
- the web site takes the JWT token and uses the claims in it to allow or deny the user’s access to parts of its functionality
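To make the last step a bit more concrete, here is a minimal Python sketch of reading the claims out of a JWT. It is not the PoC's client code (the sample client is a Vue.js app). The helper names `make_demo_token`, `decode_jwt_claims` and `is_allowed`, as well as the allow-list policy, are hypothetical. The sketch deliberately skips signature verification, which a real client must perform before trusting any claim.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """Base64url-encode without padding, the way JWT segments are encoded."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_demo_token(claims: dict) -> str:
    """Build an UNSIGNED demo token (hypothetical helper, illustration only)."""
    header = b64url_encode(json.dumps({"alg": "none", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    # A JWT has three dot-separated segments; the signature is left empty here.
    return f"{header}.{payload}."


def decode_jwt_claims(token: str) -> dict:
    """Return the payload claims WITHOUT verifying the signature."""
    _header, payload, _signature = token.split(".")
    # Restore the base64 padding that JWT encoding strips.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def is_allowed(claims: dict, allowed_names: set) -> bool:
    """Hypothetical access policy: allow only users whose name claim is listed."""
    return claims.get("name") in allowed_names


token = make_demo_token({"name": "alice", "nonce": "n-123"})
claims = decode_jwt_claims(token)
print(claims["name"], is_allowed(claims, {"alice"}))
```

In a browser client a library such as oidc-client takes care of this, including validating the token, so hand-rolled decoding like the above is only useful for understanding what is inside the token.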
The Parts
The PoC consists of three parts, each described in a bit more detail in the rest of the post:
- a FaceNet NN that calculates an embedding vector for the user who is trying to log in,
- an OAuth server that implements the Implicit Grant flow and compares the embedding vector from the first part to the embedding vectors the server knows and
- a sample client, which requires an authenticated user
FaceNet
Before describing the OAuth server implementation I need to say a few words about FaceNet. I linked to a Forbes article rather than the research paper since I didn’t read the research paper and you don’t need to either (unless you want to) ;)
FaceNet is a specialized face detection neural network. You might know that some image detection NNs can categorize objects in an image; for example, they can say there is a cat or a dog in an image. FaceNet, on the other hand, does not categorize the object. Instead it returns a vector, an embedding vector, which embeds the features of a face. In some ways this is like an NN that categorizes objects in an image, but with the last (categorizing) layer removed and the one before it, the layer that identifies features of the image, exposed instead.
Using an embedding vector is a more practical solution than categorizing each face. For example, if you were using face detection in your organization, you would not want to retrain your NN every time a new employee joins. When categorizing, you’d be forced to do so, since you would need to add a new category for the new employee to your NN. However, if you use embedding vectors, you just need a single image of the employee: pass it through your NN (FaceNet in my case) and store the embedding vector to be able to check the employee’s identity in the future.
As just mentioned, I need a single image of a person to generate an embedding vector, which is then used by the OAuth server to identify the user. This process is called One Shot learning and I used facenet-pytorch to implement it. All I had to do was follow this example.
And, if you’d like to know even more about the underlying concepts, do read the Face recognizer application using a deep learning model (Python and Keras) article.
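The "no retraining" point can be illustrated with a small sketch. It assumes the embeddings have already been computed by some model and uses toy 3-dimensional vectors (real FaceNet embeddings have far more dimensions). The `enroll` and `identify` helpers and the in-memory dictionary are hypothetical and are not taken from the PoC.

```python
import math


def enroll(known: dict, name: str, embedding: list) -> None:
    """Store one embedding per person; adding a person needs no retraining."""
    known[name] = list(embedding)


def identify(known: dict, probe: list) -> tuple:
    """Return (name, distance) of the enrolled embedding closest to the probe."""
    distances = ((name, math.dist(emb, probe)) for name, emb in known.items())
    return min(distances, key=lambda pair: pair[1])


known_faces = {}
enroll(known_faces, "alice", [1.0, 0.0, 0.0])
enroll(known_faces, "bob", [0.0, 1.0, 0.0])

# A new employee joins: one more dictionary entry, the network is untouched.
enroll(known_faces, "carol", [0.0, 0.0, 1.0])

name, distance = identify(known_faces, [0.1, 0.0, 0.9])
print(name, round(distance, 3))
```

Picking the nearest enrolled vector is only one possible design; for authentication you would also want to reject probes that are not close to anyone, which is a matter of choosing a distance threshold.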
OAuth Server
I based my OAuth server implementation on the code from the Building a Basic Authorization Server with Implicit Flow article. The author has a whole series of articles about the various OAuth flows. They are a good read and I recommend them if you want to know more about this subject.
So what does our OAuth server need to do? It needs to:
- Verify the client sent all mandatory information:
  - client_id (the id of the web site trying to perform authentication),
  - redirect_uri (the callback URI to go to once the token is generated by the server),
  - nonce (used to prevent replay attacks) and
  - state (used to prevent XSRF attacks)
- Verify it knows the client_id and redirect_uri (not implemented in the PoC, but essentially just a check to see if the combination is registered)
- Verify the user’s identity and generate a JWT token
- Redirect to redirect_uri
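The checks and the final redirect from the list above can be sketched roughly as follows. This is not the PoC's code: the `REGISTERED_CLIENTS` registry, the function names and the example URLs are hypothetical, and the registration check is included only for illustration since the PoC skips it. The sketch also assumes the token is returned as `id_token` in the URL fragment, which is how the implicit flow hands values back to a browser client.

```python
from urllib.parse import urlencode

REQUIRED_PARAMS = ("client_id", "redirect_uri", "nonce", "state")

# Hypothetical registry: client_id -> the redirect_uri registered for it.
REGISTERED_CLIENTS = {"sample-client": "http://localhost:8080/callback"}


def validate_request(params: dict) -> list:
    """Return a list of problems; an empty list means the request is acceptable."""
    errors = [
        f"missing parameter: {name}"
        for name in REQUIRED_PARAMS
        if not params.get(name)
    ]
    if errors:
        return errors
    if REGISTERED_CLIENTS.get(params["client_id"]) != params["redirect_uri"]:
        errors.append("unknown client_id/redirect_uri combination")
    return errors


def build_redirect(params: dict, id_token: str) -> str:
    """Send the token back in the fragment and echo the client's state value."""
    fragment = urlencode({"id_token": id_token, "state": params["state"]})
    return f"{params['redirect_uri']}#{fragment}"


request = {
    "client_id": "sample-client",
    "redirect_uri": "http://localhost:8080/callback",
    "nonce": "n-123",
    "state": "xyz",
}
print(validate_request(request))
print(build_redirect(request, "abc.def.ghi"))
```

The state value is echoed back unchanged so the client can match the response to the request it sent, while the nonce would typically end up as a claim inside the token itself.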
The 3rd step is where the interesting bits are. First, the user is presented with this:
Note 1: if you are using a privacy-oriented browser such as Brave, taking a snapshot with your webcam is considered a device fingerprinting attempt, so you will need to make sure you don’t have “Device recognition attempts blocked” set for the site.
Note 2: also, I cheated a bit in the screenshot above. You first have to click a button to allow webcam capture. Modern browsers don’t allow sites to start capturing video automatically, since that is a privacy concern; only a user action can start it.
With the notes out of the way we can get to the second part, the recognition itself. A captured image is posted to the backend when the user clicks on the big green “Verify my identity” button. This captured image is then run through the FaceNet NN to calculate an embedding vector. This embedding vector is then compared to the one known to the server (obtained via the One Shot learning mentioned in the FaceNet overview), and if it is deemed to be close enough we end up with a successful login attempt!
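What "close enough" means is a design decision. One common approach, shown here as a sketch and not necessarily what the PoC does, is to compute the Euclidean distance between the two embedding vectors and accept the login when it falls below a threshold. The function name, the toy vectors and the threshold value of 0.5 are all assumptions for illustration; a real threshold has to be tuned on actual embeddings.

```python
import math

# Hypothetical threshold; it must be tuned on real embeddings.
MATCH_THRESHOLD = 0.5


def is_close_enough(known: list, probe: list, threshold: float = MATCH_THRESHOLD) -> bool:
    """Accept the login if the probe embedding lies within threshold of the known one."""
    return math.dist(known, probe) < threshold


known_embedding = [0.1, 0.9, 0.2]     # stored when the user was enrolled
same_person = [0.12, 0.88, 0.21]      # slightly different capture of the same face
someone_else = [0.9, 0.1, 0.7]        # a very different embedding

print(is_close_enough(known_embedding, same_person))
print(is_close_enough(known_embedding, someone_else))
```

A lower threshold rejects more impostors but also more legitimate attempts in bad lighting, so the value is a trade-off between convenience and security.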
Sample Client
To demonstrate the whole flow I implemented a bare-minimum Vue.js app. My code borrows some bits and pieces from the Auth0 Vue.js Login example since it is neatly written, though instead of using Auth0’s library, which is adapted for their specific case, I used oidc-client, which is a more general-purpose library and can be used with any OIDC-compliant provider.
The client has a single page that requires authentication. The oidc-client handles the Implicit Grant flow; in other words, it handles the talk with my OAuth server, and once a user is authenticated it reads the name claim from the JWT token and presents this amazing page:
Summary
Do check out the full code on GitHub. The Readme.md will walk you through the setup steps and if all goes well you should be able to identify yourself pretty quickly ;)