December 29, 2017

# Tutorial: Using NIST Fingerprint software to generate a pairwise comparison score matrix

### Purpose

In this tutorial, you will learn how to work with a fingerprint database, up to the point where you can generate relevant performance curves and metrics such as the equal error rate.

You will need the following:

• The NIST Biometric Image Software (NBIS).
• In my setup, I have a host Windows machine and run Ubuntu 16.04 using VMWare. If you use this setup, you should read my previous post, which sets up the environment correctly. You can, of course, use VirtualBox to achieve the same result, or even mix the two, e.g., create a VMWare image and then run it in VirtualBox.
• To evaluate performance, you need the wer function, which you can get using git clone git@gitlab.com:normanpoh/DETconf_octave.git or else by visiting this link.

Once you have fulfilled the above requirements, you can then follow the instructions set out below.

I will be using a combination of bash and Octave scripts in order to plot the performance curves. Once you understand how this works, you can, of course, use different biometric software and modalities.

### Intended audience

I have written this tutorial for researchers who are interested in starting to work in biometrics but are not fully confident with bash and Octave or Matlab. With this in mind, every step has been carefully written so that they can follow along, whilst learning the strengths and features that the scripts can offer.

### Procedure

We shall use a publicly available fingerprint database to illustrate how you can use NIST’s fingerprint matcher bozorth3 to generate an exhaustive pairwise comparison score matrix.

The steps below assume the following:

1. The folder in which all fingerprint images are stored is /mnt/hgfs/C/Users/np0004/Documents/LivDET/data/2011/TrainingItaldataLive.
2. NIST’s fingerprint software has been installed properly and the paths to its executables are configured in the file .bashrc.

##### 1. Configure the directories

You should replace fdir (the directory that contains the fingerprint images) and outdir (the output directory where the fingerprint features will be stored) with your own directories.
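A minimal bash sketch of this configuration is shown here. It uses the image folder assumed earlier and features as the output directory; both values are placeholders that you should adapt to your machine.

```bash
# Folder containing the fingerprint images (replace with your own path).
fdir="/mnt/hgfs/C/Users/np0004/Documents/LivDET/data/2011/TrainingItaldataLive"
# Folder where the extracted fingerprint features will be stored.
outdir="features"

# Create the output folder if it does not exist yet.
mkdir -p "$outdir"

# Warn early if the image folder is missing, rather than failing later.
[ -d "$fdir" ] || echo "warning: $fdir not found; please edit fdir" >&2
```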

##### 2. Extract the features

To extract the features, I assume that the fingerprint images are stored in .png format, so the command convert is used to convert the images to .wsq. The code simply loops through all the files and extracts the fingerprint features using NIST’s mindtct program.
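The loop described above can be sketched in bash as follows. This is an illustration under stated assumptions, not the original script: the function name extract_features is hypothetical, and the convert line simply follows the description above, so you should check that your ImageMagick build can write .wsq files (if not, NBIS ships its own WSQ encoder, cwsq, which you can substitute). The mindtct call uses its standard form of an input image followed by an output root.

```bash
# Extract minutiae features for every .png image in a folder.
# Usage: extract_features <image_dir> <output_dir>
extract_features() {
  local fdir="$1" outdir="$2" img base
  mkdir -p "$outdir"
  for img in "$fdir"/*.png; do
    # Skip the literal pattern when the folder contains no .png files.
    [ -e "$img" ] || continue
    # e.g. /path/100_1.png -> 100_1
    base=$(basename "$img" .png)
    # Assumption: convert can write .wsq on your system (see note above).
    convert "$img" "$outdir/$base.wsq"
    # mindtct writes several files sharing the output root,
    # including $outdir/$base.xyt, which is what bozorth3 consumes.
    mindtct "$outdir/$base.wsq" "$outdir/$base"
  done
}

# Example call, assuming fdir and outdir are set as in step 1:
# extract_features "$fdir" "$outdir"
```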

All the features can now be found in the directory $outdir (i.e., features in the example above). To check the output, type ls features/*.xyt | head. The database contains files named <user-id>_<attempt-id>.png, so the processed output contains <finger-id>_<attempt-id>.xyt. There are 200 unique fingers and, for each finger, there are 5 attempts, so there are a total of 1000 fingerprint images.

##### 3. Generate the score matrix in Octave

We could use ls features/*.xyt > filelist_xyt.txt, but then the output is not ordered. So, instead, we shall manually create the list of filenames in Octave.

Let’s write two functions to get the user and session indices, respectively. These two pieces of information are often needed in order to extract genuine and impostor scores, as will become clear later. I have chosen to extract this information by processing the filenames, in order to illustrate how you can use text processing in Octave. In practice, you should modify the scripts to suit your needs.

Recall that each row in filelist_xyt.txt has the following format: features/<finger-id>_<attempt-id>.xyt, e.g., features/100_1.xyt. So, to extract the user (unique finger) id, we split the string into two parts by '/' to get the second part, 100_1.xyt, and then further split this string by _ to get the first part, i.e., 100. Using the same methodology, we create the get_session function.

To apply the function to every filename, we could write a for loop, but this is not usually a very efficient way of doing it. Since the variable fname is of cell type, we can use cellfun with an anonymous function applied to each cell. In our case, this function is @(x) get_user(x), which reads: apply the function get_user to each cell x. The variable user will be used later to distinguish between genuine and impostor scores.

Next, we use bozorth3 to compare a template with a list of gallery images.
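As a command-line illustration of this one-to-many comparison, here is a minimal bash sketch. It is not the Octave code of this tutorial: the function names match_one_to_many and build_score_matrix are hypothetical, and it relies on bozorth3's -p option (a single probe file) and -G option (a text file listing the gallery files), which print one score per gallery entry; please confirm both against the usage message of your NBIS installation.

```bash
# Compare one probe template against every gallery template in a list file.
# Prints one similarity score per line, in the order of the list.
match_one_to_many() {
  local probe="$1" gallery_list="$2"
  bozorth3 -p "$probe" -G "$gallery_list"
}

# Repeat the comparison for every template in the list.
# Row i of the output file holds the scores of template i against all templates.
build_score_matrix() {
  local list="$1" out="$2" probe
  : > "$out"
  # Read the list on file descriptor 3 so that stdin stays untouched.
  while IFS= read -r probe <&3; do
    match_one_to_many "$probe" "$list" | paste -s -d ' ' - >> "$out"
  done 3< "$list"
}

# Example call, assuming filelist_xyt.txt lists one .xyt file per line:
# build_score_matrix filelist_xyt.txt scores.txt
```

The resulting text file has one row per probe and one column per gallery entry, so it can be read into Octave as a matrix.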
Since we are performing exhaustive pairwise comparisons, we take each image in the list and compare it with the list of gallery images. This process is repeated for each fingerprint template in the list.

It is possible to parallelize the process above so that you can use all the CPU cores. Refer to this link to find out more.

Let us now plot a 100x100 portion of the 1000x1000 score matrix so that we can see in detail what the scores look like.

As can be observed, the scores appear in blocks along the diagonal, which is where the genuine scores are expected to be. Since the diagonal entries of the matrix are comparisons of an image with itself, we should ignore them by setting them to zero. We do this by using a mask.

Minutiae-based similarity scores are not symmetrical, but really they should be, since comparing samples A and B should be the same as comparing B and A. To enforce this property, we take the upper triangular matrix of the scores, add it to the transposed version of the lower triangular matrix, and then take the average of the two. Because of these matrix operations (triu and tril), the lower triangular part of the resulting matrix has zero values everywhere. We can create a function get_symmetric_scores to do this.

Now, we create the mask for the genuine scores. Do the same for the impostor score mask. We can observe that the genuine scores and impostor scores occupy different parts of the original score matrix.

Next, we shall use the masks created to get imp_scores and gen_scores. Since we are going to apply a series of masks to get to imp_scores and gen_scores, we might as well write a function for this. Let’s call this get_scores.

In get_scores, scores(mask_imp==1) selects the elements of the score matrix where mask_imp is true. This conveniently returns a column vector of impostor scores, which is the format we need.
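For reference, the symmetrisation and the two masks can be written compactly as follows. This is one way to formalise the description, not a transcription of the original code. The notation is introduced here: $s_{ij}$ is the score of template $i$ against template $j$ after the diagonal has been set to zero, $u_i$ is the user (finger) id of template $i$, and we assume that only the entries above the diagonal ($i<j$) are kept.

$$
\tilde{s}_{ij} =
\begin{cases}
\frac{1}{2}\left(s_{ij} + s_{ji}\right) & \text{if } i < j \\
0 & \text{otherwise}
\end{cases}
$$

$$
m^{\text{gen}}_{ij} =
\begin{cases}
1 & \text{if } u_i = u_j \text{ and } i < j \\
0 & \text{otherwise}
\end{cases}
\qquad
m^{\text{imp}}_{ij} =
\begin{cases}
1 & \text{if } u_i \neq u_j \text{ and } i < j \\
0 & \text{otherwise}
\end{cases}
$$

The genuine scores are then the values $\tilde{s}_{ij}$ for which $m^{\text{gen}}_{ij}=1$, and the impostor scores are those for which $m^{\text{imp}}_{ij}=1$.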
Recall that scores is a score matrix and not a column vector, yet the subsetting operation scores(mask_imp==1) returns a column vector. So the matrix has been linearised implicitly.

Having obtained the genuine and impostor score vectors, we can now plot the performance using the wer function, which gives us the equal error rate (EER) of the system.

This generates a figure with several panels. Note that using option 1, as in wer(imp_scores, gen_scores,[],1), can be terribly slow because of the kernel density estimation used to generate the upper-left plot (figure (a)). This plot shows the density of the scores, with the continuous line showing the distribution of the genuine scores and the dashed line indicating the impostor score distribution. Figure (b) shows the false acceptance rate (FAR) curve as a dashed line and the false rejection rate (FRR) curve as a continuous line. Figure (c) shows the weighted error rate (WER) curve, which is defined as the average of FAR and FRR:

$$\mathrm{WER} = \frac{1}{2}\left(\mathrm{FAR} + \mathrm{FRR}\right)$$

In order to visualize the score distributions, we can use the generalised logit transform, which is defined as:

$$y' = \log\left(\frac{y - y_{lb}}{y_{ub} - y}\right)$$

where $y_{lb}$ and $y_{ub}$ are the lower and upper bounds of the score, respectively. The generalised logit transform may result in infinite values: -Inf due to $\log(0)$ when $y=y_{lb}$, and Inf due to division by zero when $y=y_{ub}$. So, if we deal with a fingerprint minutiae-based system, we should not set $y_{lb}=0$ but $y_{lb}=-1$; likewise, the upper bound $y_{ub}$ should be set to a sufficiently large number, found by trial and error. The transform is implemented in the logit_transform function, where $\max(y)$ is the maximum value of the fingerprint similarity score $y$. We can set this to be slightly larger than the maximal observable score, which is 400 in this case. We then apply the function to the genuine and impostor scores and replot the performance curves using wer.
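To experiment with the bounds from the command line, here is a small bash sketch of a generalised logit transform of the form log((y - lower) / (upper - y)), applied to scores given one per line. It is a stand-in for illustration and not the Octave logit_transform function; the bounds -1 and 401 in the example are assumptions, chosen to sit just outside a score range of 0 to 400.

```bash
# Apply the generalised logit transform to scores read from stdin.
# Usage: logit_transform <lower_bound> <upper_bound>
logit_transform() {
  # awk's log() is the natural logarithm; $1 inside awk is the input score.
  awk -v lb="$1" -v ub="$2" '{ print log(($1 - lb) / (ub - $1)) }'
}

# Example: transform three scores with bounds -1 and 401.
# printf '0\n200\n400\n' | logit_transform -1 401
```

With these bounds, the midpoint 200 maps to 0, and the scores 0 and 400 map to values of equal magnitude and opposite sign, so the ordering of the scores is preserved.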

As can be observed, the score distributions look a lot more sensible, but the performance should remain the same. Why? Because the generalised logit transform is a one-to-one, order-preserving transformation, i.e., the ordering of the scores does not change under the transformation.

### Summary and where do you go from here

In short, this tutorial shows you how to work with NIST’s bozorth3 software to generate scores, which you can then analyse using the wer function in order to plot the system performance.

In the next tutorial, I will show you how to optimize the system performance by adjusting the fingerprint minutia quality. The DET curve below shows the performance gain we could obtain if we retain only the minutiae with sufficiently high quality.

In addition, on reflection, you may have noticed that we have used all the samples in the database to generate the genuine and impostor scores. So, if there are $J$ users and each has $S$ samples, then a total of $\frac 1 2 JS(JS-1)$ comparisons is needed (ignoring the symmetrised score operation). If we deal with millions of users, not only can this number be prohibitively large, but the proportion of impostor scores relative to genuine scores also becomes very disproportionate. You may also be interested in an alternative biometric experimental protocol that can deal with this situation.
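To put numbers on this for the database used in this tutorial (200 fingers with 5 attempts each), the counts can be checked with shell arithmetic. The genuine-pair count $J \cdot S(S-1)/2$ is not stated above; it is an addition that follows from counting the pairs of attempts within each finger.

```bash
J=200   # number of unique fingers
S=5     # attempts per finger

# All unordered pairs of distinct images: JS(JS-1)/2.
total=$(( J * S * (J * S - 1) / 2 ))
# Pairs of attempts from the same finger: J * S(S-1)/2.
genuine=$(( J * S * (S - 1) / 2 ))
# Everything else compares two different fingers.
impostor=$(( total - genuine ))

echo "total=$total genuine=$genuine impostor=$impostor"
# prints: total=499500 genuine=2000 impostor=497500
```

Even at this modest scale, impostor comparisons outnumber genuine ones by roughly 249 to 1, and the imbalance grows with the number of users.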