Daniel Hindrikes
Developer and architect with a focus on mobile and cloud solutions!
Last fall, my family and I, together with a friend's family, took a trip to the forest to collect mushrooms. This blog post will not be about the trip itself, but because my friend was not very good at recognizing mushroom species, I got the idea to write an app for that. So this post will be about how I wrote that app using Azure and Xamarin.
The first thing we need to do to detect mushrooms in photos is to train a model that we can use to make predictions. In Azure Cognitive Services there is a service called "Custom Vision". We can do two things with Custom Vision: image classification, where we upload images and tag the whole image, and object detection, where we upload images and tag specific areas in the image so the trained model can be used to detect objects. The object detection part is in preview at the time of writing. What I did was to upload photos of different mushrooms and tag them. This blog post will focus on how to consume a trained model, so I recommend reading the official documentation about the Custom Vision service, https://docs.microsoft.com/en-in/azure/cognitive-services/custom-vision-service/home.

We will also focus on the image classification part, because that is enough for a mushroom recognition app and trained object detection models cannot be exported right now. And in an app like this, it is really nice to be able to do classification without an internet connection.
When we do image classification in Custom Vision, we can export the model and run predictions locally on the user's device, which makes it possible to do predictions without any connection to the service in Azure.
Because the prediction needs to be done in the platform projects, I created an interface so I can use the classifier from shared code.
public interface IImageClassifier
{
    event EventHandler<ClassificationEventArgs> ClassificationCompleted;

    Task Classify(byte[] bytes);
}

public class ClassificationEventArgs : EventArgs
{
    public Dictionary<string, float> Classifications { get; private set; }

    public ClassificationEventArgs(Dictionary<string, float> classifications)
    {
        Classifications = classifications;
    }
}
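The post does not show how the shared code gets hold of the platform implementation. If the app is built with Xamarin.Forms (an assumption on my part), one option is the built-in DependencyService; a minimal sketch:

// Sketch, assuming Xamarin.Forms. CoreMLImageClassifier is a hypothetical class
// name standing for the iOS implementation of IImageClassifier.
// In the iOS (or Android) project:
[assembly: Xamarin.Forms.Dependency(typeof(CoreMLImageClassifier))]

// In shared code, resolve whatever implementation the current platform registered:
var classifier = DependencyService.Get<IImageClassifier>();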
We will get the result from the classifier as a dictionary of tags and how confident the classifier is about each tag. In this case, we treat a mushroom as identified if the top classification has a confidence higher than 90 percent.
public void DoClassification(byte[] bytes)
{
    classifier.ClassificationCompleted += ClassificationCompleted;
    classifier.Classify(bytes);
}

private void ClassificationCompleted(object sender, ClassificationEventArgs e)
{
    // Take the tag with the highest confidence.
    var top = e.Classifications.OrderByDescending(x => x.Value).First();

    if (top.Value > 0.9)
    {
        // Show which mushroom was in the photo.
    }
    else
    {
        // Handle that the mushroom could not be identified.
    }
}
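To have something to classify, we need the photo as a byte array. Capturing the photo is not covered in this post; as a rough sketch, a plugin such as Xam.Plugin.Media could be used from an async method in shared code (an assumption on my part, not something the original code shows):

// Rough sketch, assuming the Xam.Plugin.Media plugin (Plugin.Media namespace):
// take a photo with the camera and feed the raw bytes to the classifier.
var photo = await CrossMedia.Current.TakePhotoAsync(new StoreCameraMediaOptions());

if (photo != null)
{
    using (var stream = photo.GetStream())
    using (var memory = new MemoryStream())
    {
        stream.CopyTo(memory);
        DoClassification(memory.ToArray());
    }
}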
Core ML is built into iOS from iOS 11 and above.
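Because Core ML and the Vision framework require iOS 11, older devices need a fallback, for example calling the Custom Vision prediction endpoint over the network. A minimal availability check (my own addition) could look like this:

// Core ML and the Vision framework are only available on iOS 11 and later.
var canClassifyOnDevice = UIDevice.CurrentDevice.CheckSystemVersion(11, 0);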
When we have exported our model, we need to add it to the iOS project; the model should be placed in the Resources folder. When the model has been added to the Resources folder, the next step is to use it in code. Before we can use the model, we need to compile it. If we want, we can pre-compile it with Xcode, but the compilation is really fast, so in this case it is not necessary. After we have compiled the model, we can store the compiled model URL in a reusable place so we don't have to compile the model the next time we want to use it. The code example below does not cover storing the compiled model, but there is a small sketch of that idea further down.
var assetPath = NSBundle.MainBundle.GetUrlForResource("mushroom", "mlmodel");
var compiledUrl = MLModel.CompileModel(assetPath, out var error);
var model = MLModel.Create(compiledUrl, out error);
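As a hint of what storing the compiled model could look like, the compiled .mlmodelc folder can be copied to, for example, the Library folder the first time and reused on later runs. This is just a sketch of that idea; the file name and location are my own choices:

// Sketch: cache the compiled model so it only has to be compiled on the first run.
var cachedUrl = NSFileManager.DefaultManager
    .GetUrls(NSSearchPathDirectory.LibraryDirectory, NSSearchPathDomain.User)[0]
    .Append("mushroom.mlmodelc", true);

if (!NSFileManager.DefaultManager.FileExists(cachedUrl.Path))
{
    var modelUrl = NSBundle.MainBundle.GetUrlForResource("mushroom", "mlmodel");
    var tempCompiledUrl = MLModel.CompileModel(modelUrl, out var compileError);
    NSFileManager.DefaultManager.Copy(tempCompiledUrl, cachedUrl, out var copyError);
}

var cachedModel = MLModel.Create(cachedUrl, out var loadError);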
Now we have a model that we can start to use for classifications.
Before we do the prediction, we will create a method that handles the result of the classification. When we get the result, we put it in a dictionary with the tag and how confident the classifier is about that tag.
void HandleVNRequest(VNRequest request, NSError error)
{
    if (error != null) return;

    var predictions = request.GetResults<VNClassificationObservation>()
        .OrderByDescending(x => x.Confidence)
        .ToDictionary(x => x.Identifier, x => (float)x.Confidence);

    ClassificationCompleted?.Invoke(this, new ClassificationEventArgs(predictions));
}
Now that we have a method that handles the result, we can write the code that makes the classification request:
var classificationRequest = new VNCoreMLRequest(VNCoreMLModel.FromMLModel(model, out error), HandleVNRequest);
var data = NSData.FromArray(bytes);
var handler = new VNImageRequestHandler(data, CGImagePropertyOrientation.Up, new VNImageOptions());
handler.Perform(new[] { classificationRequest }, out error);
Android has an API for machine learning from version 8.1 and above, https://developer.android.com/ndk/guides/neuralnetworks/. But because many Android devices run a lower Android version, I have chosen to use TensorFlow. TensorFlow is an open-source framework for machine learning created by Google. Larry O'Brien has created bindings (https://github.com/lobrien/TensorFlow.Xamarin.Android) so TensorFlow can be used with Xamarin.Android, and we will use his library in this example to classify images. He has also written a blog post about this on the Xamarin blog.
When we export from Custom Vision to TensorFlow, we get a zip file that contains a model file (model.pb) and a file with the labels (labels.txt). We need the labels to know what the tags are, because unlike the Core ML model, the TensorFlow model does not include them.
The model and label files should be placed in the Assets folder of your Android project.
When we have added the model and label files to the Assets folder, we can start to write code. The first thing we need to do is to create a TensorFlowInferenceInterface from the model and a list of strings for the labels.
var assets = Application.Context.Assets;
var inferenceInterface = new TensorFlowInferenceInterface(assets, "model.pb");

// Read the labels file into a list of strings, one label per line.
var sr = new StreamReader(assets.Open("labels.txt"));
var labelsString = sr.ReadToEnd();
var labels = labelsString.Split('\n').Select(s => s.Trim())
    .Where(s => !string.IsNullOrEmpty(s)).ToList();
Images need to be 227x227 pixels, which means the first thing we have to do is to resize the image.
var bitmap = BitmapFactory.DecodeByteArray(bytes, 0, bytes.Length);
var resizedBitmap = Bitmap.CreateScaledBitmap(bitmap, 227, 227, false)
.Copy(Bitmap.Config.Argb8888, false);
TensorFlow models exported from Custom Vision cannot handle image objects directly, so the image needs to be converted to a float array with one value per red, green, and blue channel for each pixel, and some adjustments to the color values are also necessary.
var floatValues = new float[227 * 227 * 3];
var intValues = new int[227 * 227];

resizedBitmap.GetPixels(intValues, 0, 227, 0, 0, 227, 227);

for (int i = 0; i < intValues.Length; ++i)
{
    var val = intValues[i];

    // Extract the blue, green, and red channels from the ARGB pixel and
    // subtract the offsets that the exported model expects.
    floatValues[i * 3 + 0] = ((val & 0xFF) - 104);
    floatValues[i * 3 + 1] = (((val >> 8) & 0xFF) - 117);
    floatValues[i * 3 + 2] = (((val >> 16) & 0xFF) - 123);
}
The last step is to do the classification. To get the result, we create a float array and pass it to the Fetch method, and then we map each output value to its label.
var outputs = new float[labels.Count];

inferenceInterface.Feed("Placeholder", floatValues, 1, 227, 227, 3);
inferenceInterface.Run(new[] { "loss" });
inferenceInterface.Fetch("loss", outputs);

// Map each output value to its label.
var result = new Dictionary<string, float>();

for (var i = 0; i < labels.Count; i++)
{
    var label = labels[i];
    result.Add(label, outputs[i]);
}

ClassificationCompleted?.Invoke(this, new ClassificationEventArgs(result));
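Running the TensorFlow inference can take a noticeable amount of time on older devices, so it is a good idea to keep it off the UI thread. One way to wrap it is sketched below, where RunInference is a hypothetical method containing the snippets above:

// Sketch: RunInference is a hypothetical method wrapping the TensorFlow code
// above, including raising the ClassificationCompleted event at the end.
public Task Classify(byte[] bytes)
{
    return Task.Run(() => RunInference(bytes));
}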