Graduation Project

Research Blog ~ a weekly update


WK10 – Model integration

This week has seen some progress in the development of the project: a slow integration of the model and of the elements it needs to process the information it receives. The target for this week has been to make the model run against the phone camera, which requires some processing of the camera frames. I have mainly been using Vision Camera, an open-source library for React Native that offers the functionality I want to include in my app. However, the library is under constant development, which is good for maintenance purposes, but it also means that some features are unstable, and it's almost a game in itself to find the versions of the dependencies that work well together.

I have also been participating in discussions within their Discord community, which serves almost as a testing ground for new releases and compatibility. This ties in nicely with the Coding Six unit we are currently studying, which has given me a lot more confidence in participating in open-source discussions.

With that said, so far I have been able to integrate the camera and what are called ‘Frame Processors’. The documentation describes them as JavaScript functions that are called for each frame the camera sees. These functions allow analysis to be run on every frame, which means real processing can be done on the images the phone captures.
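
As a rough illustration, a minimal frame processor looks something like this (a sketch based on the Vision Camera documentation; hook names can differ slightly between versions of the library):

```tsx
import React from 'react';
import { Camera, useCameraDevice, useFrameProcessor } from 'react-native-vision-camera';

export function DetectionScreen() {
  const device = useCameraDevice('back');

  // Runs as a worklet for every frame the camera sees
  const frameProcessor = useFrameProcessor((frame) => {
    'worklet';
    // For now, just confirm the processor is being called and inspect the frame
    console.log(`Frame: ${frame.width} x ${frame.height} (${frame.pixelFormat})`);
  }, []);

  if (device == null) return null;

  return (
    <Camera
      style={{ flex: 1 }}
      device={device}
      isActive={true}
      frameProcessor={frameProcessor}
    />
  );
}
```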

The model can be included using a library from the same maker as Vision Camera. That library, react-native-fast-tflite, integrates with Vision Camera and its Frame Processors. This means the model can run inside a Frame Processor and assess the images the camera is capturing. This happens without taking a photo, which is a very powerful feature, as it assesses images in real time.
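
The skeleton of that integration looks roughly like this (a simplified sketch following the react-native-fast-tflite README; the asset path is illustrative and exact call signatures may vary by version):

```tsx
import { useFrameProcessor } from 'react-native-vision-camera';
import { useTensorflowModel } from 'react-native-fast-tflite';

// inside a React component:

// Load the .tflite model bundled with the app
const detection = useTensorflowModel(require('./assets/efficientdet.tflite'));
const model = detection.state === 'loaded' ? detection.model : undefined;

const frameProcessor = useFrameProcessor((frame) => {
  'worklet';
  if (model == null) return;
  // `input` would be the pre-processed frame data (see the resizing step below)
  // const outputs = model.runSync([input]);
}, [model]);
```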

I have started integrating some of the models I have found on the TensorFlow dedicated page within Kaggle. They need to be in TFLite format, and they need to have 4 outputs. This is a very important difference I have noticed between the models I have found: even though they might fall under object detection, if they don’t provide meaningful information about the position of the bounding boxes, I can’t visualize the bounding boxes on the screen. To make sure I understand the structure of the model I am using, I have found a useful tool called Netron, which extracts important information from the model and presents it as a graph. The model I have integrated at the moment is EfficientDet (without metadata).
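
For reference, this is the output signature I am looking for when I open a detection model in Netron (the order and sizes of the four tensors can vary between models, which is exactly why I check them first):

```ts
// Typical outputs of a TFLite SSD/EfficientDet-style detection model
// (N = maximum number of detections):
//
//   boxes:   float32 [1, N, 4]   // ymin, xmin, ymax, xmax, normalised to 0..1
//   classes: float32 [1, N]      // class index into the label map
//   scores:  float32 [1, N]      // confidence per detection
//   count:   float32 [1]         // number of valid detections
```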

The first step was loading the model and doing some pre-processing on the images the camera receives so that the model can run on them. The EfficientDet model uses an input size of 320 by 320 in RGB. Mobile phone cameras use a colour format that optimizes performance, called YUV (‘Y’ represents the brightness, or ‘luma’, value, and ‘UV’ represents the colour, or ‘chroma’, values). Therefore, I needed to convert the colour format from YUV to RGB. After applying the image pre-processing logic, I tested the camera to check that the frame processors were working and that the input image was being resized, and the image seemed to be processed as desired.
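
There are a few ways to do this conversion; a sketch of the kind of pre-processing involved, using the vision-camera-resize-plugin as an example (the plugin and its options here are illustrative, not necessarily the exact code in my app):

```ts
import { useResizePlugin } from 'vision-camera-resize-plugin';

// inside the component:
const { resize } = useResizePlugin();

const frameProcessor = useFrameProcessor((frame) => {
  'worklet';
  // Convert the YUV camera frame into a 320x320 RGB uint8 buffer,
  // which is the input size EfficientDet expects
  const input = resize(frame, {
    scale: { width: 320, height: 320 },
    pixelFormat: 'rgb',
    dataType: 'uint8',
  });
  // const outputs = model.runSync([input]);
}, []);
```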

Next, I wanted to print out the other outputs the model was giving me. I was very curious whether it would give me the class names of the objects it was detecting; however, I only got the class number back. This meant I had to look for the labels the model had been trained on and manually map them to the numbers I was receiving. The EfficientDet model was trained on COCO 2017, and Mayra, the AI and Data Science technician, helped me find a label map for it. After mapping the classes to their labels, I ran the model with a confidence threshold of 0.7, and it was able to identify a teddy bear accurately, but other objects were not accurate at all.
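
Parsing the results then comes down to mapping each class index to its label and discarding low-confidence detections. Roughly like this (the label array is only a partial, illustrative excerpt of the COCO label map, and depending on the model the class index may be offset by one for a background class):

```ts
// Partial excerpt of the COCO 2017 label map, in label-map order
const COCO_LABELS: string[] = [
  'person',
  'bicycle',
  'car',
  // ... remaining COCO 2017 labels
];

interface Detection {
  label: string;
  score: number;
  box: [number, number, number, number]; // ymin, xmin, ymax, xmax (normalised)
}

function parseDetections(
  boxes: Float32Array,
  classes: Float32Array,
  scores: Float32Array,
  count: number,
  threshold = 0.7,
): Detection[] {
  const detections: Detection[] = [];
  for (let i = 0; i < count; i++) {
    if (scores[i] < threshold) continue;
    const classIndex = Math.round(classes[i]);
    detections.push({
      label: COCO_LABELS[classIndex] ?? `class ${classIndex}`,
      score: scores[i],
      box: [boxes[i * 4], boxes[i * 4 + 1], boxes[i * 4 + 2], boxes[i * 4 + 3]],
    });
  }
  return detections;
}
```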

This demo shows the camera, the frame processors and the model working together. However, the level of accuracy and the frame latency raised new questions about the kind of model I want to implement and how to manage the processing of the information. I have also been working on drawing the bounding boxes, but this has been slightly challenging due to the API I am using.
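
Part of that challenge is purely geometric: the model returns normalised coordinates relative to its input, which then have to be mapped onto the camera preview on screen. A simplified sketch of that conversion (it ignores aspect-ratio cropping and rotation, which is where things actually get tricky):

```ts
interface ScreenBox {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Map a normalised [ymin, xmin, ymax, xmax] box onto the preview view size
function toScreenBox(
  box: [number, number, number, number],
  viewWidth: number,
  viewHeight: number,
): ScreenBox {
  const [ymin, xmin, ymax, xmax] = box;
  return {
    left: xmin * viewWidth,
    top: ymin * viewHeight,
    width: (xmax - xmin) * viewWidth,
    height: (ymax - ymin) * viewHeight,
  };
}
```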


Finally, to confirm that the model’s inputs and outputs matched the information the Netron visualization tool had suggested, I corroborated the visualization of the EfficientDet model by printing to the console the data type, tensor name and tensor shape of each input and output.
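
A sketch of that check, assuming the loaded model exposes its tensors the way the library’s TypeScript typings describe (an `inputs` and an `outputs` array with `name`, `dataType` and `shape`):

```ts
import type { TensorflowModel } from 'react-native-fast-tflite';

// Log every input/output tensor of the loaded TFLite model
function logTensorInfo(model: TensorflowModel) {
  for (const tensor of model.inputs) {
    console.log(`input  ${tensor.name}: ${tensor.dataType} ${JSON.stringify(tensor.shape)}`);
  }
  for (const tensor of model.outputs) {
    console.log(`output ${tensor.name}: ${tensor.dataType} ${JSON.stringify(tensor.shape)}`);
  }
}
```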

Next Steps:

While the implementation of the bounding boxes is still ongoing, I would like to find a model that is better suited to the task I am trying to tackle: waste detection. I have been trying to find meaningful data with labels and bounding box coordinates, but it has been rather challenging. Is there a lack of this information? Why is it not openly available to everyone? My next steps consist of:

  • Finding a meaningful dataset that I can use to train my own model and run within the app.
  • Visualize the trained model with Netron to check that it meets the criteria (1 input / 4 outputs).
  • Keep trying to draw the bounding boxes within the app in real-time.
  • Finish the Onboarding with assets, and the tabs navigation style.
