WK12 – Thinking about optimising user interaction
Live survey user interaction: considerations
The interaction I originally had in mind presented the user with a live camera that performed object detection continuously, counting the objects it found as the user moved along the beach. The problem I hadn’t reflected on was the accuracy of the count of detected objects. When I first opened the camera from within the app, the model would run alongside it and output a prediction for every frame of how many objects it could find. The camera was running at about 11 FPS while constantly running the model, which also introduced some latency in the camera preview.
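To illustrate why the continuous approach hurts the frame rate, here is a minimal sketch of that original per-frame loop. It is written in TypeScript against hypothetical camera and model interfaces (CameraFrame, LiveCamera, ObjectDetector and their methods are assumptions for illustration, not the actual API the app uses).

```ts
// Hypothetical camera and model interfaces, used only for illustration.
interface Detection { label: string; confidence: number; }
interface CameraFrame { /* raw pixel data from the camera stream */ }
interface ObjectDetector { detect(frame: CameraFrame): Promise<Detection[]>; }
interface LiveCamera { onFrame(handler: (frame: CameraFrame) => void): void; }

// Original interaction: the model runs on every frame the camera produces,
// so inference competes with the preview and drags the frame rate down
// (about 11 FPS in my tests).
function startContinuousDetection(camera: LiveCamera, model: ObjectDetector) {
  camera.onFrame(async (frame) => {
    const detections = await model.detect(frame); // one inference per frame
    console.log(`Objects found in this frame: ${detections.length}`);
  });
}
```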
Reflecting on how to optimise the user journey and experience, and to avoid the latency issues, I decided to introduce a new interaction. It allows the user to activate the model and take a photo at the click of a button. Removing the constant inference the model has to make improves the camera’s latency and marks a clear start and end to the object detection stage. It also makes the app more tolerant of errors, since the user has more opportunities to re-run the object detection model.
Initial camera interaction
In the video it can be seen that the camera is working at 14 FPS, with the model generating a prediction on every frame continuously. The console output below shows an example of the predictions the model makes within a single second.

New camera interaction
In the video it can be seen that the camera works at a peak of 27 FPS when the model is activated. Please note that the model isn’t running as soon as the camera is opened; it is only activated when the user presses the button. After 4 seconds, the camera takes a photo of what is being observed. The console output below shows an example of the predictions made while the model is active, followed by what is logged when a photo is taken (image data).

This new interaction can be broken down as follows (a rough sketch of the flow is given after the list):
- Open the Live survey screen. This screen opens the device’s camera by default.
- The user is presented with a button. When the user presses it, the object detection model is activated and runs for 4 seconds at a peak of 27 FPS, which means the model can make up to 108 predictions.
- After the 4 seconds pass, the app takes a photo and the model is stopped.
- After the photo is taken, the bottom sheet in the UI should be updated with the detected objects.
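Below is a rough sketch of that flow, reusing the hypothetical interfaces from the earlier snippet. The takePhoto and removeFrameHandler methods, the updateBottomSheet callback and the 4-second constant are placeholders for however the real app exposes them.

```ts
// Hypothetical extensions of the earlier interfaces, for illustration only.
interface LiveCameraWithPhoto extends LiveCamera {
  takePhoto(): Promise<Uint8Array>;   // resolves with raw image data
  removeFrameHandler(): void;         // stops the per-frame callbacks
}

const DETECTION_WINDOW_MS = 4000;     // the model runs for 4 seconds

function onDetectButtonPressed(
  camera: LiveCameraWithPhoto,
  model: ObjectDetector,
  updateBottomSheet: (detections: Detection[]) => void,
) {
  let lastDetections: Detection[] = [];

  // 1. Activate the model only while the 4-second window is open.
  camera.onFrame(async (frame) => {
    lastDetections = await model.detect(frame); // up to ~27 FPS at peak
  });

  // 2. When the window closes, stop the model and take a photo.
  setTimeout(async () => {
    camera.removeFrameHandler();
    const imageData = await camera.takePhoto();
    console.log(`Captured photo (${imageData.byteLength} bytes)`);

    // 3. Update the bottom sheet with the last set of detections.
    updateBottomSheet(lastDetections);
  }, DETECTION_WINDOW_MS);
}
```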
Counting objects

One of the problems I am currently facing is how to count the number of objects present in a frame. In a scenario where a frame contains several different objects, the model will output a total count of unique objects. However, what happens when there are two objects of the same class within a frame?
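One way this could be handled, assuming each detection carries a class label, is to group the detections by label and count duplicates rather than only counting unique labels. A minimal sketch using the same hypothetical Detection shape as above:

```ts
// Count how many objects of each class appear in a single frame's detections,
// so that two bottles in one frame are reported as 2 rather than 1.
function countByLabel(detections: Detection[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const d of detections) {
    counts[d.label] = (counts[d.label] ?? 0) + 1;
  }
  return counts;
}

// Example: detections labelled ["bottle", "bottle", "can"]
// would produce { bottle: 2, can: 1 }.
```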
Also, these predictions are made per frame. I was thinking I could take the predictions of my model (which are currently being printed to the console), target the last prediction on the last frame, and output that in my UI. To test this, I tried running the model at 1 FPS, so I would get only 4 predictions. The model managed to produce a decent prediction within 4 seconds and displayed the total number of objects detected in the last frame on the console. However, I haven’t yet been able to reflect that count in my UI.
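A sketch of how that last-frame count could be pushed into the UI rather than only the console, again reusing the hypothetical interfaces above and assuming a setState-style callback is available on the Live survey screen (setDetectedCount and the 1-second throttle are assumptions):

```ts
// Run the model at roughly 1 FPS instead of on every frame, keep only the
// most recent prediction, and hand the final count to the UI when the
// 4-second window closes.
function startThrottledDetection(
  camera: LiveCamera,
  model: ObjectDetector,
  setDetectedCount: (count: number) => void, // e.g. a React state setter
) {
  let lastPredictionTime = 0;
  let lastCount = 0;

  camera.onFrame(async (frame) => {
    const now = Date.now();
    if (now - lastPredictionTime < 1000) return; // ~1 prediction per second
    lastPredictionTime = now;

    const detections = await model.detect(frame);
    lastCount = detections.length;
    console.log(`Objects in last frame: ${lastCount}`); // what I currently see
  });

  // After the same 4-second window, push the last count into the UI state
  // instead of leaving it on the console.
  setTimeout(() => setDetectedCount(lastCount), 4000);
}
```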
What happens with the photos?
The photos could be stored and used later for training purposes. As mentioned in earlier posts, I haven’t been able to find a dataset that contains reliably annotated images. The aim of capturing photos is to grow the dataset and improve the model.
