And reduce your dataset biases!

Working on any project quickly leads to dealing with datasets. In this article, I will explain how the best tech companies working on Computer Vision projects leverage crowdsourcing to collect and label huge amounts of pictures and videos, and how to smooth this workflow to avoid pain points along the way. Quality assurance is also a big deal, as it will impact your algorithm’s performance, which is why I will include Q.A. in this automation.

1. Data Collection

Most of the time, data collection is the hardest part of the process. If you work on a specific project and do not have a picture database to exploit, the first step is to find a way to acquire enough pictures to work with. Let’s take an example: we want to train a CNN to recognize each coffee product on supermarket shelves. In a single country, we can encounter more than 1,000 different products, and if we need 50 images per class to correctly train and test our algorithm, we will need 50,000 images for that project, for one country!
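The size of such a dataset is simple arithmetic, and it is worth writing down because it grows quickly with the number of classes. Here is a minimal sketch in Python; the function name `required_images` is only illustrative, and it assumes every class gets the same number of pictures.

```python
def required_images(num_classes: int, images_per_class: int) -> int:
    """Total pictures needed when every class gets the same quota."""
    return num_classes * images_per_class


# Figures from the coffee example: about 1,000 products, 50 pictures each.
total = required_images(num_classes=1_000, images_per_class=50)
print(f"{total:,} images")  # prints "50,000 images"
```

Doubling either the number of products or the quota per product doubles the collection effort, which is why the collection method matters so much.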

The old-fashioned way would be to take a camera and walk to the nearest mall to snap all of the products you can. But how can you be sure you will catch them all? And how much time will it take for 50,000 pictures? At what cost?

Leverage the crowd to collect pictures

Mobeye offers access to more than 1M smartphone users in more than 10 countries (US, Europe & Asia), whom you can ask to take specific pictures. Your 50,000 pictures will be shot in a few hours!

Mobeye users earn money for taking the pictures you requested.

You will save time and money and gain quality, as all projects are reviewed by the Mobeye Q.A. team.

Reducing Data Bias

Asking several users to create your dataset will also reduce the bias you would get by asking a single person to take all the pictures. Because both the way people take pictures and the cameras they use vary from one user to the next, your dataset will be stronger with “real life” data.
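One rough way to check this diversity is to measure how much of the dataset comes from the single most frequent contributor or camera. The sketch below is only an illustration: the `pictures` list and its `(contributor, camera)` layout are hypothetical sample data, not a Mobeye export format, and `largest_share` is a helper written for this example.

```python
from collections import Counter


def largest_share(values):
    """Fraction of the items that carry the most common value."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    return max(counts.values()) / len(values)


# Hypothetical metadata: one (contributor, camera) pair per picture.
pictures = [
    ("user_a", "phone_x"),
    ("user_b", "phone_y"),
    ("user_c", "phone_x"),
    ("user_d", "phone_z"),
]

contributor_share = largest_share([user for user, _ in pictures])
camera_share = largest_share([camera for _, camera in pictures])

print(contributor_share)  # 0.25: no contributor dominates
print(camera_share)       # 0.5: half the pictures come from one camera model
```

A value close to 1.0 means that a single person or device produced almost everything, which is exactly the situation crowdsourcing is meant to avoid.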

Enriching Data

One of the best parts of asking the crowd to build your dataset is that it is super easy to ask people to enrich the data they collect.