Lack of data is the beginning of the end for many projects…
In this article I will explain how the best tech companies working on Computer Vision projects leverage crowdsourcing to collect and label huge amounts of pictures and videos, and how to smooth this workflow to avoid pain points along the way.
Building datasets can be split into several steps. Two of these are: 1. Data Collection 2. Data Annotation
Quality assurance (Q.A.) is also a big deal, as it will impact your algorithm’s performance. That is why I will include Q.A. in this automation.
1. Data Collection
Most of the time, data collection is the hardest part of the process. If you work on specific projects and do not have any picture database to exploit, the first thing will be to find a way to acquire enough pictures to work on.
Working on ML or DL projects requires a lot of pictures. We have to build training, validation & test datasets, and the more classes the project includes, the more images we will need in total.
Let’s take an example: We want to train a CNN to recognize each coffee product on supermarket shelves. In one country, we can find more than 1,000 different products, and if we need 50 images per class to correctly train & test our algorithm, we will need a minimum of 50,000 pictures for that project, for one country!
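To make the arithmetic above explicit, here is a minimal Python sketch. The function names and the 70/15/15 train/validation/test split are assumptions chosen for illustration; only the 1,000 products and 50 images per class come from the example.

```python
def pictures_needed(num_classes: int, images_per_class: int, num_countries: int = 1) -> int:
    """Minimum number of pictures, assuming every class needs the same number of images."""
    return num_classes * images_per_class * num_countries


def split_counts(images_per_class: int, train_pct: int = 70, val_pct: int = 15):
    """Split one class's images into (train, validation, test) counts.

    The percentages are illustrative assumptions, not a recommendation.
    Integer arithmetic is used so the three counts always sum to the input.
    """
    n_train = images_per_class * train_pct // 100
    n_val = images_per_class * val_pct // 100
    n_test = images_per_class - n_train - n_val
    return n_train, n_val, n_test


if __name__ == "__main__":
    print(pictures_needed(num_classes=1000, images_per_class=50))  # total for one country
    print(split_counts(50))  # per-class split under the assumed percentages
```

Changing `num_countries` shows how quickly the requirement grows when the same project is rolled out in several markets.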
The old-fashioned way would be to take a camera and walk to the nearest mall to snap all of the products you can.
But how can you be sure you will catch them all? And how much time will it take for 50,000 pictures? At what cost?
Leverage the crowd to collect pictures
Mobeye offers access to more than 1M smartphone users in more than 10 countries (US, Europe & Asia) whom you can ask to take specific pictures. Your 50,000 pictures will be shot in a few hours!
Mobeye users earn money for taking the pictures you requested
You will gain time, money & quality, as all projects are reviewed by the Mobeye Q.A. team.
Reducing Data Bias
Asking several users to create your dataset will also reduce the bias you would get by asking a single person to take all the pictures. From the way people take pictures to the camera they use, your dataset will be stronger with “real life” data.
One of the best parts of asking the crowd to build your dataset is that it is super easy to ask people to enrich the data they collect.
Here is an example: a company wanted to recognize fashion items from specific banners. We can ask people to go to that specific banner, take pictures of items, then scan their bar-codes.
Bar-codes could be very useful for annotation & classification
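As a rough illustration of why bar-codes help with annotation and classification, the sketch below groups crowd submissions by the scanned bar-code and maps each one to a class label. The record fields (`image`, `barcode`), the catalog contents and the function name are all hypothetical; they do not describe Mobeye's actual export format.

```python
from collections import defaultdict


def label_by_barcode(submissions, catalog):
    """Assign a class label to each picture using its scanned bar-code.

    submissions: list of dicts with hypothetical keys "image" and "barcode".
    catalog: dict mapping a bar-code string to a product label.
    Returns (labelled, unmatched): images grouped by label, and the images
    whose bar-code is missing from the catalog and need manual annotation.
    """
    labelled = defaultdict(list)
    unmatched = []
    for sub in submissions:
        label = catalog.get(sub["barcode"])
        if label is None:
            unmatched.append(sub["image"])
        else:
            labelled[label].append(sub["image"])
    return dict(labelled), unmatched


if __name__ == "__main__":
    catalog = {"0001": "blue denim jacket", "0002": "white sneakers"}  # made-up data
    submissions = [
        {"image": "img_01.jpg", "barcode": "0001"},
        {"image": "img_02.jpg", "barcode": "0002"},
        {"image": "img_03.jpg", "barcode": "0001"},
        {"image": "img_04.jpg", "barcode": "9999"},
    ]
    labelled, unmatched = label_by_barcode(submissions, catalog)
    print(labelled)
    print(unmatched)
```

Pictures that land in `unmatched` are the ones that would still need a human annotator.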
2. Data Annotation
Using crowdsourcing for annotation is already a well-known process. A lot of companies provide such services, like Amazon Mechanical Turk (MTurk).
MTurk provides a UI and taskers for your annotation needs
One key feature is that the MTurk UI is fully editable to suit your needs. MTurk also provides an SDK & API to push your requests dynamically, available in the most common languages (Java, .NET, Node.js, PHP, Python, Ruby, Go, C++…)
Fully Automated Data Collection & Annotation
Connecting Data Collection services such as Mobeye & Annotation services like MTurk allows you to create and qualify your dataset in a wink, from taking pictures to annotating each bounding box.
Here is the way you can scale your datasets without any hassle!
Workflow example: We want to create X Bounding Boxes of N Products
You can also crowdsource your Q.A. by asking for multiple reviews per picture and comparing the results.
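One possible way to compare several reviews of the same picture is to measure how much the annotators' bounding boxes overlap. The sketch below uses intersection over union (IoU) and accepts a picture only when every pair of reviews agrees. The box format `(x_min, y_min, x_max, y_max)`, the 0.5 threshold and the function names are assumptions for illustration, not the Q.A. workflow of any particular service.

```python
from itertools import combinations


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    inter_h = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not overlap
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def reviews_agree(boxes, threshold=0.5):
    """True when every pair of reviewers' boxes overlaps with IoU >= threshold."""
    return all(iou(a, b) >= threshold for a, b in combinations(boxes, 2))


if __name__ == "__main__":
    consistent = [(0, 0, 10, 10), (1, 0, 10, 10), (0, 1, 10, 10)]
    with_outlier = consistent + [(50, 50, 60, 60)]
    print(reviews_agree(consistent))    # reviewers drew nearly the same box
    print(reviews_agree(with_outlier))  # one reviewer disagrees: send back for review
```

Pictures that fail the check can be sent out for an extra review instead of being silently added to the dataset.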
Q.A. Workflow & Results
I hope this article will help you build amazing datasets!