Creating Personal Data for Deep Learning

Introduction

I am taking fast.ai’s Practical Deep Learning for Coders course, where I learned a simple method for building a deep learning dataset from Google Images.

I like to create personal projects to answer specific questions or to learn more about a given topic. My career so far has involved working with small-scale farmers in rural Kenya, so I was particularly interested in how deep learning could help with some of the challenges of collecting data in the field.

Problem Definition

Most startups in Kenya collect data from the field in one way or another to track the progress of their projects. Small-scale farmers in rural Kenya usually plant trees or subsistence crops such as maize, potatoes and beans.

To track crop growth, the field team visits the shamba (farm) and takes measurements such as crop height, shamba size and the shamba's GPS coordinates. They can also photograph the crops on the farm, along with any pests and diseases. For instance, an outbreak of Maize Lethal Necrosis Disease (MLN) in major maize (corn) producing regions of Kenya, such as Rift Valley and Western, could threaten production.

A company working with farmers in these regions would need to respond with quick, targeted interventions to help farmers limit their losses from such an outbreak. Photos can also be used to estimate crop growth rates across regions in order to calculate production estimates.

Data Generation

In this analysis, we will create a deep learning model to determine which crop a farmer has planted. Suppose we are a startup working in a region where both maize and beans grow well, and we supply maize inputs such as seeds and fertilizer to farmers there.

Each planting season we deliver inputs to farmers by truck, and a few weeks after delivery our field team visits them to check that they have followed the planting guidelines and to take photos of their shambas.

As the team's data scientists, we propose using deep learning to confirm whether farmers actually planted the maize we delivered. Since we do not yet have any photos in our database, we will use Google Images to build a prototype model to share with management.

We search Google Images for "maize farming in Kenya", as shown here.

Then we scroll down the results page until enough images have loaded to work with. To download them in Chrome, open the developer tools (right-click the page and select Inspect, or press Command+Option+I on a Mac), switch to the Console tab, paste the following code and press Enter.

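// Collect image URLs from the thumbnails loaded on the results page.
// '.rg_i', 'data-src' and 'data-iurl' match Google Images' markup at the time of writing and may change.
const urls = Array.from(document.querySelectorAll('.rg_i'))
    .map(el => el.hasAttribute('data-src') ? el.getAttribute('data-src') : el.getAttribute('data-iurl'))
    .filter(Boolean); // drop thumbnails that have neither attribute
// Open the URLs (one per line) as a CSV file in a new window.
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));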

This opens a CSV file containing the URLs of the loaded images in a pop-up window; save it as maize.csv. You might have to disable your ad blocker for the download to work. Then repeat the same process for beans farming and save the result as beans.csv.

Next, we upload the CSV files to the same working directory as our Python code in Google Colab and use the code below to create folders to store our images. The files maize.csv and beans.csv contain the URLs from our Google Images searches.
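A minimal standard-library sketch of this step (not the author's original code) follows. It reads one URL per line from each CSV file, creates a folder per class named after the file (maize.csv becomes images/maize) and saves each download as a numbered .jpg file. The images/ root, the file naming, the 200-image cap and the helper names read_urls, fetch_url and download_class are hypothetical; fastai also ships a download_images helper for this job.

```python
from pathlib import Path
from typing import Callable, List
import urllib.request


def read_urls(csv_path: Path) -> List[str]:
    """Read one URL per line, skipping blank lines."""
    with open(csv_path) as f:
        return [line.strip() for line in f if line.strip()]


def fetch_url(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw bytes behind a URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_class(csv_path: Path, root: Path,
                   fetch: Callable[[str], bytes] = fetch_url,
                   max_pics: int = 200) -> Path:
    """Create root/<class> (class = CSV file name, e.g. maize.csv -> maize)
    and save up to max_pics images into it. max_pics=200 is an arbitrary cap."""
    dest = root / csv_path.stem
    dest.mkdir(parents=True, exist_ok=True)
    for i, url in enumerate(read_urls(csv_path)[:max_pics]):
        try:
            data = fetch(url)
        except (OSError, ValueError) as exc:  # expired link, timeout, malformed URL
            print(f"skipping {url}: {exc}")
            continue
        # Saved with a .jpg extension regardless of the real format (simplifying assumption).
        (dest / f"{i:05d}.jpg").write_bytes(data)
    return dest


# Usage in Colab once maize.csv and beans.csv are uploaded:
# for name in ("maize.csv", "beans.csv"):
#     download_class(Path(name), Path("images"))
```

The fetch argument is injectable so the folder logic can be checked without network access; in Colab the default urllib download is used.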

Model Training

We can now train our deep learning model using fastai’s learner, which provides a simple, easy-to-use wrapper around the PyTorch machine learning library.
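Before training, the labeled images are split so that accuracy is measured on images the model never saw during training. fastai's folder-based data loaders do this by holding out a random fraction of the images, set with a valid_pct argument (0.2, i.e. 20%, is a common choice). The sketch below shows the idea with the standard library only, assuming one sub-folder per class such as images/maize and images/beans (hypothetical paths); the function names and the fixed seed are illustrative, and in practice fastai performs the split for you.

```python
import random
from pathlib import Path
from typing import Dict, List, Tuple

Sample = Tuple[Path, str]  # (image path, class label)


def list_labeled_images(root: Path) -> List[Sample]:
    """Label every file by the name of the sub-folder it sits in."""
    return sorted((p, p.parent.name) for p in root.glob("*/*") if p.is_file())


def random_split(samples: List[Sample], valid_pct: float = 0.2,
                 seed: int = 42) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle reproducibly, then hold out valid_pct of the samples for validation."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]  # (train, valid)


def class_counts(samples: List[Sample]) -> Dict[str, int]:
    """Number of images per class, useful for spotting imbalance."""
    counts: Dict[str, int] = {}
    for _, label in samples:
        counts[label] = counts.get(label, 0) + 1
    return counts
```

A purely random split is the simplest option; if one class has far more images than the other, splitting each class separately (a stratified split) keeps the validation set balanced.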

The following code shows how we train our convolutional neural network (CNN) using fastai.
Running the code above, we get a model accuracy of 72%, as shown below.

An accuracy of 72% is probably not high enough to convince management that we can reliably detect fraud cases, that is, farmers who received our maize inputs but did not plant maize.
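A single accuracy figure also hides which kind of mistake the model makes, and for fraud detection the two mistakes cost different things: a maize farm predicted as beans triggers an unnecessary follow-up visit, while a beans farm predicted as maize lets a possible fraud case slip through. The sketch below computes accuracy, a two-class confusion table and per-class recall from lists of true and predicted labels; the function names are hypothetical and the labels in the usage note are made up for illustration, not results from this model.

```python
from collections import Counter
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[str], predicted: Sequence[str]) -> Counter:
    """Count (actual, predicted) label pairs, e.g. ('beans', 'maize')."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return Counter(zip(actual, predicted))


def accuracy(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of images whose predicted label matches the true label."""
    counts = confusion_counts(actual, predicted)
    return sum(n for (a, p), n in counts.items() if a == p) / len(actual)


def recall_per_class(actual: Sequence[str], predicted: Sequence[str]) -> Dict[str, float]:
    """For each true class, the share of its images that were labeled correctly."""
    counts = confusion_counts(actual, predicted)
    totals = Counter(actual)
    return {label: counts[(label, label)] / total for label, total in totals.items()}
```

For example, with six maize photos and four beans photos where the model gets five maize and two beans right, accuracy is 70% but only half of the beans farms (the cases management cares about) are caught.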

Management also knows there are edge cases where a farmer received the correct inputs but did not plant them for one reason or another.

Examples:

  • The farmer strongly believes the planting season has not yet started and wants to wait a bit longer. In the meantime they plant a short-term crop, planning to plant maize once it is harvested.
  • The farmer already has another crop, such as beans, on the shamba and must harvest it before planting maize.

Model Tuning

The low accuracy may be partly because some of the images we downloaded do not belong to either of our classes.

As shown below, it is quite difficult to tell whether this farm contains beans.

Luckily, fastai has an image cleaner that we can use to review and delete the images our model was unsure about.
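fastai's cleaner presents candidates for review ordered by loss, so the images the model got most wrong, or was least confident about, come first. A minimal standard-library sketch of that ranking, assuming we already have each image's label and predicted class probabilities (the helper names are hypothetical, and the loss is the usual cross-entropy for a single image):

```python
import math
from typing import Dict, List, Tuple


def cross_entropy(probs: Dict[str, float], label: str) -> float:
    """Loss for one image: -log of the probability the model gave its label."""
    return -math.log(max(probs[label], 1e-12))  # clamp to avoid log(0)


def top_losses(predictions: List[Tuple[str, str, Dict[str, float]]],
               k: int = 10) -> List[Tuple[str, float]]:
    """Return the k (image, loss) pairs with the highest loss, worst first.

    Each prediction is (image name, labeled class, predicted probabilities).
    """
    losses = [(name, cross_entropy(probs, label)) for name, label, probs in predictions]
    return sorted(losses, key=lambda item: item[1], reverse=True)[:k]
```

Images at the top of such a list are typically mislabeled, ambiguous or genuinely hard; a person still decides whether to delete or relabel each one.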

Deleting the poorly labeled images and retraining the model raises our accuracy to 78%. This is close enough to get buy-in from management. Next, we need to tune the model further so that it performs well once we start receiving images from the field.
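Expressed as error rates, the clean-up removes roughly a fifth of the model's mistakes. This is plain arithmetic on the two reported accuracies and assumes both were measured on comparable validation sets:

```latex
\text{error}_{\text{before}} = 1 - 0.72 = 0.28, \qquad
\text{error}_{\text{after}} = 1 - 0.78 = 0.22, \qquad
\frac{0.28 - 0.22}{0.28} \approx 0.21
```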