The data collection approach consists of direct engagement with registered retail stores with verified transaction histories on the TradeDepot platform. Structured surveys are executed over recorded telesales sessions with these outlets on an ongoing basis, and the survey reports are normalized using historical transactional data and current demographic data of the respective territories to validate/eliminate outliers and standardize the data records.

We use a seven-step process to ensure that each survey properly represents the true nature of the market across all the classification criteria.

Outlet Registration and Verification

Retailers are acquired through direct field engagements, with multiple verification points for different data elements (acquisition, follow-up calls and delivery).

Survey Design

The survey design focuses on deploying a consistent theme across multiple consumer demographics (age, gender, location) and retailer classifications (channel type, socio-economic class). The survey questions are built to focus on specific consumer demographics.

Dynamic Survey Generation

An automatic scheduler generates specific survey questions based on occurrence frequency, retailer classification (food store, general groceries, etc.) and target consumer categorization.

Survey Execution

Surveys are deployed via recorded telesales sessions by our in-house team of telesales agents. We complete a minimum of 2,500 recorded survey sessions daily with verified retailer customers. A quality management process ensures the survey is deployed consistently to eliminate bias. The surveys are captured in real time and loaded into our data mining pipeline.

Weighting

The survey data is overlaid and normalized with live retailer transaction data (sell-in) to minimize bias and identify outliers. To correct for sampling bias after the survey is run, we apply weights to up-weight underrepresented groups and down-weight overrepresented groups. We then calculate weights using an iterative process, also known as raking, to reduce bias across all other dimensions.

Data Mining

The base data is enriched with location and socio-economic data based on existing location and traditional demographic models.

Data Modelling and Insights

Automated machine learning models are applied to classify the data by brand, product type, category and producer. The survey aims to match the demographics of respondents with the current population of consumers per outlet.

 

Sampling bias

One of the major challenges for phone surveys is mitigating sampling bias; that is, ensuring that samples represent the general population for which the survey is carried out.

We noticed that people generally tend to ignore phone calls from unknown numbers. However, people who use their mobile phones for business tend to accept more calls than people with personal lines.

The survey is conducted only for outlets registered under the TradeDepot network, so registering more outlets increases the probability of getting better results.

Within our panels, we offer representative sampling for registered outlets and convenience sampling for others (described below). Our network has been further segmented under the Google Plus Code 6-box parameter, using a unit we call a hex box, for proper geographical mapping.

1 hex box = 1.3 km

1 pluscode 6-box = 16 hex boxes 
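
The mapping from a coordinate to a hex box can be sketched as follows. This is an illustration only, not the platform's implementation: it relies on the fact that a 6-character Plus Code cell spans 0.05 degrees, and it assumes (this is not stated above) that the 16 hex boxes form a 4 x 4 grid inside that cell. Under that assumption each hex box is about 1.4 km on a side at the equator, in the same range as the 1.3 km figure above. The function name `hex_box_index` is hypothetical.

```python
import math

CELL_DEG = 0.05  # a 6-character Plus Code cell spans 1/20 of a degree
GRID = 4         # ASSUMPTION: the 16 hex boxes are laid out as a 4 x 4 grid


def hex_box_index(lat, lng):
    """Return (cell_row, cell_col, hex_box) for a coordinate.

    cell_row and cell_col identify the 0.05-degree cell; hex_box is a
    number from 0 to 15 identifying the sub-cell inside it.
    """
    # Shift to non-negative ranges, as Plus Codes do.
    y = lat + 90.0
    x = lng + 180.0
    cell_row = math.floor(y / CELL_DEG)
    cell_col = math.floor(x / CELL_DEG)
    sub = CELL_DEG / GRID
    # Clamp to guard against floating-point drift at cell edges.
    sub_row = min(GRID - 1, max(0, int((y - cell_row * CELL_DEG) / sub)))
    sub_col = min(GRID - 1, max(0, int((x - cell_col * CELL_DEG) / sub)))
    return cell_row, cell_col, sub_row * GRID + sub_col


# Example: a point in Lagos (coordinates chosen for illustration).
row, col, box = hex_box_index(6.5244, 3.3792)
```

Outlets that share the same `(row, col, box)` triple would fall in the same hex box under these assumptions.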

For representative sampling, we evaluate the representativeness of a survey by balancing its sample demographics to match the demographics of the specified population: adult consumers (25–34 and 35–55). We match them on four demographic dimensions: age, gender, residence and location.

In the market, we use estimates for the population of an area (Local Government Area, LGA) from the current Nigerian Census Population Survey. In other areas with representative sampling, we rely on a combination of census data and internet data sources for an estimate of the population.

Convenience sampling means that respondents may be of any age, gender, or geographic location within the TradeDepot network.

 

Weighting

To correct for sampling bias after the survey is run, we apply weights to up-weight underrepresented groups and down-weight overrepresented groups. This calculation is coarser than the calculation for dynamic fielding: instead of matching the three-dimensional joint distributions, we match each single dimension (the marginal distributions) on its own. We then calculate weights using an iterative process, also known as raking, to reduce bias across all other dimensions.

 

How raking works…

  1. First, we exclude all respondents with unknown values for age, gender, geography or brand. Then we calculate weights that match the gender breakdown to the target demographic.
  2. Next, we account for the female population. Because our automated data only captures the male ratio, we derive the female ratio as Female Ratio = (1 – Male Ratio). The survey accounts for only two sexes (male and female).
  3. After weighting respondents on the gender dimension alone, using a case expression that yields what we call the gender weight, we do the same for the age dimension and then for the geographical distribution of consumers per outlet. The last of these is called the sampling weight, and is calculated by multiplying the gender weight by the number of consumers per shop or outlet.
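
The arithmetic in steps 2 and 3 can be sketched as below. This is a minimal illustration under stated assumptions: the gender weight is taken to be the usual ratio of target share to sample share (the text does not spell out the formula), and the function names and example figures are hypothetical.

```python
def gender_weights(male_ratio_sample, male_ratio_target):
    """Weight per gender = target share / sample share (assumed formula)."""
    # Step 2: the female ratio is derived from the male ratio.
    female_ratio_sample = 1.0 - male_ratio_sample
    female_ratio_target = 1.0 - male_ratio_target
    return {
        "male": male_ratio_target / male_ratio_sample,
        "female": female_ratio_target / female_ratio_sample,
    }


def sampling_weight(gender_weight, consumers_per_outlet):
    """Step 3: sampling weight = gender weight x consumers per outlet."""
    return gender_weight * consumers_per_outlet


# Illustrative figures: 60% of respondents are male, the target is 50%.
gw = gender_weights(male_ratio_sample=0.6, male_ratio_target=0.5)
sw = sampling_weight(gw["female"], consumers_per_outlet=200)
```

In this example male respondents are down-weighted (weight below 1) and female respondents are up-weighted (weight above 1), which is the correction described above.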

When the calculated weights are applied across every aspect of the report, they converge so that all calculated dimensions closely match their targets. Weights in each survey are calculated based on 3 random questions displayed in the surveys at every instance.
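
The iteration described above can be sketched in a few lines. This is a generic raking (iterative proportional fitting) loop, not the production pipeline; the function name `rake`, the sample records and the 50/50 targets are all hypothetical and chosen only to show the mechanics.

```python
from collections import defaultdict


def rake(respondents, targets, max_iter=100, tol=1e-6):
    """Rake weights so each dimension's weighted shares match its targets.

    respondents: list of dicts, one key per dimension.
    targets: {dimension: {category: target share}}, shares summing to 1.
    Returns one weight per respondent, scaled to sum to len(respondents).
    """
    weights = [1.0] * len(respondents)
    for _ in range(max_iter):
        max_change = 0.0
        for dim, shares in targets.items():
            # Current weighted total per category of this dimension.
            totals = defaultdict(float)
            for r, w in zip(respondents, weights):
                totals[r[dim]] += w
            total_w = sum(weights)
            # Scale each respondent by target share / current share.
            for i, r in enumerate(respondents):
                factor = shares[r[dim]] / (totals[r[dim]] / total_w)
                weights[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:  # every dimension already matches its target
            break
    scale = len(respondents) / sum(weights)
    return [w * scale for w in weights]


# Hypothetical sample; the last respondent has an unknown gender.
sample = [
    {"gender": "male", "age": "25-34"},
    {"gender": "male", "age": "25-34"},
    {"gender": "male", "age": "35-55"},
    {"gender": "female", "age": "25-34"},
    {"gender": "female", "age": "35-55"},
    {"gender": None, "age": "25-34"},
]
targets = {
    "gender": {"male": 0.5, "female": 0.5},
    "age": {"25-34": 0.5, "35-55": 0.5},
}
# Step 1: exclude respondents with unknown demographics.
known = [r for r in sample if all(r[d] in targets[d] for d in targets)]
weights = rake(known, targets)
```

Each pass adjusts one dimension at a time, which slightly disturbs the others; repeating the passes is what brings all marginal distributions close to their targets at once.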

 

Error Margins

Four factors influence and can reduce the size of the modeled margin of error:

  1. Larger sample sizes, which means that we have to continue registering and retaining more outlets.
  2. Percentages closer to the extremes of 0% or 100%.
  3. Lower confidence levels (we aim for a 99.9% confidence level in our final output).
  4. Lower variability of weights, which means that we want to normalize the population as much as possible so as to rely less on weighting to get our estimates. We aim for more specific survey questions and responses.
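
How the four factors interact can be seen in a common textbook approximation for the margin of error of a weighted proportion. The exact model used for the modeled margin of error is not specified here, so the sketch below is an assumption: it uses the normal approximation together with Kish's design effect to account for weight variability. The function name `margin_of_error` is hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(p, weights, confidence=0.999):
    """Approximate margin of error for a weighted proportion p.

    Assumes a normal approximation and Kish's design effect, which is 1
    when all weights are equal and grows as the weights vary more.
    """
    n = len(weights)
    deff = n * sum(w * w for w in weights) / sum(weights) ** 2
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided critical value
    return z * math.sqrt(deff * p * (1.0 - p) / n)


equal = [1.0] * 1000
uneven = [0.5] * 500 + [1.5] * 500

base = margin_of_error(0.5, equal)                           # reference case
bigger_sample = margin_of_error(0.5, [1.0] * 4000)           # factor 1
extreme_share = margin_of_error(0.1, equal)                  # factor 2
lower_conf = margin_of_error(0.5, equal, confidence=0.95)    # factor 3
varied_weights = margin_of_error(0.5, uneven)                # factor 4 (larger)
```

Under this approximation the first three variations produce a smaller margin than the reference case, while the unevenly weighted sample produces a larger one, matching the list above.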

Surveys may be subject to other sources of bias and error, including, but not limited to, sampling and non-response.