
Predictive Analytics Frequently Asked Questions


Getting Started

What is the minimum number of records required to build a look-alike model?

The minimum number of records required to build a robust look-alike model is 300. We ask for a larger customer data input of 750 records because not all of the records may be eligible for building the look-alike model:

  • After you upload your customer data file, it goes through data manipulation, standardization, and cleansing processes. Some records will likely be rejected due to incomplete or incorrect addresses, missing names, etc.

  • datadecisions Group asks only for the name and address of your customers, which we match against the DecisionPoints data aggregation to append hundreds of elements to each customer record. Because we can never match all of your records, fewer records remain available for modeling.

  • Your customer input file is used to create the look-alike model and to build a completely separate Validation sample, on which we apply and validate the newly built look-alike model. Validating the new model on a separate sample confirms the precision of our model and is a standard procedure when we build models, but it requires more records.
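
The split between modeling records and a separate validation sample can be sketched as follows. This is a minimal illustration, not datadecisions Group's actual procedure: the function name split_records, the 30% validation share, and the fixed seed are all assumptions chosen for the example.

```python
import random

def split_records(records, validation_share=0.3, seed=42):
    """Shuffle records and split them into disjoint build and validation samples.

    validation_share and seed are illustrative assumptions, not platform settings.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_share)
    validation = shuffled[:n_validation]
    build = shuffled[n_validation:]
    return build, validation

# Example: 300 matched customer records, identified here by hypothetical IDs
build, validation = split_records(range(300))
print(len(build), len(validation))
```

Because the two samples never share a record, a model that scores well on the validation sample has shown it on data it was not built from, which is why the holdout consumes part of the uploaded file.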

Can I upload a datadecisions Group predictive list to Google AdWords and Facebook Ads Manager?

Yes. You can easily use datadecisions Group (DDG) predictive lists to set up custom audiences for targeting on Google AdWords and Facebook Advertising. Export your list from the Predictive Platform and then follow the directions below.

Inputting Your List Into Google AdWords

Log in to your company’s AdWords account. In the upper right corner of your screen, find ‘Tools, Billing and Settings’ (a wrench icon) and click to open it. Select ‘Audience Manager’ under ‘Shared Library’. Click ‘Audience Lists’ in the menu on the left and then click the plus button to create a new audience list.

In order to upload your datadecisions Group predictive marketing list, choose to upload a plain text data file and then select your list. Make sure that your list contains the following column headers: Email, First Name, Last Name, Country, Zip, and Phone.
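
Before uploading, it can help to confirm that the exported file carries the column headers listed above. The sketch below is only an illustration: it assumes a comma-delimited file, and the helper name missing_headers is hypothetical rather than part of any AdWords or datadecisions Group tool.

```python
import csv
import io

# Column headers listed above for the AdWords upload
REQUIRED_HEADERS = ["Email", "First Name", "Last Name", "Country", "Zip", "Phone"]

def missing_headers(file_obj, delimiter=","):
    """Return the required headers that are absent from the file's first row."""
    reader = csv.reader(file_obj, delimiter=delimiter)
    header_row = next(reader, [])
    present = {name.strip() for name in header_row}
    return [name for name in REQUIRED_HEADERS if name not in present]

# Example with an in-memory file that lacks the Phone column
sample = io.StringIO(
    "Email,First Name,Last Name,Country,Zip\n"
    "ann@example.com,Ann,Lee,US,27601\n"
)
print(missing_headers(sample))  # ['Phone']
```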

From here, you can set a membership duration (the default is unlimited) and click ‘Upload and Create List.’ Your data file can take up to 48 hours to upload completely and you can watch the progress of the upload under “Audience Lists.”

Inputting Your List Into Facebook Advertising

Log in to your company’s Facebook Ads Manager account. From the left-most dropdown menu, navigate to Audiences under ‘Assets.’ If you already have Facebook audiences, click the ‘Create Audience’ dropdown and select ‘Custom Audience.’ If this is your first Facebook audience, you’ll see some audience creation buttons; click ‘Create a Custom Audience.’ In either case, click ‘Customer File.’ Click ‘Add From Your Own File’ and choose to upload your file as a .txt file. Click ‘Upload file’ and select your Predictive Analytics marketing file. Give your audience a name and description, then click ‘Next.’

Facebook will show you a preview of your data and how it would be classified. Make sure that your data has been given the green light. If your data has been flagged in orange, check that Facebook is mapping the right field for that value. If you see a lot of errors, it may be because Facebook is looking for the wrong delimiter (the punctuation mark that separates data points). You can change this by hovering over “modify the delimiter” and choosing a new one.
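
If you are unsure which delimiter your export uses, you can check the file yourself before uploading. The sketch below is a simple heuristic under stated assumptions: it only considers four candidate characters, it does not handle quoted fields that contain the delimiter, and guess_delimiter is a hypothetical helper, not something Facebook provides.

```python
def guess_delimiter(text, candidates=(",", ";", "\t", "|")):
    """Pick the candidate that appears the same non-zero number of times on every line."""
    lines = [line for line in text.splitlines() if line.strip()]
    best, best_count = None, 0
    for candidate in candidates:
        counts = {line.count(candidate) for line in lines}
        if len(counts) == 1:  # same count on every non-empty line
            count = counts.pop()
            if count > best_count:
                best, best_count = candidate, count
    return best

sample = (
    "Email;First Name;Last Name\n"
    "ann@example.com;Ann;Lee\n"
    "bob@example.com;Bob;Ray\n"
)
print(repr(guess_delimiter(sample)))  # ';'
```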

Facebook then hashes your data, uploads it and creates your audience for you. The upload time varies depending on how large your file is. Feel free to continue working in another browser tab or window while you wait. When Facebook is done creating your Custom Audience, they automatically generate a list of next steps you can take to get the most out of it. You can take one of those next steps immediately or click ‘Done’ to finish.
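
To give a feel for what hashing does to a customer list, here is a minimal sketch. The text above only says that Facebook hashes your data; the choice of SHA-256 over trimmed, lowercased values reflects how Facebook documents customer list matching, but treat the details as assumptions to verify against Facebook's current documentation. The helper names are hypothetical.

```python
import hashlib

def normalize_email(email):
    """Trim surrounding whitespace and lowercase the address (assumed normalization)."""
    return email.strip().lower()

def hash_value(value):
    """Return the SHA-256 hex digest of a normalized value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

a = hash_value(normalize_email("  Ann@Example.com "))
b = hash_value(normalize_email("ann@example.com"))
print(a == b)  # True: the same address hashes identically after normalization
print(len(a))  # 64 hex characters
```

The point of normalizing first is that two spellings of the same address produce the same digest, so they can still be matched without the raw value being exchanged.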

Get more information on two of the next steps you can take: creating a Lookalike Audience and creating an ad.

You only used part of the data I uploaded, why?

Our solution is not designed to maximize the percentage of your uploaded data that is used. Instead, datadecisions Group ensures that we match a sufficiently high number of records to build a strong and robust analysis in record time. Of course, we could increase the amount of data, but our experience has shown that more data volume does not improve the quality of the results, and it would increase the time required to deliver them. (See question: “What is the minimum number of records required to build a look-alike model?” for further reasons why customer data may be excluded from analysis.) If you would like our team to maximize the quantity of your customer data used for analysis, please contact us and we'll put you in touch with one of our data scientists. We can work with you on a customized solution to satisfy your specific needs.

Does this work outside of the US?

At this stage, datadecisions Group’s look-alike models are available only in the United States. We use data that is collected through various providers to create the customer profile. Nevertheless, it is possible to build other types of predictive analysis such as custom response models, provided that they leverage only your data. 

What is a look-alike model?

Look-alike models analyze the profile of your customers using predictive analytics and machine learning techniques to identify what is unique about your customers and what characteristics set them apart from the rest of the population. Once you have built this model, you are in a position to identify, within the broader population, individuals who match the profile of your customers. Three key steps must take place to build a look-alike model:

  1. Match your clean and de-duped customer data with data from datadecisions Group’s databases, which include hundreds of variables such as age, gender, income, lifestyle, etc.

  2. Append a list of potential customers from the same geographical area, for which we also have the same variables available.

  3. Apply predictive analytic techniques to identify which combination of variables best differentiates customers from non-customers.

Look-alike models allow you to identify prospects with the right profile to be your customers in a data-driven, cost-effective manner.

I work in an industry not listed on your website; can your look-alike solution still apply to my customers?

Yes. If you have customers, we can help you identify potential individuals who look just like them. Our solution can be applied to a wide variety of customer types, regardless of the industry. The math behind our solution uses demographic measurements (e.g. age, marital status), purchasing behavior (e.g. online/offline orders), and many other characteristics that every customer in every industry has – rest assured we can find people who look like your existing customers. 

What is a response model?

A response model identifies people likely to reply to a specific marketing campaign. Typically, response models are more focused than look-alike models and produce a higher response rate, because the response model is built using people who have already responded to a given campaign. Building a response model requires the following steps:

  1. Execute an initial marketing campaign either on a random population or on a portion of the population selected through a look-alike model. The targeted campaign population should be large enough to ensure that you receive a few hundred responders.

  2. Campaign responses can then be analyzed through a predictive model after appending 3rd party data to the entire campaign population. The response model identifies what differentiates responders from non-responders.

The response model can then be applied to identify people within the population most likely to answer a specific marketing campaign.  

What response rate can I expect from the look-alike model?

The look-alike analysis does not guarantee any response rate. Response models predict response rates (see question ‘What is a response model?’). Once you have downloaded a list of prospects, you can try different marketing strategies to identify which marketing mix is the most efficient at converting these prospects into customers. The next step would be to provide a list of all the people you contacted with a specific marketing campaign and run a response analysis. This analysis will enable you to identify which people are likely to respond positively to your marketing message.

Where is the potential customer data coming from?

The datadecisions Group non-customer data is obtained from various sources of information, including surveys, internet forms, evaluation information, public sources, census, etc. We have 600+ variables with information covering approximately 127 million households (supplemental records available) in the United States and updated regularly. The information has been classified into 8 categories: Demographics, Buying Activities, Charitable Contributions, Finance, Lifestyle, Assets, Neighborhood and Others.

How to avoid GIGO (Garbage In, Garbage Out)?

When using predictive marketing, it is important to ensure that the information which is analyzed is relevant. Indeed, no matter how sophisticated the machine learning algorithms are, they can only find patterns that are not completely buried under noise. This is why datadecisions Group goes through several steps of data preparation:

  1. All 3rd party data used in the predictive models have been vetted and analyzed by our data experts as well as by our data scientists to ensure that the data are of the highest quality. (See question ‘Where is the potential customer data coming from?’)

  2. We apply the same vetting to your customer data, starting by cleaning it. Processing by datadecisions Group includes parsing the data, checking the validity of the information provided, and removing duplicates.

  3. Once the first 2 steps are complete, our predictive engines build 100 predictive models on different splits (samples) of the data to ensure that identified patterns are reproducible and not simply due to the specific sampling strategies that we use. In other words, we ensure that the models built by datadecisions Group are reliable.

Interpreting Results

Is it possible to add filters on my targeted leads?

Yes, we have multiple filters that can be applied to further assist in selecting your new leads. Without any filters applied, your best prospects will be selected from the same geography as the current customers we used for building the look-alike model. But what if you want to go outside of that area? What if you have a limited number of locations (for example, the sites of your bank branches), and would like to find prospects nearby, expanding outside of your current market footprint? Or what if you’re interested in prospects with specific demographics? Our Advanced Filters screen allows you to customize your filters in two areas.

1. Geography

You will be able to choose from:

  • your customers’ existing geography (based on the uploaded customer file)

  • entire US geography

  • customized list of states and/or zips (state or zip code lists can be uploaded and the geography can be expanded upon by increasing the radius of the area)

2. Demographics

You will be able to set up filters by:

  • Marital Status, Presence of Children, Home Ownership, and ranges of Household Income

Suppression files provide one additional filter. You can upload a suppression file before purchasing your new leads.

Is it possible to define a specific geography?

Yes. The datadecisions Group look-alike model is built on the same geography as the zip codes in the customer data that you uploaded to the cloud. Using a geography comparable to that of your existing customers enables the model to accurately determine the key factors that distinguish your customers from the random population.

Once the look-alike model is built, you have the option of discovering new leads across the entire country, even if your market is regional. Predictive Analytics can identify new targeted leads with similar characteristics that may be found outside of your current market footprint. See question ‘Is it possible to add filters on my targeted leads?’ for more information on defining specific geography.

What is the “Index” and how is it calculated?

The index is a relative measurement. An index enables you to easily understand whether a variable’s value is above or below the population average. For instance, let’s imagine that you only have customers in Texas. We calculate the average of each variable for the Texas population and assign it an index of 100. As an example, if the average income in Texas is $50,000 and your customers have an average income of $75,000, then the index would be 150 (calculated as 75,000/50,000*100). In this case, the index can be interpreted as follows: your customers, on average, earn 50% more than the average population in your market.
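
The index calculation in the Texas example can be written as a short function. The function name index_value is only illustrative; the formula is the one given above (customer average divided by market average, times 100).

```python
def index_value(customer_average, market_average):
    """Return the index: customer average relative to the market average (100 = equal)."""
    if market_average == 0:
        raise ValueError("market_average must be non-zero")
    return customer_average / market_average * 100

# Texas example from the text: customers earn $75,000 against a $50,000 market average
print(index_value(75_000, 50_000))  # 150.0
# A value below the market average gives an index under 100
print(index_value(25_000, 50_000))  # 50.0
```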

I only have customers in specific states, but I see prospects all over the United States, why?

While you may have customers in certain geographies, chances are that people in other areas of the country have the right profile to be interested in your offering. While datadecisions Group only creates our analysis based on your customer footprint, when the analysis is put into practice we apply it across the entire country. The advanced filtering function allows you to define and narrow the geography from which you would like to download prospects, a handy capability if your offering is for a limited geographic area.

Which types of predictive techniques are used?

datadecisions Group uses a combination of mathematical techniques in order to ensure that we can build very accurate predictive models in record time. The predictive process can be split into three main components:

  1. Univariate non-linear models: the first step consists of encoding the variables so that we capture the non-linearity in the data, take care of outliers, and account for missing values.

  2. Monte Carlo: Reach uses Monte Carlo techniques to orchestrate the creation of hundreds of models, which are built in parallel on different subsets of the data.

  3. Regression: Polynomial models are built on encoded data using different subsets of the training population in order to substantiate that the analyses are accurate while avoiding over-fitting of the model.

When I run an analysis twice, will I get exactly the same results?

You will not necessarily produce the exact same results; you might have some small differences. Before building our predictive models we prepare the data and create a market base which enables the identification of variables that distinguish your customers from the general population. On top of the list of customers you provide us, we add a random list of non-customers (a few hundred thousand).

This list of non-customers is chosen randomly with only one constraint: they need to come from the same geography where you have customers. The random selection of these non-customers ensures the results are not biased in any way. Two analyses are therefore run on two slightly different datasets, ultimately producing models that will not be exactly the same.

How do you calculate the Fit measure?

The fit measure is a calculation of the accuracy of the model. This measure allows you to gain an understanding of how good a predictive model is.

  1. A score of 0 means that the analysis was not able to find factors which differentiate your customers from the rest of the US population. Essentially a fit of 0 means that selecting your population randomly is as good as what the predictive model is able to do.

  2. On the contrary, a fit of 100 means that the model is perfectly able to separate customers from non-customers. It is the perfect solution.

  3. Predictive analyses with accuracy over 95% are very suspicious, and we recommend double-checking the uploaded data file to ensure that nothing is wrong with the provided information.
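
The text above does not state the formula behind the fit measure. As an illustration only, one common statistic with the same endpoints (0 for a model no better than random selection, 100 for perfect separation) is the Gini coefficient, computed as (2 × AUC − 1) × 100. The sketch below uses that statistic as an assumption; it should not be read as datadecisions Group's actual calculation, and the function names are hypothetical.

```python
def auc(scores, labels):
    """Probability that a random customer (label 1) outscores a random
    non-customer (label 0); ties count as half a win."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one customer and one non-customer")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def fit_like_score(scores, labels):
    """Assumed stand-in for the fit measure: Gini coefficient scaled to 0..100."""
    return (2 * auc(scores, labels) - 1) * 100

labels = [1, 1, 0, 0]
print(fit_like_score([0.9, 0.8, 0.2, 0.1], labels))  # 100.0 (perfect separation)
print(fit_like_score([0.5, 0.5, 0.5, 0.5], labels))  # 0.0 (no better than random)
```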

What is the lift chart?

The lift chart, displayed on the analysis tab, is a traditional measure used in the marketing world to identify whether the predictive analysis is relevant. After building an analysis and assigning a likelihood score to each customer, the data are sorted in descending order of this likelihood. On the left of the horizontal x-axis are the people with the highest likelihood, and on the right are the people with the lowest likelihood. The vertical y-axis shows an index. An index of 190 on the vertical axis for the first 10% of the population (on the horizontal x-axis) means that these people are 1.9 times more likely than average; an index of 300 would mean 3 times more likely than average, etc.
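
The values plotted on a lift chart can be computed as in the sketch below, which sorts records by descending score, cuts them into equal buckets, and indexes each bucket's customer rate against the overall rate. The function name, the use of equal-sized buckets, and the toy data are assumptions for illustration; the platform's exact computation is not described here.

```python
def lift_index_by_bucket(scores, is_customer, buckets=10):
    """Return one index per bucket, where 100 means the average customer rate.

    Records are sorted by descending score. If the record count is not
    divisible by `buckets`, the lowest-scored leftovers are not bucketed.
    """
    ranked = sorted(zip(scores, is_customer), key=lambda pair: pair[0], reverse=True)
    total_customers = sum(flag for _, flag in ranked)
    size = len(ranked) // buckets
    if size == 0 or total_customers == 0:
        raise ValueError("need at least one record per bucket and one customer")
    lifts = []
    for b in range(buckets):
        chunk = ranked[b * size:(b + 1) * size]
        customers_in_chunk = sum(flag for _, flag in chunk)
        # (bucket customer rate / overall customer rate) * 100
        lifts.append(customers_in_chunk * len(ranked) * 100 / (len(chunk) * total_customers))
    return lifts

scores = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
flags = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # 1 = customer, 0 = non-customer
print(lift_index_by_bucket(scores, flags, buckets=5))  # [250.0, 125.0, 125.0, 0.0, 0.0]
```

In this toy example the top 20% of records has an index of 250, meaning those records are 2.5 times as likely as average to be customers.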

What are the key predictors?

Key predictors or predictive variables correspond to the variables which set apart your customers from the rest of the population. These are the variables that were retained by our self-learning predictive analytics engine. The machine learning algorithm tests over 600 variables and 100 predictive models. By combining these models, we are then in a position to identify the variables that are differentiating between customers and potential customers in the most efficient way. 


How much does it cost?

datadecisions Group offers multiple pricing options to ensure that we provide you with the most appropriate offer for your needs. The table below describes the different options:


Please note that for each option we provide various volume discounts and would be happy to answer any questions. Please contact us at sales@datadecisionsgroup.com.

Predictive Platform Integrity

What is householding?

Householding identifies, at the individual level, who is living in the same household. Using multiple data sources allows you to confirm that there’s only one family at an address and that the identified family consists of the current residents. Householding is crucial in most marketing campaigns: in many omni-channel campaigns, you want to make sure that you do not contact the same household several times. This is especially true for call centers and direct mail activities.
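
The basic idea of grouping individuals into households can be sketched as below. This is a deliberately simplified illustration: it groups only by a normalized address and ZIP, whereas the householding described above relies on multiple data sources to confirm who currently lives at an address. The field names and helper functions are hypothetical.

```python
from collections import defaultdict

def household_key(record):
    """Build a simple household key from address and ZIP (illustrative rule only)."""
    address = " ".join(record["address"].lower().split())
    return (address, record["zip"].strip())

def group_households(records):
    """Group individual records that share the same household key."""
    households = defaultdict(list)
    for record in records:
        households[household_key(record)].append(record)
    return dict(households)

records = [
    {"name": "Ann Lee", "address": "12 Oak St", "zip": "27601"},
    {"name": "Bob Lee", "address": "12  oak st", "zip": "27601"},
    {"name": "Cara Diaz", "address": "98 Elm Ave", "zip": "27603"},
]
households = group_households(records)
print(len(households))  # 2

# One contact per household: keep the first record seen at each address
contacts = [members[0]["name"] for members in households.values()]
print(contacts)  # ['Ann Lee', 'Cara Diaz']
```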

What is deduping?

Deduping is a data hygiene technique for eliminating repeated copies of data records.
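
A minimal deduping pass might look like the sketch below. It assumes records are dictionaries with name, address, and zip fields, and it treats two records as duplicates when those fields match after lowercasing and collapsing whitespace; both the field names and the matching rule are assumptions for illustration, not the platform's actual logic.

```python
def dedupe(records, key_fields=("name", "address", "zip")):
    """Keep the first occurrence of each record, comparing normalized key fields."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(" ".join(str(record[f]).lower().split()) for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

records = [
    {"name": "Ann Lee", "address": "12 Oak St", "zip": "27601"},
    {"name": "ann lee", "address": "12  Oak St", "zip": "27601"},  # same person, different formatting
    {"name": "Cara Diaz", "address": "98 Elm Ave", "zip": "27603"},
]
print(len(dedupe(records)))  # 2
```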

How do I validate data against your internal sources?

If a consumer file (without duplicates) is loaded, we typically match 40% to 75% of the input file. A low percentage means that the file has low integrity, meaning:

  • A lot of duplicates

  • Bad or incorrect addresses

  • Several family members in the same file (householding would reduce volume)

  • Use of PO Box or business address instead of physical home address

Is a higher match rate better?

Providing a high match rate is relatively easy: all one needs to do is match at the zip level to get a match rate of 100%. A good match rate is instead 40% to 75%, because within this range you usually have a relatively tight match key (so that the record is actually validated) together with a high-quality database.
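
The effect of the match key on the match rate can be shown with a small sketch. The records, field names, and the match_rate helper are all made up for illustration; the point is only that a loose key (zip alone) matches everything, while a tight key (name, address, and zip) matches only records that are actually validated.

```python
def match_rate(input_records, reference_records, key_fields):
    """Percentage of input records whose key is found in the reference data."""
    def key(record):
        return tuple(str(record[f]).strip().lower() for f in key_fields)
    reference_keys = {key(r) for r in reference_records}
    matched = sum(1 for r in input_records if key(r) in reference_keys)
    return matched / len(input_records) * 100

reference = [
    {"name": "Ann Lee", "address": "12 Oak St", "zip": "27601"},
    {"name": "Cara Diaz", "address": "98 Elm Ave", "zip": "27603"},
]
uploaded = [
    {"name": "Ann Lee", "address": "12 Oak St", "zip": "27601"},
    {"name": "Dan Roe", "address": "5 Pine Rd", "zip": "27601"},
    {"name": "Eve Poe", "address": "7 Fir Ln", "zip": "27603"},
    {"name": "Cara Diaz", "address": "98 Elm Ave", "zip": "27603"},
]
print(match_rate(uploaded, reference, ["zip"]))                     # 100.0
print(match_rate(uploaded, reference, ["name", "address", "zip"]))  # 50.0
```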

What are ghost records? Why do they matter?

Ghost records are records with inaccurate, outdated, or false data. Most often, marketers encounter ghost data in the form of outdated addresses, which can lead to poorly personalized and/or targeted messaging.

What is NCOA?

The National Change of Address (NCOA) is a solution offered through the United States Postal Service (USPS) which makes change-of-address information available to companies. Mailers use NCOA to keep their mailing lists up to date.

Is Integrity better than NCOA?

Simply put, poor targeting kills your ROI. Whereas NCOA depends on consumers actively filing their updated address with the United States Postal Service, Integrity goes above and beyond by identifying and suppressing ghost records and fill-in data from mailing lists. Best practice is to use both NCOA and Integrity. Direct marketing is about sending the right piece to the right person; if a person doesn’t live at the address that you have on file, then the money you spent marketing to them is wasted.

How does Integrity work?

Upload your database into our cloud platform (with name and address fields). That’s it. Our smart matching algorithm evaluates the input data against our pre-cleansed, proprietary database made up of billions of multi-sourced, third-party data points. Once your check is complete, you can export cleaned, household data back into your CRM or marketing engine of choice. 

How regularly is your data refreshed?

Our data is regularly updated across a span of internal and third-party sources and regularly checked for integrity.

What are the key benefits of the Reach Integrity Check?

  • Save thousands of dollars in marketing costs per campaign.

  • Lift overall response rates.

  • Reduce cost-per-sale.

  • Increase deliverability.


Have more questions about Reach? Let us know and we'll help you answer them!