By Ben Morse, Social Impact Technical Director

The growing availability of data has created new opportunities for development practitioners to measure, track, and enhance the impact of their work. Social Impact’s newly formed Data Science team applies machine learning algorithms to provide new and powerful insights to inform evidence-based decision-making.

In this post, we highlight a recent Data Science project that used machine learning to improve a risk screen, so that violence-prevention counseling programs can be targeted more efficiently towards the youth most at risk of becoming involved in crime.

The challenge of targeting development programs towards those most in need

Development programs can only be effective if accurately targeted towards those in need. Yet targeting programs effectively requires rich data on where needs are greatest and who within these areas is most in need. In many situations, such data are not available.

In response to this challenge, a growing number of scholars and research-oriented practitioners in the field of development have begun to harness the power of machine learning to inform program targeting. Over the past several years, machine learning methods have been used to develop “poverty scorecards” to accurately predict whether a household is poor with just ten questions; to measure female empowerment with just five survey questions; and to target cash transfers to those most in need using mobile phone metadata.

Family-based counseling to prevent youth violence

Family-based counseling programs enlist family counselors to work with at-risk youth and their immediate family to promote family cohesion. The hope is that by improving intra-family communication, strengthening family cohesion, and connecting families to community activities and after-school programs, these programs can disrupt risky behaviors and replace them with positive and safe activities.

Targeting these programs towards the youth most at risk of criminal behavior, however, is a daunting challenge. Recognizing the potential of machine learning methods to improve strategic planning and program targeting, SI’s Data Science team partnered with USAID’s Eastern and Southern Caribbean Mission to develop a risk assessment tool to help the Mission’s partners target these programs more accurately towards at-risk youth.

We began with three core criteria that the risk screen must meet: it must be short and easy to administer, it must be easy for program administrators to grade and aggregate into a risk score, and it must accurately predict who is most at risk of becoming involved in criminal behavior in the future.

To meet this challenge, we drew on longitudinal survey data from a sample of 2,393 potentially at-risk youth collected as part of a prior SI impact evaluation in three Caribbean countries: St. Lucia, St. Kitts and Nevis, and Guyana. These data contained detailed information on a wide range of potential risk factors, including anti-social tendencies, weak parental supervision, impulsive behavior, negative peer influence, and past criminal behavior, among others. After merging these data into a single dataset, we used Lasso and Ridge regression, two regularized machine learning prediction algorithms, to identify which risk factors measured at baseline were the strongest predictors of criminal behavior at endline. We used cross-validation resampling methods to tune the models and validate their out-of-sample accuracy. This process also showed which risk factors were poor predictors and could be omitted from the risk assessment tool without sacrificing accuracy.

[Figure: Lasso ranking chart showing how indicators like age and gender predict future criminal behavior.]

The results displayed in the figure indicate that past criminal behavior, age, whether an individual is out of school, anti-social behavior, sex, and impulsive risk-taking are all strongly predictive of future criminal behavior. The remaining variables measured in the survey were found to be poor predictors and were omitted from our risk assessment tool.
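For readers curious about how this kind of selection works, here is a minimal sketch of Lasso with cross-validation. It is not the code used in the study: the data are synthetic, the outcome is continuous rather than a real measure of criminal behavior, and the variable names, penalty grid, and fold count are all assumptions made for illustration. In practice an analysis like this would more likely rely on an established statistics library than on a hand-written solver.

```python
import random

def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values inside the band become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def fit_lasso(rows, targets, penalty, sweeps=50):
    """Coordinate descent for (1/2n)*squared error + penalty*sum(|coef|)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    y_mean = sum(targets) / n
    x = [[r[j] - means[j] for j in range(p)] for r in rows]  # centered features
    residuals = [t - y_mean for t in targets]  # all coefficients start at zero
    col_sq = [sum(x[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    coefs = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(x[i][j] * residuals[i] for i in range(n)) / n + col_sq[j] * coefs[j]
            new_coef = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_coef - coefs[j]
            if delta != 0.0:
                for i in range(n):
                    residuals[i] -= delta * x[i][j]
                coefs[j] = new_coef
    intercept = y_mean - sum(c * m for c, m in zip(coefs, means))
    return intercept, coefs

def cv_error(rows, targets, penalty, folds=5):
    """Mean squared prediction error on held-out folds."""
    n = len(rows)
    total = 0.0
    for k in range(folds):
        test_idx = set(range(k, n, folds))
        train_x = [rows[i] for i in range(n) if i not in test_idx]
        train_y = [targets[i] for i in range(n) if i not in test_idx]
        intercept, coefs = fit_lasso(train_x, train_y, penalty)
        for i in test_idx:
            pred = intercept + sum(c * v for c, v in zip(coefs, rows[i]))
            total += (targets[i] - pred) ** 2
    return total / n

# Synthetic data (an assumption): only the first two factors drive the outcome.
random.seed(0)
NAMES = ["past_offense", "out_of_school", "noise_a", "noise_b", "noise_c", "noise_d"]
rows = [[random.gauss(0, 1) for _ in NAMES] for _ in range(200)]
targets = [2.0 * r[0] + 1.0 * r[1] + random.gauss(0, 0.5) for r in rows]

# Pick the penalty with the lowest out-of-sample error, then refit on all data.
penalties = [0.01, 0.05, 0.1, 0.3, 1.0]  # hypothetical grid
best_penalty = min(penalties, key=lambda lam: cv_error(rows, targets, lam))
intercept, coefs = fit_lasso(rows, targets, best_penalty)

# Rank factors by absolute coefficient size, as in a Lasso ranking chart.
ranking = sorted(zip(NAMES, coefs), key=lambda pair: -abs(pair[1]))
for name, coef in ranking:
    print(f"{name}: {coef:.3f}")
```

The penalty shrinks the coefficients of weak predictors towards zero, often to exactly zero, which is what makes Lasso useful for deciding which survey questions can be dropped.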

Informed by these results, we used the subset of the most predictive risk factors as the basis for our streamlined risk assessment tool. The resulting tool, which we named the Youth Risk Screen (Y-RISC), is easy to administer, easy to score, and, based on our analysis, highly predictive of future criminal behavior. The tool was pretested on a sample of 90 youth from six Caribbean countries in May 2021, and is now publicly available for researchers, implementers, and policymakers to use to help improve the targeting of their services.
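To give a sense of what "easy to score" can mean in practice, the sketch below adds up points for a handful of yes/no items and compares the total to a cutoff. The items, point values, and cutoff are hypothetical and chosen only for illustration; the published Y-RISC defines its own questions and scoring rules.

```python
# Hypothetical items and point values (assumptions, not the actual Y-RISC).
ITEM_POINTS = {
    "past_criminal_behavior": 3,
    "out_of_school": 2,
    "anti_social_behavior": 2,
    "impulsive_risk_taking": 1,
}
HIGH_RISK_CUTOFF = 4  # hypothetical threshold for flagging a respondent

def risk_score(answers):
    """Sum the points for every item answered yes; missing items count as no."""
    unknown = set(answers) - set(ITEM_POINTS)
    if unknown:
        raise ValueError(f"Unknown items: {sorted(unknown)}")
    return sum(points for item, points in ITEM_POINTS.items() if answers.get(item, False))

def is_high_risk(answers):
    """Flag respondents whose total score reaches the cutoff."""
    return risk_score(answers) >= HIGH_RISK_CUTOFF

respondent = {"past_criminal_behavior": True, "out_of_school": True}
print(risk_score(respondent), is_high_risk(respondent))
```

A simple additive score like this can be tallied by hand on a paper form, which is one way a screen can stay practical for program administrators.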

For full details on the methodologies employed in this study, and to access the Y-RISC, please see the full report.