It’s hard not to think about data security and privacy right now. NSA’s electronic snooping. Target’s financial data breach. Heartbleed’s online security hole. Combine these developments with the current momentum in schools to move education data from isolated tanks to connected pipes with active flows, and you have a combustible mix.
I contend that if there's going to be a fire, let's at least ignite something specific--and not indiscriminately burn everything to the ground.
Early as they are, recent missteps in judging public tolerance for, and understanding of, mixing and matching education data have been spectacular:
- The one-stop data warehouse provider inBloom, despite its non-profit status and open source code, suffered a series of setbacks thanks to a combination of unfortunate timing (that NSA thing), guilt by association (controversial statements about teachers by Bill Gates, whose foundation provided funding, and coding work done by an edtech company owned by Rupert Murdoch), and self-inflicted wounds (initial sluggishness in directly responding to criticism or clearly explaining what inBloom-the-service was).
In all cases, there's no shortage of finger pointing. This is something of a time-honored tradition when something goes very wrong in K-12 education. Districts hide behind vendors (especially, historically, when student tests go awry). Vendors hide behind contract language. Both hide behind the minimum requirements of federal and state law.
But everyone can, and must, do better when it comes to activating formerly isolated, and now connected, student and school data. Few doubt we need a better way to store and manage education data as the digital transition accelerates, or that we need cloud-based productivity and communication tools available wherever students and teachers happen to be.
Light at the end of the … you know
A rapid-fire series of surveys and reports over the past several months (from Common Sense Media on perceptions of student data privacy, Fordham University on flaws in district cloud computing contracts, and Harvard’s Berkman Center on overall issues), combined with understandable parent confusion and concern, has spurred proposed “best practices” for student data storage and use.
While neither is perfect, the guidelines from the US Department of Education (for educators) and the Software and Information Industry Association (for industry) are a start: they acknowledge that certain matters of security and privacy must be addressed explicitly with parents and students. And, critically, they treat what the law requires as a minimum baseline for what can or cannot be done with student data.
Transparent is not enough
So the first steps are being taken in getting to transparency. I’d suggest we need to take a giant leap in the discussion--to the tangible.
People naturally fear what they don't understand, and it's hard to understand a product-concept-in-waiting--especially if there's no current equivalent. And much of the discussion about the potential benefits of storing, connecting and mining student data has been entirely conceptual. (inBloom might well be the poster child for this; to make comprehension matters worse, the service is primarily data plumbing behind the scenes, not a product that teachers, students or administrators directly touch.)
While the high-level policy discussions, best-practice standards and general debate are incredibly useful, actual practice needs to be reduced to the concrete.
Take ATMs, or automated teller machines, which we now take for granted. Forty years ago, when banks were considering them, they convened a series of customer focus groups. "Trust my cash to a MACHINE?" aghast participants protested. "I know my bank teller. I trust her. How can I be sure my money will be safe?"
What these customers did not, and could not, internalize was the subsequent 24/7 convenience of doing banking transactions on their own schedule, and in far more locations than could support a full bank branch.
We’re at that point with making intelligent use of digitally generated school data commonplace. The fear of the worst-case scenario has overwhelmed any mental space that might otherwise go to comprehending implementations and working out the inevitable issues and kinks.
Small is beautiful
We need to pull back and think small, not big. The successes to date in mining and analyzing education data generally have been on a smaller scale, and often outside of K-12.
Small means specific:
- Like Purdue University’s Course Signals, a product dating back to 2009 that mines learning management systems for 20 data points to display green, yellow and red “signals” if a student is at risk of failing a course.
- Like the Root-1 (now owned by Edmodo) mobile app Word Joust, a set of vocabulary flashcards launched in 2011 that masks an intelligent back end: it not only adapts word presentation based on how the individual student engages and performs, but also consults how other students with similar response patterns have performed.
- Even, potentially, like Knewton, which promises that it can make recommendations for the next step students can take across digital textbooks that use its platform, perhaps advising a student struggling with math word problems that he or she may want to tackle some reading instruction first.
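To see how small and specific such a product can be, here is a minimal sketch of a Course-Signals-style traffic-light indicator. The inputs and thresholds are hypothetical illustrations, not Purdue's actual model, which draws on roughly 20 data points from the learning management system:

```python
def course_signal(avg_score, logins_per_week, assignments_missed):
    """Classify a student's course risk as a traffic-light signal.

    The three inputs and all thresholds below are hypothetical;
    they stand in for the richer set of LMS data points a real
    product like Course Signals would mine.
    """
    # Count simple warning signs across the illustrative inputs.
    warnings = 0
    if avg_score < 70:
        warnings += 1
    if logins_per_week < 2:
        warnings += 1
    if assignments_missed > 3:
        warnings += 1

    if warnings == 0:
        return "green"   # on track
    if warnings == 1:
        return "yellow"  # some risk; worth a check-in
    return "red"         # high risk of failing the course

print(course_signal(85, 5, 0))  # green
print(course_signal(65, 1, 5))  # red
```

The point of a sketch this small is that anyone--a parent, a teacher, a reporter--can see exactly what goes in and what comes out, which is precisely what makes a discrete product possible to poke, prod and dissect.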
These may not all work fully as advertised (yet, or ever), but that's not the point. By precisely packaging and identifying what data is gathered, how it will be analyzed (or "mined"), and what result is anticipated, you remove the vague what-ifs. Everyone is then judging discrete products that can be understood, poked, prodded and dissected.
A not-so-modest proposal
If administrators and vendors want to spur a productive discussion about data mining's benefits and drawbacks, they should create a target someone can hit, not a scary Blob that threatens to forever absorb every student who falls into its path.
My humble three-small-steps suggestion:
- Define the "product." Exactly what data will be mined, for what purpose, and what is the visible manifestation of the result? Put a box around it. Start small. The engine underneath it may be huge and have great capabilities, but the initial purpose should be clear and understandable, like Course Signals or Word Joust.
- Lay down the limits. State the rules for this particular use of data collection, storage, sharing, mining, analysis and purging. Go beyond what the law requires; that's only a minimum. Post all policies publicly for transparency.
- Stop hiding behind each other. This isn't the story about the two hikers encountering the bear, in which the first says "We'll never outrun it!" and the second keeps running while shouting to the first that he doesn't need to outrun the bear--he just needs to outrun the other guy. Trust me. With student data, the bear of outraged public opinion will catch up to both educators and vendors, regardless. Plus anyone else standing in the way.
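The first two steps above can themselves be made concrete: a publicly postable "data-use manifest" that puts a box around one product. The field names and values here are hypothetical illustrations, not any vendor's actual policy format:

```python
# A hypothetical, publicly postable declaration of exactly what one
# product collects, why, and when the data is purged. All field names
# and values are illustrative only.
MANIFEST = {
    "product": "Course risk signal",
    "data_collected": ["grades", "logins", "assignment_status"],
    "purpose": "Show instructors a green/yellow/red risk indicator",
    "shared_with": [],        # no third parties
    "retention_days": 180,    # purged after the semester ends
}

# "Lay down the limits" means every limit is stated, not implied:
# a manifest missing any of these fields is not ready to publish.
REQUIRED_FIELDS = {
    "product", "data_collected", "purpose",
    "shared_with", "retention_days",
}

def manifest_is_complete(manifest):
    """A manifest counts as complete only if it states every limit."""
    return REQUIRED_FIELDS <= set(manifest)

print(manifest_is_complete(MANIFEST))  # True
```

A declaration like this goes beyond the legal minimum simply by existing in public, where parents and reporters can read it--which is the whole point of steps one and two.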
Transparent. Tangible. Aiming for trust. It's not a perfect plan. But it sure as hell has got to be better than what's happening now.