Everything you interact with -- from the emails in your inbox to the paper on your desk -- is packed with nuggets of data. These rich and often untapped data sources have the potential to help your nonprofit make well-informed, strategic decisions. To harness them, you need to put systems in place that make entering, tracking, and analyzing data painless and effective.
Julian Egelstaff is a database developer and IT consultant. In our "Data Is Everything and Everything Is Data" webinar, Julian shared insights to help nonprofits make better use of their data. This blog post summarizes the key takeaways from Julian's webinar. You can also watch the recording at the bottom of this page or on our YouTube channel, and view Julian's presentation below or on Prezi.
One of the keys to leveraging data effectively is to practice proper data hygiene. If your data isn’t “clean” going in, its value is significantly diminished.
To explain why, Julian pointed to a sample hospital patient record. This record contains several fields: name, patient record, date of procedure, and notes. The notes field allows hospital staff to enter freeform text; in this record, the notes read “right pneumothorax, x-ray shows 75% collapse, chest tube inserted between 5th and 6th ribs.”
The problem with this notes field is that, in Julian’s expert opinion, it “smushes” several distinct and critical pieces of information into a messy blob of text. The database will store this blob without parsing out the vital data, making it ultimately unsearchable and unusable for analysis. And typos and alternative spellings will only complicate things further.
Julian expanded on the issues with unclean data with another example, this one a course listing record from a national STEM educational organization. The organization was offering an after school physics program in several provinces and in both English and French. This data was entered into the database as follows:
Unfortunately, it's impossible to tell which provinces offer the program in French. Again, squishing distinct pieces of information together hinders meaningful analysis. So what’s the solution? Data must be atomic. Rather than smushing information together in one long row of indistinguishable text, each piece of data should be collected as an “atom” -- independent and distinct from one another.
Whenever there are limited possible values, users should choose from a set of options rather than enter freeform text. (These options can be presented through drop-down menus, radio buttons, etc.) Limited option sets produce uniform data that is easy to categorize, store, search, and analyze.
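As a quick sketch of this idea, a data-entry layer can reject anything outside a fixed option set before it ever reaches the database. The province list and function name below are hypothetical illustrations, not part of Julian's example:

```python
# A minimal sketch of enforcing a limited option set instead of
# accepting freeform text. Values outside the set are rejected.
ALLOWED_PROVINCES = {"Ontario", "Quebec", "Alberta", "British Columbia"}

def validate_province(value: str) -> str:
    """Normalize a province entry and check it against the allowed set."""
    cleaned = value.strip().title()  # "  ontario " -> "Ontario"
    if cleaned not in ALLOWED_PROVINCES:
        raise ValueError(f"Unknown province: {value!r}")
    return cleaned
```

In a real form, the same set would populate a drop-down menu or radio buttons, so the user never types a province name at all.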
Similarly, users should not be permitted to enter numbers as anything but numerical characters (e.g., “5”, not “five”); otherwise, you won’t be able to use the data for calculations. And totals, ages, and other calculated values should be derived automatically from raw data, not entered by users manually. For example, age changes every year, so if you ask users to input their age, you’re going to have to do manual database updates every time anyone celebrates a birthday. Rather than asking for age, ask for birthday and then do the calculation automatically in the backend.
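The birthday-versus-age point can be sketched in a few lines. This is a generic illustration of the principle, not code from the webinar:

```python
from datetime import date

def age_on(birthday: date, today: date) -> int:
    """Compute age in whole years from a stored birthday.

    Because the birthday is raw data, the age is always current --
    no manual updates are needed when someone has a birthday.
    """
    years = today.year - birthday.year
    # Subtract one if this year's birthday hasn't happened yet.
    if (today.month, today.day) < (birthday.month, birthday.day):
        years -= 1
    return years

# Someone born June 15, 1990 is 33 the day before their 2024 birthday,
# and 34 on the day itself.
print(age_on(date(1990, 6, 15), date(2024, 6, 14)))  # 33
print(age_on(date(1990, 6, 15), date(2024, 6, 15)))  # 34
```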
What’s the difference between a spreadsheet and a database? This may sound like the setup to a cheesy data nerd joke (and it probably is), but understanding the answer is the first step toward making your data work for you.
Many organizations process, analyze, and present their data in spreadsheets (most often in Excel). However, spreadsheets are better suited for reporting than analysis -- in fact, using a spreadsheet to process your data severely limits what you can do with it.
Databases, on the other hand, open up a world of possibility for working with your data. With databases, you can structure your data relationally, enabling you to discover connections and draw out deep insights across distinct and independent data sets, or tables. That’s why Julian recommends doing your analysis and calculations in a database, and then using a spreadsheet to extract specific data and present it through compelling, clear visualizations.
To illustrate the power of relational data, Julian posed a simple question: which actors have starred alongside Kevin Bacon throughout his career? Julian began by entering a sliver of Kevin Bacon’s filmography in a spreadsheet:
Unfortunately, there’s not much you can do with that. So Julian broke the filmography out into two separate database tables: one called Actors and another called Movies. Within this data structure, Kevin Bacon’s Actor ID is always 1, and each of his movies has a distinct numerical Movie ID. Then Julian added other actors, each with their own distinct Actor IDs.
In a separate table called Actor-Movie Link, Julian entered each actor ID along with that actor’s corresponding Movie IDs.
Actor-Movie Link is a relational table that connects the original two based on their common categories. Once this table has been populated, it becomes easy to extract a report that answers our initial question -- who has starred alongside Kevin Bacon?
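To make this concrete, here is a minimal sketch of the three-table structure in SQLite, populated with a sliver of Kevin Bacon's real filmography. The table and column names are assumptions for illustration, not Julian's exact schema:

```python
import sqlite3

# Build the three tables in an in-memory database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE actors (actor_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE actor_movie (actor_id INTEGER, movie_id INTEGER);
""")
cur.executemany("INSERT INTO actors VALUES (?, ?)",
                [(1, "Kevin Bacon"), (2, "Tom Cruise"), (3, "Elisabeth Shue")])
cur.executemany("INSERT INTO movies VALUES (?, ?)",
                [(1, "A Few Good Men"), (2, "Hollow Man")])
# The link table pairs each actor with each movie they appeared in.
cur.executemany("INSERT INTO actor_movie VALUES (?, ?)",
                [(1, 1), (2, 1), (1, 2), (3, 2)])

# Who has starred alongside Kevin Bacon? Join the link table to itself
# on movie_id, anchored on Bacon's Actor ID (1), and exclude Bacon.
cur.execute("""
SELECT DISTINCT a.name
FROM actor_movie AS bacon
JOIN actor_movie AS co ON co.movie_id = bacon.movie_id
JOIN actors AS a ON a.actor_id = co.actor_id
WHERE bacon.actor_id = 1 AND co.actor_id != 1
""")
costars = sorted(name for (name,) in cur.fetchall())
print(costars)  # ['Elisabeth Shue', 'Tom Cruise']
```

The spreadsheet version of the filmography can't answer this question without manual scanning; once the data is atomic and linked, one query does it.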
This is a basic example, but it speaks to the power of relational data structures. Atomic data and relational categories enable you to draw out insights, relationships, patterns, and trends, all of which can help you make better decisions.
To collect hygienic, relationally structured data, you need to design accessible database systems and user-friendly processes. Because if you force your users, staff, and stakeholders to use annoying software, they will inevitably reward you with garbage data. Not only is this a waste of everyone’s energy, but bad data also offers you no help in making good decisions.
There must be a natural, tension-free fit between the design of your data systems and your real-world situation. Understand how data flows through your organization. Who interacts with what aspect of the data, and at what point? Maybe an email requesting service arrives in a staff member’s inbox, and that staff member forwards it to the co-worker responsible for approving that particular service. This approval gets entered into a spreadsheet, later to be pulled by the staff member who writes monthly reports.
The design of your database system should reflect this workflow. Make it easy for each staff member to enter the data he or she is responsible for at the necessary time. Don’t force a square peg into a round hole -- the quality of user experience is a major factor in the success of your data efforts.
Ultimately, good design requires you to think about your end goal first. Of course, goals change over time. This means that you need to devote energy and resources to updating your database design to reflect these changes.
So how do you make this all work within your organization?
Remember, data is everything and everywhere. It's in emails, registrations, donations, your CRM, your website, and email marketing systems, not to mention the custom workflows, spreadsheets, and surveys that your organization maintains. So one of the first steps is to figure out which sources of data you are currently tapping and which ones you are not.
When you’ve done this, there are innumerable technologies that promise to help you collect, organize, and make sense of your data -- mostly revolving around your website, web services, and in-house software, and the underlying database software involved in each. While these tools can be very powerful, many restrict access to your data, offering you only the ability to create reports. Ideally, you want unrestricted access, because this enables you to do analyses that are richer and more insightful. To see if this is possible, talk to the people who help you set up these systems in the first place.
Once you’ve got the data in your hands, what do you do with it? If your data is clean, Excel actually is a pretty good place to start. (You can even do some relational data analyzing with Excel’s pivot table functionality.) And you can go deeper into the data by learning SQL.
You can also set up a professional analytics package with services such as Pentaho and IBM Cognos Analytics. These packages will set up data architecture for you along with dashboards, reports, and systems for sharing access. However, these are expensive options that will be beyond the budgets of most small and medium-sized nonprofits.
And finally, you can create your own databases. There are many options out there, but if you’re looking for something free and open-source, consider Formulize, a platform that Julian helped develop himself.
Ultimately, developing the systems and culture necessary for data-driven decision making is a journey that requires intention, time, and energy. Julian’s advice is to dive right in! The more you experiment, the more you’ll learn.