Examining a client’s database, whether in preparation for a data conversion or to plan an in-situ data cleanup, is a little like an archeological dig. When we plan a conversion to or from the Blackbaud® software The Raiser’s Edge® , for example, we dig down through layers of historical strata to prepare for mapping. This work can be very revealing as we work together to interpret the meanings of various records, create maps of where important clues may be, and wade through decades of an organization’s informational history.
In order to do this with our clients, we have developed tools that help us unearth and find patterns in the layers of information. Let me share some of my trade secrets.
One of the most useful reports that JCA has developed to assist with system analysis and data conversion and cleanup is a Code Frequency Report. This report, in its simplest form, provides three pieces of information:
- A list of the tables in a client’s database with their record counts
- A list of each field in each table with record counts where that field is populated
- A list of distinct “coded” values, descriptions and totals for each coded field
This report is essential to our data conversion process. It has always been our desire to expand the Code Frequency Report to include other statistical information like minimum and maximum values for numeric and date fields and maximum length of data in text fields. Now Microsoft has provided a new tool in SQL Server 2008 that allows us to easily gather this information and a lot more. The tool is the “Data Profiler Task” in SQL Server Integration Services. It can be used to mine very useful database metadata from any SQL Server database. Some of the data extracted include:
- Candidate Key Profiles – evaluation of a table to determine the column holding the unique key value
- Column Length Distribution Profile – evaluation of a string column to determine the unique string lengths and the percentage distribution of those lengths
- Column Null Ratio Profile – evaluation of a column to determine the percentage of rows where it has no value (is null)
- Column Pattern Profile – evaluation of a string column to determine the patterns of its data; this is reported as a regular expression
- Column Statistics Profile – evaluates numeric columns for minimum, maximum, average and standard deviation values; evaluates datetime columns for minimum and maximum values
- Column Value Distribution Profile – evaluates columns to determine distinct values and their percentage of distribution
The Data Profiler Task is easy to set up to analyze a full database and can be customized to perform particular profiles on select tables. This means that when we need to investigate a custom, homegrown database that we have never seen before, we can get all of these statistics in a few easy steps. And when evaluating a Raiser’s Edge or Tessitura system, we can limit the output to the specific tables and fields of interest.
The output of the SSIS task is an XML file. The file can be viewed using the appropriately named Data Profile Viewer (found in the \Program Files\Microsoft SQL Server\100\DTS\Binn directory). The most awesome function of the Data Profile Viewer is the ability to see the records behind the statistics. When reviewing metadata information you can easy view all of the records in the table corresponding to the metadata. For example, in this screenshot the grid in the bottom right corner is showing the gift records with a “Monthly” Installment Frequency (GFInsFreq):
John Jackovich is the CIO of Jacobson Consulting Applications, Inc. (JCA)
JCA provides strategic consulting to the world’s leading nonprofits.

