Decades of experience with computer-assisted data analysis—analytics—stand behind what follows: beginning in the 1970s with COBOL and punch cards, through the 1980s and 90s with 4GLs, enduring the age of BI, and continuing with the current emergence of data science and its cousins as the popular go-to analytics solutions. Along the way, some principles and practices have proven useful and valuable in helping people benefit from the information contained in the data that's relevant to them.
"Analystics" — a portmanteau of 'analytics' and 'heuristics' — are the product of this experience in data analysis for business clients. They are rules of thumb, principles, and practices that have proven to be good and valuable, and those that should be at best approached with caution. In this form you can think of them as fortune cookies – expandable for a synopsis and link to a full description when available. They are offered here for your benefit.
The big ideas:
People should be able to understand the data that matters to them.
Data analysis is first and foremost a cognitive, intellectual activity.
People, Data, and Analytics
Analysis is the active contemplation of information in order to understand it in its particulars, and to synthesize higher-level truths from it.
Data is Information encoded in persistent media. As such it requires cognition in order for the observer/analyst to be able to decipher the information from the encoding.
People know what they need to know from their data, and are capable of conducting much of their own data analysis, given the opportunity—tools, and training appropriate for their interest, skills, and abilities.
Take advantage of this by helping them find the information they desire and learn how to analyze the data for themselves, to the limit of their abilities.
When supporting clients, it's easy to make them happy by helping them analyze and understand their data quickly, often on the spot.
Once this is done, it's easier to keep them happy by continuing to help them achieve good results than it is to make them wait weeks (or even months) for a delivery and hope they'll be happy with it.
There's always some data readily at hand that matters to someone.
Help them find some interesting, useful, and hopefully valuable information in it. This can be done immediately, so do it.
Continue assisting people in finding interesting and valuable information and insights in their data, and in sharing these with others.
Help them improve and extend their own data-analytical and analytics-sharing abilities.
Repeat at short, regular intervals.
The entire point of analytics is to make the information content of data available to people as quickly, effectively, and efficiently as possible. Doing so allows people to reason about the information.
Analytics is multi-scaled in that there is utility and value in analyzing data in all of its manifestations across the enterprise.
At the base is data's atomic form, the original form it's captured in, usually in a source system, received from an external source, etc.
At the top is the fully consolidated, conformed, homogenized, centrally stored Enterprise Data Store--previously Data Warehouses, increasingly Data Lakes, The Cloud, and other post-BI repositories.
In between is an entire universe of data in the various stores, repositories, databases, and other forms used to stage the data as it's processed from its atoms into the Enterprise forms.
Conventional BI, and now Conventional Analytics, is largely, and in the naive approach exclusively, oriented towards the idea that the Enterprise Data Store's data is the proper object of analysis.
People seeking to understand the information content of their data should not need to care about the technology in use.
The less visible the technology is, the less the person has to accommodate themselves to the technology.
Ideal data-analytical tools and technologies would be invisible; they would seem to be part of the analyst's innate mentality.
Superior data analysis tools and technologies match the cognitive, intellectual, and analytical models of the people using them, supporting and augmenting human abilities with a minimum of fuss and bother.
The best tools and technologies impose themselves as little as possible, creating minimal cognitive and intellectual friction.
However, far, far too often analytics is framed using a technology-centric paradigm holding that the application of technology--often, and preferably, the larger, more complex, and more expensive the better--is the central, essential activity. This is worse than unfortunate; it's tragic, and it has been a central theme in the colossal failure of technology-centric, heavyweight, high-ceremony conventional BI initiatives.
Freeing analytics from the techno-centric framing, returning it to one putting people and their information needs front and centre, is an effective way to deliver on analytics' core promise.
But data analysis requires tools, so...
Simple, easy tools are better than complex, functionally deep tools for the vast majority of data analysis needs.
Curiosity is the beating heart of analysis. Wondering what the data can tell you is the best way to find out.
Fear is the paralytic killer of analytical efforts and initiatives, even the well-intended ones.
Like nightmare shadows, fear can be faced, challenged, and dissipated by shining the light analytics brings on it.
Virtually everything in electronic form is data, and can be analyzed.
The traditional concept of data subject to analysis, i.e. content in tables, is far narrower than it should be. As long as there is some organizing principle or scheme—which includes most common web technologies and protocols—the content can be analyzed.
Data can come from a multitude of sources. It's quite common for important data to be spreadsheets, CSV files, and other forms not considered 'real' databases. Very often, this data is useful, meaningful, and valuable to someone and therefore should be subject to analysis for their benefit.
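As a minimal sketch of what analyzing data where it lives can look like, assuming pandas is available: the file names and column names below ("sales.csv", "shipments.xlsx", "order_date", "amount", "carrier") are hypothetical placeholders, not references to any particular system.

```python
# A minimal sketch, assuming pandas is installed; file and column names are hypothetical.
import pandas as pd

# A plain CSV export: no warehouse, no dimensional model required.
sales = pd.read_csv("sales.csv", parse_dates=["order_date"])

# A spreadsheet someone maintains by hand works just as well.
shipments = pd.read_excel("shipments.xlsx", sheet_name="last_week")

# First-contact analysis: what does the data actually contain?
print(sales.describe(include="all"))
print(sales.groupby(sales["order_date"].dt.to_period("M"))["amount"].sum())
print(shipments["carrier"].value_counts())
```

Nothing here waits on an enterprise repository; the analysis starts with the data as found.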
Conventional analytics and BI consider that analyzable data is sourced from formal analytical databases, after having been harvested from its sources, cleansed, and integrated into a "properly" designed analytical model.
This framing is unfortunate, unnecessarily and tragically limiting the universe of analyzable data to a very small fraction of everything that can be effectively analyzed.
Also: Dimensional modelling has been a curse and a pox upon human-oriented data analysis.
Flexible, inquisitive analysis often leads to unexpected insights, even with data that's well understood.
Allow serendipity to work its magic.
This is a surprise to many people, but '17' is in reality nothing more than the visual representation of a specific quantity in decimal notation.
We are so accustomed to this form that this aspect of its nature is invisible to many of us.
Here's another representation of the same quantity: '.................' (tally marks), and another: 10001 (binary).
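For the concretely minded, here is a small sketch, in plain Python and specific to no tool, rendering the same quantity in several notations:

```python
# The same quantity, rendered in several notations.
n = 17

print(n)               # decimal:     17
print("." * n)         # tally marks: .................
print(format(n, "b"))  # binary:      10001
print(format(n, "x"))  # hexadecimal: 11

# Round-tripping confirms these all encode the same quantity.
assert int("10001", 2) == n
assert ("." * n).count(".") == n
```

The quantity is the same in every case; only the encoding differs, which is exactly the point about data being information held in a persistent encoding.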
The best way to be successful in the long term is to practice being successful. Start with simple successes and build upon them.
Governance should be based upon trusting people to do right, reasonable things.
It does require verification, however.
In fact, Governance cannot be conducted unless and until verification is possible, so there's a huge advantage in leveraging verification to support trusting people to do the right things.
The Single Value of the Truth is a phantom, a mirage used to instill a fear of nonconformity.
This false fear is the basis of much of what's wrong with conventional BI, and has been used as the justification for spending ever-increasing amounts of time, money, energy, attention, and other resources on building enormous, elaborate, enterprise-spanning, industrial-strength, answer-any-and-all-questions universal data repositories and report factories. That effort should have been spent helping people understand the information in the data that matters to them.
The vast bulk of the mountain supports the peak; without this mass the peak does not exist.
Conventional BI is oriented towards servicing the peak's information needs, disregarding those beneath.
A common fallacy in traditional and contemporary approaches to analytics is the assumption of primacy, even to the point of exclusivity, of the executive suite members' information needs.
Everyone has data that could provide useful and valuable information when analyzed. Helping everyone understand the information in their data is our calling.
Fundamental analytics are the heart, mind, and soul of analytics. Without the fundamentals in place, analytics is at best a barren information desert punctuated by the occasional oasis. Fundamental analytics:
- make analysis of data at first contact possible,
- span the entire universe of data-centric processes, activities, and operations, and
- support the delivery of refined analyses to consumers
Visualizing data is necessary in order to understand it.
This seems like it's too simple a truth to be usefully said, but...
One of the most common failings of conventional BI, and of its legacy in analytics, is the framing that data visualization is something done to prepare data for publication in the form of charts, graphs, and other forms in order to communicate a particular set of conclusions to some audience.
This is misguided to the point of irresponsibility in its stunted, narrow view of visualization's much deeper, broader, and richer value in helping understand data.
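To make the contrast concrete, here is a sketch of visualization as an act of looking rather than publishing. It assumes pandas and matplotlib are available; the file "calls.csv" and its columns ("date", "handle_time", "wait_time") are hypothetical.

```python
# Exploratory visualization as a way of understanding data, not presenting it.
# Assumes pandas and matplotlib; the data source and columns are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

calls = pd.read_csv("calls.csv", parse_dates=["date"])

# Quick, throwaway views: shape, relationship, drift.
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
calls["handle_time"].hist(ax=axes[0])                            # distribution shape
calls.plot.scatter(x="wait_time", y="handle_time", ax=axes[1])   # any relationship?
calls.groupby("date")["handle_time"].median().plot(ax=axes[2])   # change over time?
plt.tight_layout()
plt.show()
```

These views exist to be looked at, questioned, and discarded; none of them is destined for a slide deck.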
An organization's work profits when people have access to and can analyze the data that matters to them. In practice, much or most of the time this means data that comes from the business systems the people use and/or that directly affect them.
Supporting these people with analytics that works is hugely beneficial. Making them wait for IT/BI to build them reports, dashboards, etc. is not.
Much of traditional BI/A is based in the fear of the unknown.
This results in the imposition of strict controls on everything that happens, to the active impediment of real, valuable things getting done.
Information, and the transparency it brings, provides knowledge, and therefore offers the opportunity to be unafraid. But it takes courage to unshackle oneself from the constraints of the past.
Analytics exists at all scales, from the local and intimate to the global and integrated.
Most people can benefit from the information relevant to their interests first and foremost.
Enterprise-scale information needs are primary for executives, far less significant to workers just doing their jobs.
Achieving competence with the fundamentals of analytics is the foundation upon which all effective analytics is based.
Without a strong foundation there's no hope that analytics will succeed in helping people understand the information contained in the data that matters to them.
Data is information. It's only information, and is valuable as it is, in its context.
People who say "the data is bad" really only mean "the data isn't the way I want it to be." Implicit in their claim is that the data isn't suitable for some particular purpose. If the information isn't suitable for a particular purpose, adjustments can be made.
A common refrain vis-a-vis data analysis, in conventional BI and increasingly in data science: "First, we clean up the data so it's fit for analysis."
This is an extremely limited, misguided, and harmful way of thinking about analytics.
In truth, the concept of wrangling data is devoid of meaning unless and until the data has been analyzed to the point at which some benefit to altering it has been established.
Wrangling data involves altering it, from the simplest to the most complex operations. In order to be effective and trusted, the effects of wrangling need to be analyzed in order to ensure that the wrangling processes have done what they need to, and done only what they need to.
Adding instrumentation to data wrangling processes makes it possible to analyze the processes' activities, ensuring transparency and providing the ability to validate the processes' accuracy, effectiveness, even efficiency.
<!--
Proponents of a wrangling-first model of analytics miss the point that any time data is manipulated it needs to be analyzed to confirm that the wrangling hasn't had adverse or unexpected effects.
-->
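As a sketch of the instrumentation idea above: the helper below is illustrative only (its name, the sample data, and the cleaning rules are made up), but it shows how a wrangling step can record enough about its own effects to be analyzed and verified afterwards.

```python
# A sketch of instrumenting wrangling steps so their effects can themselves be analyzed.
import pandas as pd

def instrumented(step_name, func, df):
    """Apply a wrangling step and record what it actually did."""
    rows_in, cols_in = df.shape
    result = func(df)
    rows_out, cols_out = result.shape
    log = {
        "step": step_name,
        "rows_in": rows_in,
        "rows_out": rows_out,
        "rows_dropped": rows_in - rows_out,
        "cols_in": cols_in,
        "cols_out": cols_out,
    }
    return result, log

# Hypothetical raw data with one missing amount and one duplicated id.
orders = pd.DataFrame({"id": [1, 2, 2, 3], "amount": [10.0, None, 20.0, 20.0]})

cleaned, log1 = instrumented("drop_null_amounts",
                             lambda d: d.dropna(subset=["amount"]), orders)
cleaned, log2 = instrumented("dedupe_ids",
                             lambda d: d.drop_duplicates(subset=["id"]), cleaned)

# The wrangling itself is now data, and can be analyzed like any other data.
print(pd.DataFrame([log1, log2]))
```

The specifics matter far less than the habit: every step leaves behind a record that can itself be inspected.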
Achieving results, however simple, now is infinitely better than postponing the delivery of complex, elaborate results, however polished and elegant.
Analytics that works begins with helping people analyze and understand the data that matters to them, followed closely by sharing their information and insights with others.
After this comes satisfying other needs, such as operationalizing the shared content and ensuring that access to data and information is provided to, and only to, those who need it and are authorized to have it. Further on are the considerations involving the types of interactions that are appropriate for the various people who have access.
One premise of conventional analytics, and particularly of the statistics-based predictive analyses increasingly in fashion, is that the data being analyzed is a sample of a larger population. Following from this, analyses are framed as seeking to provide reliable information about the larger population, with a wide variety of statistical methods available to provide assessments of the degree of confidence applicable to population-relevant conclusions.
This paradigm is all fine and good for the problem domain it's suited to.
However, for most people, most of the time, the data they need to understand is complete. It is a whole population, containing all the information relevant to the person's interests.
Examples include (see the sketch after this list):
- monthly sales numbers
- shipments handled last week
- weekly call volumes, and the performance of call centre staff
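A small sketch of the whole-population case, with made-up figures: when the data in hand is all of the data, the answers are computed rather than estimated.

```python
# The whole-population case; the figures are invented for illustration.
weekly_calls = {"Mon": 412, "Tue": 389, "Wed": 455, "Thu": 430, "Fri": 501}

total = sum(weekly_calls.values())
busiest_day = max(weekly_calls, key=weekly_calls.get)
average = total / len(weekly_calls)

# These ARE last week's numbers; nothing is being estimated, so no
# confidence interval or significance test is called for.
print(f"Total calls:     {total}")
print(f"Busiest day:     {busiest_day} ({weekly_calls[busiest_day]} calls)")
print(f"Average per day: {average:.1f}")
```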
Enterprise
There are many sources of 'expert' advisory information claiming authority in identifying how to develop and execute enterprise analytics strategies, policies, procedures, processes, and operations.
Trusted authorities include Gartner, Forrester, TDWI, and others.
There are serious risks in blindly following these companies' recommendations, mostly stemming from the organizations' regurgitation of large-scale industry trends, largely as reported by their executive customers.
Over the past two-plus decades these companies were leaders in shaping BI into the paradigm it became, and that paradigm was instrumental in causing BI's horrible failures, largely because decision makers followed the media's advice blindly.
Historically, there was a phrase that captured the tendency of executives and managers to follow the leader: "Nobody ever got fired for buying IBM."
Technocrats are attracted to Process because it provides a veneer of control.
One of the consequences of embracing Process is that it relieves the Technocrat of responsibility for outcomes in that one can always say: "We had Industry Best Practices in place, so we were following the experts' advice. Problems are not our doing or responsibility."
Technocrats combine bureaucracy and technology into process structures that have a strong, almost irresistible tendency to harden into rigid formal frameworks.
Over time, these process frameworks expand, covering more and more of the activities involved in the enterprise's analytics activities.
Things become bureaucratic, with the emphasis on following processes and procedures instead of helping people see, analyze, and understand the data that matters to them.
Data and analytics tools and technologies vendors are not your friends. They do not have your best interests at heart.
They are interested in maximizing their revenue, which means selling you the most expensive technology they can while minimizing their costs of supporting you.
Large-scale platforms command higher prices, usually in the form of licenses, than small-scale or personal tools do, and they require much less support.
Humans have a strong tendency to add complexity to address challenges.
There are times when this is counterproductive: the added complexity imposes costs that overwhelm the benefits.
‘Less is more’ is a hard insight to act on, it turns out.
It's easy to get enamoured with technology and complexity, and set out to create things that are more complicated than they need to, or should, be.
This is an easy trap to fall into, particularly when the full scope of opportunities and constraints aren't well understood.
A classic example: building dashboards with Tableau that cross the threshold into mini- or fully functional analytical applications. Tableau IS an analytical application; building dashboards to replicate its functionality is severe overkill.
Another classic example: refusing to analyze data extracted from a source system, insisting on only using data that's been 'properly' harvested, conformed, and stored appropriately in the enterprise data repository, be it a warehouse, data lake, the cloud or some such.
SQL is NOT an effective data analysis tool, at least not compared to the multitude of readily available better tools.
Spreadsheets are also not good for data analysis; they are good for storing data, but that's about it.
Analytics--computer-assisted data analysis--requires using computers and software tools. So does developing software applications. However, they are very different things. Software applications need to exhibit correct behaviour, and entire disciplines have evolved to support the effort it takes to define that behaviour and ensure that the application works to specification.
Analytics has a different operational model: an analysis of data is only correct when it provides information to someone who finds it to be useful, and correct. It's a gross mistake, with a cascade of unfortunate consequences, to approach and try to conduct Analytics with the same paradigm as if it were application software development.
One consequence of "Analytics is not software development" is that agility in Analytics, although sharing many objectives and principles with agile software development, requires a different operational model. Organizations that try to repurpose agile software development concepts, policies, procedures, and processes will find themselves thrashing about making far less progress than they should be.
Scrum, kanban, and other agile mechanisms, when dropped onto an analytics project, program, initiative, etc., do not provide the benefits usually expected.
To be fair, however, many organizations adopt the trappings of Agile without truly embracing its essential nature, usually with the assumption that processes matter most; these organizations are often doomed to cycles of, at best, sub-optimal results.
Conventional BI is based in the conceit that it's necessary to build data cathedrals in order to conduct meaningful analyses of an organization's data. Like medieval cathedrals, these enterprise-encompassing data warehouse- or cloud-based stores and the analytical platforms fronting them take ages to build.
Modern analytics is capable of profitably analyzing data as found in the wild.
There's a very good reason to not wear old socks.
They're smelly.
There's another reason not to wear old socks.
You'll have to figure it out yourself.