A Brief History of Analytics

The following are some of the significant events in analytics history, along with some observations about them.

1884

Herman Hollerith filed a US patent application titled Art of Compiling Statistics describing a device for automatically detecting data stored on paper tape and compiling statistics from it. Statistics in this sense involved counting--accumulating--the occurrences of holes, or their absence, in specific positions on the tape, and compiling totals for the various items the positions coded for.
An important point is that statistics in this sense includes the fundamentals of arithmetic: counting, summing, averaging, etc.; it is not solely the advanced technical discipline that modern usage generally implies.

1889

January 8 – Hollerith was issued U.S. Patent 395,782 for a revised form of his invention using punch cards, which offered advantages over paper tape. Interesting note: the punch cards were sized to match US currency of the day, enabling the use of equipment built to handle currency; the same card size later carried the standard 80 columns, familiar to computing veterans, for data and display technology.

1890

US Census taken. Hollerith’s device was successfully used in analyzing the data collected during the Census, reducing the time required to compile the statistics from as much as seven years to as little as three months.


1896

The Tabulating Machine Company was founded by Hollerith.

1911

The Computing-Tabulating-Recording Company (CTR), a holding company, was formed by the amalgamation of The Tabulating Machine Company and three other companies.

1914

Thomas J. Watson was hired by CTR from National Cash Register.

1924

CTR was renamed International Business Machines (IBM).

1945

Vannevar Bush published As We May Think, describing the essential characteristics of interconnected bodies of information and essentially anticipating the creation of the World Wide Web. https://en.wikipedia.org/wiki/As_We_May_Think

The Electronic Numerical Integrator and Computer (ENIAC), the first programmable, electronic, general-purpose digital computer, was first put to work for practical purposes on December 10. Developed for military use in WWII, ENIAC generated great excitement and anticipation in military, technical, and business circles.

1950s

Computers become increasingly used in businesses, across the full range of business functions, with financial systems at the forefront--financial record-keeping and management were well-understood disciplines with rigorous processes well suited to automation. Mainframe computers were enormously expensive and technically challenging to operate.
Data management and analysis/reporting are time-consuming and expensive, requiring deep technical expertise in computer programming with technical languages.

1959

COBOL - COmmon Business Oriented Language - is born: expressed in English, it was designed to be a higher-level, more productive way to generate reports. Data analysis/reporting remains a technical endeavour, albeit a more productive one.
COBOL remains in use, usually in legacy systems.

1960s

Business adoption of computers accelerates, with mainframes accessible through time sharing for those unable to afford their enormous costs. Business data is largely kept in hierarchical databases representing things like charts of accounts that were among the first business entities managed with computers; COBOL remains the primary vehicle for generating reports.

1970

RAMIS invented - a group of COBOL report writers recognized that the common structural elements involved in creating reports were suitable targets for abstraction, and created a fourth-generation, English-based, non-procedural reporting language that made it simpler, easier, faster, and more transparent to create reports.

Relational Database Design Appears

E. F. Codd published A Relational Model of Data for Large Shared Data Banks, introducing relational database design. Among its major accomplishments, relational design:

  • ensures that, when properly modelled and managed by a DBMS, relational data is protected from data anomalies--this relieves applications, to some degree, from needing to understand the data's structural relationships and guard against introducing anomalies; and
  • makes the data undecipherable to people who lack both information-domain and relational database design expertise--the first is required to understand the data and its relationships, the second is required to map domain knowledge onto the database schema.

The adoption and dominance of relational database design was instrumental in moving data analysis away from nontechnical people who needed the data's information into the embrace of technocrats who were able to access and process the information, normally via writing SQL queries followed by processing/rendering the resultsets returned by the queries.
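The two-step workflow described above can be sketched as follows. This is a hypothetical illustration, not any specific historical system: the table and column names (sales, region, amount) are invented for the example, using Python's built-in sqlite3 module as a stand-in for an enterprise DBMS.

```python
import sqlite3

# Illustrative data only -- table/column names are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("East", 1200.0), ("East", 800.0), ("West", 500.0)])

# Step 1: the technical specialist writes the SQL query...
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Step 2: ...then processes/renders the resultset into a report.
for region, total in rows:
    print(f"{region:<8}{total:>10,.2f}")
```

Both steps sit with the technologist: the nontechnical person who needs the answer depends on someone who can write the query and render the results.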

1974

SQL - Structured Query Language - introduced for accessing data in System R, a relational DBMS developed at the IBM San Jose Research Center to implement Codd's relational theory.

mid-1970s

QBE - Query By Example - introduced at IBM Research. QBE pioneered the use of visual elements to represent database operations; although successful in this, it fell short of modelling the human/nontechnical aspects of analysis.

1975

FOCUS is released by the inventors of RAMIS as a product by Information Builders, Inc. (IBI). FOCUS became the most successful 4GL; during the late 1970s and 1980s it was the leading reporting product, making it possible for report writers to deliver results faster and more effectively, and for business people to do their own reporting, freed from the time, energy, cost, and effort involved in obtaining technical/programming assistance. At its peak during the mid-80s to early-90s IBI was one of the world’s leading software companies.

 

FOCUS example - this code
TABLE FILE OFFICESUPPLIES
  SUM Sales, Profit
  BY Region
END
generates this report
Region        Sales    Profit 
Central  $2,540,342  $519,826 
East     $2,422,805  $377,566 
South    $1,597,346  $104,201 
West     $2,391,439  $310,849

  

FOCUS was created in a time when hierarchical databases were the norm, one feature of which was that the structure of the data was able to mirror the structure of hierarchical information common in business, e.g. charts of accounts. FOCUS was able to understand that structure and produce field aggregations appropriate to the context—this is a distinct advantage over relational DBMSs and tools that require the analyst/report writer to recognize and accommodate the information structure when calculating values.
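The hierarchy-aware aggregation described above can be sketched in modern terms. This is a hypothetical illustration (the account names and amounts are invented): a chart of accounts is stored as a nested structure, and totals at every level are derived automatically from the levels below, rather than the report writer having to specify each rollup.

```python
# Hypothetical chart of accounts as a nested structure; leaves are amounts.
accounts = {
    "Assets": {
        "Current": {"Cash": 5000, "Receivables": 3000},
        "Fixed":   {"Equipment": 12000},
    },
}

def rollup(node):
    """Total a subtree: leaves are amounts, branches sum their children."""
    if isinstance(node, (int, float)):
        return node
    return sum(rollup(child) for child in node.values())

print(rollup(accounts["Assets"]))             # 20000
print(rollup(accounts["Assets"]["Current"]))  # 8000
```

Because the data's shape mirrors the business hierarchy, "the right total for this context" falls out of the structure itself, which is the advantage the paragraph above attributes to FOCUS over schema-agnostic relational tools.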

1979

Relational Software, now Oracle Corp, released its first commercial relational database system.
Other notable products include DB2, Ingres, Sybase, and Informix.

1980

dBase released by Ashton-Tate for CP/M, later ported to DOS-based PCs.

Tim Berners-Lee, while at CERN in Geneva, proposed an interconnected knowledge system, a form of hypertext, and created a demo program called ENQUIRE.

1980s

Business applications with tabular data become commonplace: individuals create their own applications with spreadsheets and with form-based database applications linking user interface forms to tabular data, and commercial applications appear.

Relational databases become the norm for holding business data, leading to increasingly complex, and to the uninitiated undecipherable, enterprise database schemas, and to the need for specialized technical help to access the data, as relational systems do not map to normal human information models.

Decision Support Systems (DSSs) became established in the business world as viable mechanisms for making data-based information available to decision makers. Various DSS flavours and incarnations appeared, usually with a bias towards addressing the needs of senior personnel, e.g. Executive Information Systems (EIS).

1983

The Visual Display of Quantitative Information first edition published by Edward R. Tufte.

Tufte introduced and popularized the importance of employing highly effective data visualizations that communicate the essential information contained in the data, along with stressing the importance of avoiding anything that impedes the effective perception and interpretation of that information. Among his principles, visualizations should:

  • show the data
  • induce the viewer to think about the substance rather than about methodology, graphic design, the technology of graphic production, or something else
  • avoid distorting what the data have to say
  • make large data sets coherent
  • encourage the eye to compare different pieces of data

Tufte invented sparklines—small, word-sized graphic data elements suitable for including in text.
He also coined the phrase chartjunk to describe decorative elements that contribute nothing to conveying the information and by their mere presence impede its comprehension.

1988

Donald Norman published The Psychology of Everyday Things, renamed The Design of Everyday Things in later editions.

Norman's contribution to analytics is in his advocacy that tools should be designed for people's use and benefit, and in his identification of the ways in which tools can be designed to accomplish this. Major points:

  • people should not need to accommodate themselves to tools; tools should match people's cognitive and intellectual abilities;
  • tools should provide affordances—clear and obvious mechanisms—that make it obvious what the tool can do and how to do it;
  • tools should shield people from technical aspects that are not germane to the person's needs and purpose.

1990s

Data Warehousing-based Business Intelligence (DWBI) becomes the dominant paradigm for analyzing business data, leading to the widespread, virtually unchallenged adoption of the idea that data analysis is a heavyweight, high-ceremony activity requiring substantial investments of time and money to build the enterprise-scale, industrial-strength databases and analytical platforms that would be the solution to people’s information needs.

DWBI abandons and disregards the value in analyzing data “in the wild” in pursuit of a single version of the truth; this neglect hampers the realization of the largest part of an organization’s data assets’ value - the information they contain that could benefit people in local contexts who do not need the whole-enterprise perspective, and for whom data conformed to enterprise standards is often mismatched to their purposes.

December 1989 - January 1990

Tim Berners-Lee coined the term World Wide Web to describe the networked, hypertext-based computer information structure, and the first computers outside CERN were connected.

1996

Spotfire launched, based on work done at the University of Maryland’s Human-Computer Interaction Lab. Spotfire adopted lessons learned from HCI in mapping computer-provided application elements to human cognitive and intellectual abilities, inverting the prevailing paradigm, which held that people must adapt to the demands of the technology in order to accomplish basic data-analytical activities.

QlikView created, introducing an alternative model of analysis based on automatically relating associated data tables and providing effective, human-oriented analysis mechanisms.

2003

Tableau Software founded to commercialize the work done on the Polaris data visualization project at Stanford University during 1999-2002. Polaris was designed to make it easy to create visualizations of data by providing user interface objects for data elements and the structural components needed for the visualizations, with clear relationships between user actions and the composition of the appropriate visualization. Tableau took these principles and extended them into a commercially available, user-centric data visualization tool.

2004

Tableau 1.0 released. People can easily and simply analyze their data without technical intervention.
Tableau advanced the conceptual work begun with 4GLs into the realm of a GUI-based, highly responsive, highly effective model of human-computer interaction that provided a new way for data analysis to take place, bringing data analysis out from under the skirts of technologists so that ordinary people could see and understand the data that matters to them.
From the perspective of Donald Norman's design philosophy, Tableau implemented affordances that presented data and the fundamental analytical operations as first class UI objects, making it possible to directly and immediately have Tableau present data visualizations in response to user actions. This was a revolutionary change in analytics, making it possible for nontechnical people to see and understand their data in ways that matched their cognitive and intellectual abilities immediately, effectively, and efficiently.

2009

NOSQL Meetup

This meetup is about "open source, distributed, non relational databases".

Have you run into limitations with traditional relational databases? Don't mind trading a query language for scalability? Or perhaps you just like shiny new things to try out? Either way this meetup is for you.

In response to the limitations of relational databases, people actively seek alternatives.

2015

Graph databases have emerged as superior alternatives to relational databases, as they are able to model many real-world scenarios that are impossible, impractical, inefficient, or otherwise difficult to model relationally.
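The modelling advantage can be sketched briefly. This is a hypothetical example (the people and "follows" relationships are invented): relationships of arbitrary depth, which in SQL require recursive self-joins, reduce in a graph model to a single walk over the structure.

```python
from collections import deque

# Hypothetical social graph stored as adjacency lists (node -> neighbours).
follows = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan", "eve"],
    "dan": [],
    "eve": ["ann"],
}

def reachable(start):
    """Everyone reachable from `start` by following edges, at any depth."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in follows.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(reachable("ann")))  # follows edges to any depth, cycles included
```

Expressing the same "any depth" traversal relationally means either a recursive query or repeated joins, one per level of depth, which is the mismatch the graph model avoids.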

From Neo4j, a leading graph database company: Impossible Is Nothing: The History (& Future) of Graph Data