Data Analytics Key Terminology

    • Advanced Analytics: Automated (or semi-automated) statistical techniques or logic-based methods for analyzing data to discover underlying patterns, make predictions, or generate recommendations (e.g., sentiment analysis, graph analysis, multivariate statistics, machine learning, neural networks). 
    • Aggregate Data: Data or information combined from multiple observations, series, or sources. 
    • Agile Analytics: Iterative approach to developing analytics solutions rapidly based on frequent feedback from end-users to increase value for stakeholders.  
    • Airport Operational Database (AODB): Central database or repository, often called the “airport information center,” that serves all operational systems and provides all flight-related data accurately and efficiently in a real-time environment.
    • Analytics: Statistical and mathematical analysis of data to cluster, segment, score, and predict what scenarios are most likely to happen (e.g., data/text mining, visualization, cluster analysis, forecasting). 
    • Analytics Platform: Full-featured technology solution that joins different tools and analytics systems together, combining an engine to execute analyses, a database or repository to store and manage data, data mining processes, and techniques and mechanisms for obtaining and preparing data that are not already stored.
    • Application Programming Interface (API): Set of defined rules that enable different applications to communicate with each other (see the sketch following this glossary).
    • Architecture: In reference to computers, software, or networks, the overall design of a computing system and the logical and physical interrelationships between its components. The architecture specifies the hardware, software, access methods, and protocols used throughout the system.
    • Artificial Intelligence (AI): Automated computational algorithms based on mathematical, statistical, and logic-based techniques, trained on very large amounts of data and used to interpret events, support and automate decisions, and recommend actions based on model outputs (e.g., machine learning, neural networks, deep learning, natural language processing).  
    • Big Data: High-volume, high-velocity, and high-variety information assets that require cost-effective, innovative forms of information processing to identify insights and inform decision-making.
    • Big Data Analytics: Set of procedures, tools, and techniques for analyzing very large datasets to enable enhanced insight, decision-making, and process automation, delivering high value from high-veracity data.
    • Business Analytics: Solutions used to build analysis models and simulations to create scenarios, understand events, and predict future states (e.g., data mining, statistics, predictive analytics). 
    • Classification Models: Models for evaluating or predicting categorical target variable outcomes.   
    • Cloud-Based Computing: Scalable and elastic IT-enabled computing capabilities for delivering shared content to multiple end-users simultaneously through internet service technologies. 
    • Cluster Analysis or Clustering: Statistical techniques for grouping a set of objects, observations, or data so that members of the same group (i.e., cluster) are more similar to one another than to members of other groups, used to identify natural patterns of grouping in the data (see the sketch following this glossary).
    • Dashboards: Reporting mechanisms that aggregate and visually display data and key performance indicators to end-users as charts and graphs to indicate progress toward pre-defined goals.  
    • Data Governance: Specification of decision rights and an accountability framework to ensure appropriate behavior in the valuation, creation, consumption, and control of data and analytics.
    • Data Integration: Practices, techniques, and tools for achieving consistent access and delivery of data across subject areas and data structure types to meet requirements of applications and processes.  
    • Data Lake: Collection of storage instances of various data assets that are stored in a near-exact (or exact) copy of the source format, in addition to the originating data stores. 
    • Data Literacy: Ability to read, write, and communicate data in context (e.g., data sources and constructs, analytical methods and techniques applied) and describe use-case applications and outcomes.  
    • Data Mining: A family of procedures and techniques for extracting information and knowledge from large databases and applying the extracted knowledge to make data-based decisions (e.g., clustering, classification, regression, association rules). 
    • Data Science: A rapidly evolving field that uses a combination of methods and principles from statistics and computer science to work with and draw insights from data (e.g., statistics and machine learning, unsupervised and supervised models, clustering, classification, regression). 
    • Data Visualization: Procedures, techniques, and tools for exploring and visualizing data in plots and graphs (e.g., boxplots, histograms, bar charts, line graphs, scatterplots, network graphs).   
    • Data Warehouse: Storage architecture designed to hold data extracted from transaction systems, operational data stores, and external sources. Combines the data in an aggregate, summary form suitable for enterprise-wide data analysis and reporting for predefined business needs.
    • Deep Learning Model: Automated computational algorithms that consist of large multi-layer (artificial) neural networks, used for machine learning tasks with better generalization from small data and better scaling to big data and compute budgets.
    • Descriptive Analytics: Examination (usually performed manually) of data or content to answer the question of what happened, characterized by traditional data visualizations (e.g., pie charts, bar charts, line graphs, tables, generated narratives).
    • Digitization: The process of changing an image, information, or process from an analog form to a digital format (i.e., encoded numerically with ones and zeros), without substantially altering the content of the original image, document, or process.
    • Forecasting: Predictive analytics technique that predicts future values of a data series by factoring in a variety of inputs and identifying trends (see the sketch following this glossary).
    • Key Performance Indicators (KPIs): High-level metrics or measures of system output, traffic, or other usage, simplified for gathering and review on a weekly, monthly, or quarterly basis.  
    • Light Detection and Ranging (LIDAR): Remote sensing method that emits and captures light in the form of a pulsed laser to measure ranges (variable distances).  
    • Machine Learning (ML): Advanced computational algorithms based on mathematical or statistical models, used for both supervised learning tasks (e.g., classification, regression) and unsupervised learning tasks (e.g., clustering, dimension reduction). ML models are trained on a subset of the data (the training set), and model performance is then tested on previously unseen data (the test set); a sketch of this train/test workflow follows this glossary.
    • Metadata: Information describing the characteristics of data, including structural metadata describing data structures (e.g., data format, syntax, semantics) and descriptive metadata describing data contents (e.g., information security labels). 
    • Natural Language Processing (NLP): An interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, and technical methods and algorithms for translating text or audio speech into encoded, structured information. 
    • Neural Network Models: Mathematical or computational algorithms for statistical data processing, loosely modeled on biological neural networks, consisting of interconnected layers of nodes that learn to map complex inputs (e.g., text, images) to structured outputs suitable for conventional data processing.
    • Prediction Models: Statistical models for predicting target variable outcomes based on a set of independent predictor variables (e.g., linear regression); a worked sketch follows this glossary.
    • Predictive Analytics: Advanced analytics procedures or techniques for analyzing content or data to predict characteristics of target variables based on relevant features or attributes of the data (e.g., predictive modeling, regression analysis, multivariate statistics, forecasting, and classification).   
    • Prescriptive Analytics: Tools, procedures, and techniques for analyzing relationships among variables in order to prescribe a course of action (e.g., heuristics, recommender algorithms, graph analysis). 
    • Process Automation: The use of software and technologies to automate analytics processes and functions to accomplish defined business goals, such as ingesting, processing, and storing data; automatically updating reporting dashboards; and using automated statistical, machine learning, or artificial intelligence models. 
    • Reinforcement Learning: Area of machine learning concerned with how intelligent agents (artificial intelligence) should take actions in an environment to maximize cumulative reward (e.g., Markov decision processes).
    • Relational Database: Type of database that stores and provides access to data points that are related to one another, based on the relational model for representing data in tables (e.g., Oracle Database); see the SQL sketch following this glossary.
    • Risk Analytics: Procedures and techniques for evaluating organizational risks, quantified as a function of threat, vulnerability, and consequence, in each area of the organization.
    • Scalability: Measure of a system’s ability to increase or decrease in performance and cost in response to changes in application and system processing demands (e.g., hardware system, database, operating system). 
    • Sentiment Analysis: Analysis procedure for determining whether a text is positive, negative, or neutral (see the sketch following this glossary).
    • Statistical Learning: A framework for machine learning, drawing from statistics and functional analysis, that includes methods of statistical inference for finding a predictive function based on data. Includes procedures and techniques for supervised learning tasks (e.g., classification, regression) and unsupervised learning tasks (e.g., clustering, principal components analysis).
    • Structured Data: Quantitative data represented in a highly organized, standard format that has a well-defined structure, conforms to a data model, follows a persistent order, and is easily accessed by humans, programs, and machine learning algorithms.
    • Supervised Learning: A machine learning approach where a computational model is trained on human-labeled data to make predictions about a target variable that can be categorical (for classification tasks) or continuous (for regression tasks). The model is evaluated using performance metrics. 
    • Text Analysis: Process of deriving knowledge from text sources by summarizing text content across a large body of information or documents (e.g., natural language processing, sentiment analysis). 
    • Throughput: Volume of work or information flowing through a system (e.g., an information storage and retrieval system), measured in units such as the number of accesses per hour.
    • Unstructured Data: Qualitative data or information that does not have a predefined data model and that cannot be processed and analyzed by conventional data tools and methods (e.g., text, images).  
    • Unsupervised Learning: Computational approach used to discover natural groupings or categories among observations in unlabeled data (e.g., clustering, association).
    • Video Analytics: Technology for processing digital video signals using algorithms to perform a security-related function (e.g., fixed algorithm analytics, AI learning algorithms, facial recognition).
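
Illustrative Code Sketches for Selected Terms

The brief Python sketches below illustrate several of the terms above. They are minimal, hypothetical examples rather than production implementations; all data values, endpoint URLs, table schemas, and field names are invented for illustration, and the scikit-learn examples assume that library is installed.

Application Programming Interface (API). A sketch of one application requesting data from another over a hypothetical REST endpoint, using only the Python standard library. The URL and the flightNumber/gate fields are assumptions, not a real service.

    import json
    import urllib.request

    # Hypothetical endpoint; a real API publishes its own URL and response schema.
    URL = "https://api.example.com/v1/flights?status=arrived"

    with urllib.request.urlopen(URL, timeout=10) as response:
        flights = json.load(response)   # parse the JSON body into Python objects

    for flight in flights:
        print(flight["flightNumber"], flight["gate"])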
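Cluster Analysis or Clustering. A sketch of k-means, one common clustering technique, assuming scikit-learn is available; the six (x, y) observations are toy data chosen to form two obvious groups.

    import numpy as np
    from sklearn.cluster import KMeans

    # Toy observations that form two loose groups.
    X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
                  [8.0, 8.2], [7.8, 8.1], [8.3, 7.9]])

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
    labels = kmeans.fit_predict(X)      # cluster index assigned to each observation

    print(labels)                       # natural grouping, e.g., [0 0 0 1 1 1]
    print(kmeans.cluster_centers_)      # center coordinates of each cluster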
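Forecasting. A minimal trend-based forecast in plain Python: fit an ordinary least-squares trend line to a toy series of monthly counts and extrapolate it one period ahead.

    # Toy monthly counts with an upward trend.
    history = [100, 104, 110, 113, 119, 124]

    n = len(history)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(history) / n

    # Ordinary least-squares trend line y = a + b * x.
    b = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history))
         / sum((x - x_mean) ** 2 for x in xs))
    a = y_mean - b * x_mean

    forecast = a + b * n                # extrapolate one period ahead
    print(round(forecast, 1))           # 128.5 for this toy series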
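Machine Learning (ML). A sketch of the train/test workflow described in the glossary entry, assuming scikit-learn: a logistic-regression classifier is trained on a training set, then its accuracy is measured on a held-out test set. The bundled iris dataset serves as stand-in data.

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)

    # Hold out 25% of the observations as previously unseen test data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Evaluate the trained classifier on the test set.
    print(accuracy_score(y_test, model.predict(X_test)))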
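Prediction Models. A linear-regression sketch, assuming scikit-learn; the predictor variables (daily flights, average temperature) and passenger-volume targets are invented for illustration.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Toy predictors [daily flights, average temperature] and passenger volumes.
    X = np.array([[120, 15.0], [130, 18.0], [110, 12.0], [150, 20.0], [140, 17.0]])
    y = np.array([14800, 16100, 13500, 18700, 17300])

    model = LinearRegression().fit(X, y)
    print(model.coef_, model.intercept_)        # fitted weight for each predictor

    # Predict the outcome for an unseen day: 135 flights, 16 degrees.
    print(model.predict(np.array([[135, 16.0]])))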
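Relational Database. A sketch using Python's built-in sqlite3 module: two related tables are created, populated with invented rows, and recombined with a JOIN, the defining operation of the relational model.

    import sqlite3

    conn = sqlite3.connect(":memory:")          # throwaway in-memory database
    cur = conn.cursor()

    # Two related tables: flights reference airlines through airline_id.
    cur.execute("CREATE TABLE airlines (id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute("CREATE TABLE flights (num TEXT, airline_id INTEGER, gate TEXT)")
    cur.execute("INSERT INTO airlines VALUES (1, 'Example Air')")
    cur.execute("INSERT INTO flights VALUES ('EA101', 1, 'B7')")

    # Rows related across tables are retrieved with a JOIN.
    for row in cur.execute(
            "SELECT f.num, a.name, f.gate "
            "FROM flights f JOIN airlines a ON f.airline_id = a.id"):
        print(row)                              # ('EA101', 'Example Air', 'B7')

    conn.close()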
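Sentiment Analysis. A deliberately tiny lexicon-based scorer in plain Python; the word lists are invented, and real systems rely on large curated lexicons or trained language models.

    import re

    # Toy lexicon; production systems use curated lexicons or trained models.
    POSITIVE = {"great", "smooth", "friendly", "clean"}
    NEGATIVE = {"delayed", "crowded", "rude", "lost"}

    def sentiment(text):
        words = re.findall(r"[a-z']+", text.lower())
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        return "positive" if score > 0 else "negative" if score < 0 else "neutral"

    print(sentiment("Great staff and a smooth, clean terminal"))      # positive
    print(sentiment("Flight delayed and the gate area was crowded"))  # negative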