The ultimate Data Glossary


Does big data sound like a new landscape to you? You're in the right place! Our Data Glossary is cooked up for everyone: we explain some key big data terms so you can easily understand the world of data!

The ultimate Big Data Glossary 2020/07/06 | Data Culture



Analytics

Systematic computational analysis of data or statistics. It is mainly used for the discovery, interpretation, and communication of meaningful trends and patterns in data. Applied to business data, analytics helps companies describe, predict, and improve business performance.



Availability

In the context of IT, availability refers to the ability of a user to access information or resources in a specified location and in the correct format.


Big data

A large volume of data. Big data comes in different formats, such as text, images, audio, and video. Get more details in “What's Big Data?”


Big data landscape

Refers to an organization's overall data storage options, processing capabilities, analytics, and applications present in its data environment. It is also the name of a chart of the Big Data ecosystem made by Matt Turck, which helps describe the state of the art in big data technologies, year after year.



Chart

A chart is a graphical representation of data, in which data is represented by symbols, such as bars in a bar chart, lines in a line chart, or slices in a pie chart. Charts are often used to ease understanding of large quantities of data and the relationships between parts of the data. Charts can usually be read more quickly than the raw data.


Data discovery

Process of extracting actionable patterns from data. The extraction is generally performed by humans or, in certain cases, by artificial intelligence systems. The data presented is typically in a visual format and may look like a dashboard, depending on how it is presented in the application


Data preparation

The act of manipulating (or pre-processing) raw data, which may come from disparate data sources, into a form that can readily and accurately be analysed. The term also frequently refers to the set of tools that help people with this kind of task.
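
As a minimal sketch of what data preparation can look like in practice, the Python snippet below cleans a few raw records before analysis. The field names (`name`, `age`), the sample values and the `prepare` helper are all assumptions made up for illustration; they do not come from any specific tool.

```python
def prepare(records):
    """Return cleaned copies of raw records: trimmed names, integer ages.

    Records with a missing name or a non-numeric age are dropped.
    """
    cleaned = []
    for record in records:
        name = (record.get("name") or "").strip()
        age_text = str(record.get("age", "")).strip()
        if not name or not age_text.isdigit():
            continue  # skip rows that cannot be analysed reliably
        cleaned.append({"name": name.title(), "age": int(age_text)})
    return cleaned


# Hypothetical raw data merged from two disparate sources
raw = [
    {"name": "  alice ", "age": "34"},
    {"name": "BOB", "age": 41},
    {"name": "", "age": "29"},
    {"name": "carol", "age": "unknown"},
]

print(prepare(raw))
```

Dropping incomplete rows is only one possible choice; depending on the project, you might prefer to keep them and flag the missing values instead.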



Data visualization

Consists of communicating figures or raw information by transforming them into visual objects: points, bars, curves, maps…


Date format

When you specify a date format, you specify which components of the source data represent the day, month, year, hour, minute, and seconds. In analytics, many date formatting conventions are in use, often as many as there are tools to store, process, and extract data. This is a common “pain in the ass” in every data project.
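
To illustrate the point, here is a small Python sketch that tries several date formats in turn using the standard `datetime.strptime` function. The `KNOWN_FORMATS` list and the `parse_date` helper are hypothetical names chosen for this example, and the formats listed are only an assumption about what your sources might contain.

```python
from datetime import datetime

# Hypothetical list of formats seen across tools; extend as needed
KNOWN_FORMATS = [
    "%Y-%m-%d",           # 2020-07-06
    "%d/%m/%Y",           # 06/07/2020 (day first)
    "%Y/%m/%d %H:%M:%S",  # 2020/07/06 14:30:00
]


def parse_date(text):
    """Try each known format in turn and return the first match."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date format: {text!r}")


for sample in ["2020-07-06", "06/07/2020", "2020/07/06 14:30:00"]:
    print(sample, "->", parse_date(sample).isoformat())
```

Note that a value such as 06/07/2020 is ambiguous: it is read here as 6 July because the day-first format is listed, but a month-first source would mean 7 June. No code can guess that for you, so the convention of each source has to be known.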


Deep Learning

A set of machine learning methods that attempt to model data with a high level of abstraction.



DPO

The “Data Protection Officer” is the person in charge of the protection of personal data in public or private organizations.


Drop to Kibana

Ingests your large raw files and makes the data come alive. Precise keyword searches and a full-text approach, as well as dynamic visualisations and dashboards, are at your fingertips!


Elasticsearch Kibana

Analytics and Data visualization stack which provides visualization functions on indexed content



GDPR

General Data Protection Regulation. The GDPR aims primarily to give individuals control over their personal data. Since 25 May 2018, the GDPR has applied to everybody inside the EU. In a nutshell, we could say “Less is More”: the less PII you hold, the more efficiently you can protect it 😉



Determining, from space, the position of objects located on the surface of the Earth.


Large files

Data sets that are too large (gigabytes or terabytes in volume) to be handled by traditional data-processing application software. They are commonly stored in simple file formats such as csv or txt, or in hierarchical formats such as xml or json.
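
One common way to cope with such files is to stream them row by row instead of loading everything into memory. The sketch below does this with Python's standard `csv` module. The `count_rows_by_key` helper, the column names and the tiny in-memory sample are assumptions for illustration only.

```python
import csv
import io


def count_rows_by_key(lines, key):
    """Stream CSV rows one at a time and count occurrences of each value of `key`."""
    counts = {}
    for row in csv.DictReader(lines):
        counts[row[key]] = counts.get(row[key], 0) + 1
    return counts


# A small in-memory stand-in for a large file; with a real file you would
# pass the object returned by open(path, newline="") instead.
sample = io.StringIO("country,amount\nFR,10\nDE,5\nFR,7\n")
print(count_rows_by_key(sample, "country"))
```

Because only one row is held in memory at a time, the same function works on a file far larger than the available RAM, as long as the number of distinct keys stays reasonable.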



Log

A full written record of a journey, a period of time, or an event. In data, a log file is a file that records events that occur in an operating system or while other software runs. Logging is the act of keeping a log. In the simplest case, messages are written to a single log file.
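
The simplest case, messages written to a single log file, can be sketched with Python's standard `logging` module. The logger name, the file location and the messages below are hypothetical and chosen only for this example.

```python
import logging
import os
import tempfile

# Hypothetical log location, chosen only for this example
log_path = os.path.join(tempfile.mkdtemp(), "app.log")

logger = logging.getLogger("glossary_demo")
logger.setLevel(logging.INFO)

# Send every message to one file, prefixed with a timestamp and a level
handler = logging.FileHandler(log_path)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.info("application started")
logger.warning("disk usage is high")

# Close the handler, then read the log file back
handler.close()
logger.removeHandler(handler)
with open(log_path) as log_file:
    print(log_file.read())
```

Each line of the resulting file records one event with its timestamp and severity, which is what makes logs useful for later analysis.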


Open Data

Open Data is data that is available for everyone to access, use and share. Governments, administrations and companies are opening more and more of their data to encourage its use by citizens and enterprises, and to develop new insights or apps.


Open source

Denoting software for which the original source code is made freely available and may be redistributed and modified


Personal Data Tracker

Detects and sorts your sensitive data (surnames, first names, addresses, telephone numbers, bank details, health data...). It is an “automated” inventory that eases your compliance with the requirements of the GDPR (or RGPD in French; yes, we are French).



PII

Personally Identifiable Information is any information that can be used to identify a person. Get more details in “Personal Data, tell me more!”



Scalability

Refers to the ability of a product to adapt to a change in the order of magnitude of demand, in particular its ability to maintain its functionality and performance in the event of high demand.



Security

A set of cybersecurity strategies that prevents unauthorized access to organizational assets such as computers, networks, and data. It maintains the integrity and confidentiality of sensitive information, blocking the access of sophisticated hackers.



Structured data

To keep it simple, structured data is highly organized and formatted so that it can easily be extracted or ingested into relational databases.



Table

A table is a data structure that organizes information into rows and columns. It can be used to both store and display data in a structured format. For example, databases store data in tables so that information can be quickly accessed from specific rows.
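
As a small illustration of rows, columns and access to specific rows, the Python sketch below builds a table in an in-memory SQLite database using the standard `sqlite3` module. The `customers` table, its columns and its contents are invented for this example.

```python
import sqlite3

# In-memory database: nothing is written to disk
conn = sqlite3.connect(":memory:")

# Columns are declared once; every row then follows the same structure
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Alice", "Paris"), (2, "Bob", "Berlin"), (3, "Carol", "Paris")],
)

# Access only the rows that match a condition
paris_rows = conn.execute(
    "SELECT id, name FROM customers WHERE city = ? ORDER BY id", ("Paris",)
).fetchall()
print(paris_rows)
conn.close()
```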



Unstructured data

Basically, unstructured data refers to information that either does not have a pre-defined data model or is not organized in a pre-defined manner.



Variety

One “V” of the 3Vs of Big Data, referring to both structured and unstructured data generated either by humans or by machines. Unstructured data such as emails, voicemails, hand-written text and audio recordings are important elements of Variety, and big data technologies make the difference in making them talk, compared to traditional data processing applications.



Velocity

One “V” of the 3Vs of Big Data, regarding the speed at which data is generated or processed. For example, every day 900 million photos are uploaded to Facebook, 500 million tweets are posted on Twitter, 0.4 million hours of video are uploaded to YouTube and 3.5 billion searches are performed on Google. Big data technologies accept this incoming flow of data and, at the same time, process it fast enough that it does not create bottlenecks.



Volume

One “V” of the 3Vs of Big Data, regarding the large amount of data. Keep in mind that data in the digital universe doubles in size every 2 years… Get more in “What's Big Data?”
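
To get a feel for what doubling every 2 years means, the short Python sketch below computes the resulting growth multiplier. It assumes the doubling rate stays constant over time, which is a simplification, and the `growth_factor` helper is a hypothetical name used only here.

```python
def growth_factor(years, doubling_period=2.0):
    """Multiplier applied to data volume after `years`, given a doubling period."""
    return 2 ** (years / doubling_period)


for years in (2, 4, 10):
    print(years, "years ->", growth_factor(years), "x")
```

Under that assumption, the volume is multiplied by 32 after 10 years, since 10 years contain five doubling periods.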

More questions or suggestions? Feel free to post your ideas on the Community forum.

We love to learn and share!