Visualization plays a key role in Big Data. In most projects, visualizing the results is the final step of a data-driven process. Data must be presented in an understandable way in order to ultimately support management’s strategic and operational decisions. In this article, we would like to introduce tools that are suitable for
- creating visually appealing one-off or recurring reports (such as daily sales figures by region, audience distribution and market shares for yesterday’s broadcast day, or forecasts of future business development),
- visualization in exploratory data analysis (e.g. to clarify a situation by clustering the words of a text into a visual summary such as a word cloud, or to plot the results of an ad hoc analysis of a SQL or CSV data source), and
- displaying (live) charts within web applications (dashboards showing business-critical key figures, such as the percentage of rejects at the end of a production line, or detailed analyses of website traffic with software such as Google Analytics).
Below we will introduce tools for each of these three use cases.
Basic tools for data visualization
Below we look at software packages that analysts can use without in-depth technical knowledge. Data is loaded into the respective tool via defined interfaces (SQL, REST, CSV, …) and visualized there.
MS Excel and LibreOffice
Spreadsheet programs are the most common tools for data collection and visualization, and Microsoft Excel is often pre-installed on company PCs. LibreOffice is available as well and runs on Linux, macOS and Windows. Another, often decisive advantage is that many people are already familiar with these tools, so there is very little to learn before using this form of visualization.
The tools are often suitable in cases where
- a sufficient result must be achieved quickly
- the amount of data is manageable (up to a few megabytes) and
- the input data is in standardized formats (CSV, TSV).
The videos How to create charts using Libre Office Calc and How to make a line graph in Excel (Scientific data) show in just a few minutes how charts can be created using the respective standard tools.
Tableau
Tableau is a very well-known tool for data visualization. The software’s strengths lie in its versatile forms of representation (charts, graphs, maps and much more) and in its flexible connection to a wide variety of data sources (CSV, SQL databases, Hadoop, SAP, Teradata, …). Due to its widespread use, there are also many (online) courses and learning materials for both beginners and advanced Tableau users. Although special functions require expert knowledge, getting started with the tool is very intuitive. The learning curve is therefore moderate, and first results come quickly.
Thanks to its versatility and relatively easy handling, Tableau can be used to quickly create both one-off ad hoc analyses and recurring ones. Furthermore, the interactive “drill through” in the data view lets you click quickly from a high aggregation level (such as sales at country level) down to the lower levels (sales of a single market on a Saturday).
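The idea behind such a drill through can be illustrated independently of Tableau. The following Python sketch is not Tableau code; it only mimics the principle. The sales records (countries, markets, amounts) and the helper `aggregate` are invented for illustration: the same data is summed first at country level, then for the markets of one country, and finally for a single market on a Saturday.

```python
from collections import defaultdict

# Hypothetical sales records: (country, market, weekday, revenue).
SALES = [
    ("Germany", "Berlin", "Saturday", 1200.0),
    ("Germany", "Berlin", "Monday", 800.0),
    ("Germany", "Munich", "Saturday", 950.0),
    ("Austria", "Vienna", "Saturday", 700.0),
    ("Austria", "Graz", "Monday", 300.0),
]


def aggregate(records, key_index, filters=None):
    """Sum revenue grouped by the column at key_index.

    filters maps a column index to the value a record must have there.
    """
    filters = filters or {}
    totals = defaultdict(float)
    for record in records:
        if all(record[i] == value for i, value in filters.items()):
            totals[record[key_index]] += record[3]
    return dict(totals)


# Highest aggregation level: revenue per country.
by_country = aggregate(SALES, key_index=0)

# Drill into one country: revenue per market in Germany.
by_market = aggregate(SALES, key_index=1, filters={0: "Germany"})

# Lowest level: a single market on a Saturday.
berlin_saturday = aggregate(SALES, key_index=1, filters={1: "Berlin", 2: "Saturday"})

print(by_country)
print(by_market)
print(berlin_saturday)
```

Each level applies one more filter to the same records, which is what clicking through the levels does interactively.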
Another advantage is that Tableau can either be installed as a desktop version on a PC or used as a web version directly in the browser. The browser version in particular lets managers on the go display analyses on mobile devices and monitor all important key figures.
To get started with Tableau, watch the video Tableau Getting Started.
Microsoft Power BI
Power BI is Microsoft’s business intelligence solution. It also offers a wide range of display options that can be put together using a drag-and-drop interface. An outstanding feature of this tool is its good integration with other Microsoft products such as Excel or SharePoint. In SharePoint, several users can work on the same report and thus cooperate across departments and locations.
Power BI also offers web interfaces that are hosted in the cloud and are therefore available on any device with a web browser. Note that Power BI requires a paid license.
An introduction to Power BI is provided by the training video Power BI Tutorial For Beginners | Introduction to Power BI | Power BI Training | Edureka.
Data visualization for developers
Now we would like to introduce visualization options aimed at software developers and specialists with in-depth technical knowledge. These tools can be used generically for a variety of use cases, but they require the data to be in a suitable form (usually prepared programmatically).
Exploratory data analysis
The first step in a data-driven project is to analyze the data that already exists: What data is already being collected? What is its quality? What patterns does the stored data contain? What is the distribution of events (return rate, reject rate, …), customer characteristics (gender, age, …) or needs (products ordered at certain times)?
These questions are usually examined in a very "data-oriented" way, directly in development environments.
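Some of these questions can be answered with a few lines of code before any chart is drawn. The following minimal Python sketch uses only the standard library. The column names (`gender`, `age`, `returned`) and the sample rows are invented for illustration and stand in for a real CSV or SQL export.

```python
import csv
import io
from collections import Counter

# Hypothetical extract of order data; empty fields mark missing values.
RAW_CSV = """order_id,gender,age,returned
1,f,34,no
2,m,,yes
3,f,29,no
4,,41,no
5,m,52,yes
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def missing_share(rows, column):
    """Data quality: fraction of rows in which the column is empty."""
    missing = sum(1 for row in rows if not row[column].strip())
    return missing / len(rows)


def distribution(rows, column):
    """Distribution: how often each non-empty value occurs in the column."""
    return Counter(row[column] for row in rows if row[column].strip())


rows = load_rows(RAW_CSV)
return_rate = distribution(rows, "returned")["yes"] / len(rows)

print("missing age:", missing_share(rows, "age"))
print("gender distribution:", dict(distribution(rows, "gender")))
print("return rate:", return_rate)
```

Numbers like these usually decide which charts are worth drawing next, for example a histogram of ages or a bar chart of return rates.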
Jupyter Notebook + matplotlib
Jupyter Notebook is often a central tool for data engineers and data scientists: they use it to explore new data sources, clean and aggregate them, and visualize the data for a better understanding. In this setup, the visualization is handled by the "matplotlib" library, which can draw graphics directly in the notebooks. Examples of matplotlib graphics can be found here. Jupyter and matplotlib can be used independently of each other, but together they form a very frequently used and well-matched team. Since both components are free, there are no license costs.
Jupyter can be tried out in the browser on the website jupyter.org.
For an introduction to creating plots within Jupyter, see the video Learn Jupyter Notebooks (Pt. 1) Plotting.
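As a rough illustration of this workflow, the following Python sketch prepares a small series and plots it with matplotlib. The production figures and the function names `daily_reject_rate` and `plot_reject_rate` are hypothetical; the sketch assumes that matplotlib is installed in the notebook environment.

```python
def daily_reject_rate(produced, rejected):
    """Return the reject rate in percent for each day."""
    return [100.0 * r / p for p, r in zip(produced, rejected)]


def plot_reject_rate(days, rates):
    """Draw the reject rate as a line chart and return the figure."""
    # Imported here so the data preparation above works without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(days, rates, marker="o")
    ax.set_xlabel("Day")
    ax.set_ylabel("Reject rate (%)")
    ax.set_title("Reject rate at the end of the production line")
    return fig


# Hypothetical production figures for one working week.
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
produced = [1000, 1200, 1100, 900, 1000]
rejected = [20, 30, 11, 27, 25]

rates = daily_reject_rate(produced, rejected)
print(rates)
```

In a notebook cell, calling `plot_reject_rate(days, rates)` draws the chart below the cell. In a plain script, the returned figure can be written to a file with `fig.savefig("reject_rate.png")`.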
Apache Zeppelin
Apache Zeppelin is very similar to Jupyter Notebook in terms of its purpose. It is also a web-based notebook in which live code can be executed and graphics can be drawn. Data can be explored, prepared and loaded in table form for visual representation. Like Jupyter, Zeppelin is a tool for developers and data scientists who want to get a quick overview or visualize facts ad hoc. The software is open source and freely available.
In the video Zeppelin Build and Tutorial Notebook, graphics are generated from the imported data starting at 10:16.
Apache Zeppelin features are described in more detail here.
Gephi
Gephi is ideal for visualizing graph structures. As an open source tool, it can be used free of charge.
Graphs can be loaded into the data editor. One file (in CSV format, for example) contains the nodes, and another contains the edges, i.e. the connections between the nodes. Gephi then draws a graph that can be explored interactively (zooming in and out, moving nodes, etc.). Clusters of properties can be highlighted with specific colors, edge weights can be shown as line thickness, and many other graphical highlights are possible. Practical examples of how Gephi can be used include the visualization of subject areas in journalism (e.g. by the New York Times) or network topologies in the technical field.
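The two-file layout described above can be produced with a few lines of Python. This is a minimal sketch under stated assumptions: the small server network is invented, and the column names (`Id`, `Label` for nodes; `Source`, `Target`, `Weight` for edges) follow the convention commonly used with Gephi's spreadsheet import, so they should be checked against the Gephi version in use.

```python
import csv
import io

# Hypothetical network: three servers and the connections between them.
NODES = [("1", "web-server"), ("2", "app-server"), ("3", "database")]
EDGES = [("1", "2", 5.0), ("2", "3", 2.5)]


def to_csv(header, rows):
    """Serialize a header row and data rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# One table for the nodes, one for the edges (with a weight per edge).
nodes_csv = to_csv(["Id", "Label"], NODES)
edges_csv = to_csv(["Source", "Target", "Weight"], EDGES)

print(nodes_csv)
print(edges_csv)
```

Saved as two separate files (for example `nodes.csv` and `edges.csv`), the node table is imported first and the edge table second. The weight column can then be used to control the thickness of the lines.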
Because Gephi is open source, many graphical extensions have been implemented for it. So if you need to visualize a graph structure, Gephi is usually the tool of choice.
The video Introduction to GEPHI shows in great detail how data can be loaded and which visualization options are available.
Display on dashboards or web applications
In the same way that Google Analytics, for example, gives live insight into the traffic of a website, managers need live insight into their own company’s business processes in order to make sound decisions quickly. To make this possible, companies often implement their own tools, usually as web applications. These continuously visualize business-critical key figures and thus make it possible to keep an overview at all times.
Sales staff are provided with information on inventory levels or current product price lists. Marketing continuously checks the effectiveness of the online advertising campaigns it has booked. IT administrators keep an eye on the error rate of internal and external applications, and developers continuously receive information about the quality of the source code they have written. All of these metrics are often displayed in web applications, so we would like to introduce tools that make visualization within the browser very easy.
D3.js
D3.js is a JavaScript library for visualizing data in the web browser. Because it builds on modern web standards, charts can be updated live when the underlying data changes. The transition to the new state can also be animated, which helps viewers follow the change and understand the message of the chart.
D3.js is released under a permissive, BSD-style open source license that allows the software to be used for commercial and non-commercial purposes.
There are many different chart types available, which can be found on the example page.
A recommended video showing the possibilities of D3.js can be viewed here:
D3.js Tutorial – A Demo with Examples using D3.js
Chart.js
Chart.js can be seen as a lightweight alternative to D3.js. It offers fewer chart types but is a little easier to integrate. An overview of all charts can be found on the example page. For simple applications, this library is often sufficient. The charts also adapt automatically to the available screen space and are therefore displayed well on both desktop PCs and mobile devices. Chart.js also supports live manipulation of the underlying data and thus offers real-time capable charts.
Integration and use are described in the video Easily create and customize diagrams with JavaScript and Chart.JS! [TUTORIAL].
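A Chart.js chart is described by a configuration object with a chart `type`, the `data` (labels plus one or more datasets) and optional `options`. In a dashboard, the backend often delivers this data as JSON. The following Python sketch (deliberately not JavaScript) shows how such a configuration could be assembled on the server side. The function `build_chart_config` and the error-rate figures are hypothetical, and how the JSON reaches the browser is left open.

```python
import json


def build_chart_config(labels, values, series_name):
    """Build a Chart.js-style configuration for a simple line chart."""
    if len(labels) != len(values):
        raise ValueError("labels and values must have the same length")
    return {
        "type": "line",
        "data": {
            "labels": list(labels),
            "datasets": [{"label": series_name, "data": list(values)}],
        },
        # Let the chart adapt to the available screen space.
        "options": {"responsive": True},
    }


# Hypothetical key figure: error rate of an application per hour.
config = build_chart_config(
    labels=["08:00", "09:00", "10:00"],
    values=[0.4, 0.7, 0.2],
    series_name="Error rate (%)",
)
payload = json.dumps(config)
print(payload)
```

In the browser, the parsed object can be passed to `new Chart(canvas, config)`. For live charts, the page replaces the dataset's `data` with fresh values and calls `chart.update()`.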
We hope we have been able to give you a comprehensive impression of basic tools for data visualization in the Big Data environment. The list is not exhaustive and the ecosystem is developing very quickly. We are excited to hear about your experiences and which tools you use. Please send us your recipes for success, questions or other suggestions by email to [email protected]
We look forward to your suggestions!