
Having presented the fundamental topics for new IT professionals in the article "Onboarding new IT employees (DevOps, Big Data, Developer)", we now want to use a tutorial to show how knowledge of these topics can be applied in practice. New IT employees can work through the following tutorial tasks step by step. At the end, they submit the project documentation to a senior developer and give the two short presentations.

Technical documentation

Every technical project also needs documentation, so the individual activities carried out while working through the tutorial should be documented technically. Many companies use the Atlassian tools "Jira" and "Confluence", and these are particularly well suited if they are already available. In principle, however, any documentation system can be used. If none is available, the documentation should be written in Markdown, a widely used format that is also supported by GitHub, GitLab and similar platforms.

1. Installation of Linux

The first step is to install Linux. Although this can be done directly on a PC or server, we recommend using VirtualBox or a similar virtualization solution.

As for the choice of distribution, CentOS, Debian and Ubuntu are common on servers and should be used for this tutorial. Classic CentOS Linux has reached end of life; Rocky Linux and AlmaLinux are compatible alternatives.

In VirtualBox, the installation image (ISO file) downloaded from the distribution's website can then be attached to a new virtual machine as a virtual DVD. The installation starts as soon as the VM boots. The goal of this subtask is a working installation of the selected Linux distribution.
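Some readers may prefer to script the VM instead of clicking through the VirtualBox GUI. The following Python sketch prints the VBoxManage commands that create a VM, attach a virtual disk and the installation ISO, and boot it. With dry_run=False, it also runs them. The VM name, OS type ID, memory, disk size and ISO path are placeholder assumptions. The valid OS type IDs of your VirtualBox version can be listed with VBoxManage list ostypes.

```python
import shlex
import subprocess

# Placeholder settings (assumptions): adjust to your machine and the downloaded ISO.
VM_NAME = "onboarding-linux"
OS_TYPE = "Ubuntu_64"                   # check with: VBoxManage list ostypes
ISO_PATH = "/path/to/distribution.iso"  # hypothetical path
DISK_PATH = "onboarding-linux.vdi"


def vboxmanage_commands(name, os_type, iso, disk, memory_mb=4096, cpus=2, disk_mb=25600):
    """Return the VBoxManage calls that create, equip and boot the VM."""
    return [
        ["VBoxManage", "createvm", "--name", name, "--ostype", os_type, "--register"],
        ["VBoxManage", "modifyvm", name, "--memory", str(memory_mb), "--cpus", str(cpus)],
        # Virtual hard disk; --size is given in MB.
        ["VBoxManage", "createmedium", "disk", "--filename", disk, "--size", str(disk_mb)],
        ["VBoxManage", "storagectl", name, "--name", "SATA", "--add", "sata"],
        ["VBoxManage", "storageattach", name, "--storagectl", "SATA",
         "--port", "0", "--device", "0", "--type", "hdd", "--medium", disk],
        # The ISO is attached as a virtual DVD drive, so the VM boots the installer.
        ["VBoxManage", "storageattach", name, "--storagectl", "SATA",
         "--port", "1", "--device", "0", "--type", "dvddrive", "--medium", iso],
        ["VBoxManage", "startvm", name],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(vboxmanage_commands(VM_NAME, OS_TYPE, ISO_PATH, DISK_PATH))
```

Keeping dry_run as the default means the commands can be reviewed, or documented as required above, before anything is changed on the host.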

2. Installation of SQL database(s)

After installing the Linux distribution, install an SQL database such as PostgreSQL, MariaDB or MySQL. Use the distribution's package manager for this: apt on Debian/Ubuntu, dnf or yum on CentOS and related distributions. If desired, you can also choose a different SQL database instead of those suggested above.

For practice purposes, the installation should be done on the command line (Bash) with the appropriate tools.

This subtask is complete when the chosen database has been installed and configured, the service is running, and it accepts queries from the SQL command-line client or from application programs. Note down the username and password of the default database administrator account (e.g. postgres for PostgreSQL, root for MariaDB/MySQL).
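A quick first check is whether the server is listening at all. The following minimal Python sketch uses only the standard library and tests whether the default ports accept TCP connections: 5432 for PostgreSQL and 3306 for MariaDB/MySQL. It assumes the server runs locally and was not moved to another port. This is only a smoke test. An open port shows that a process is listening, not that logins or SQL queries work, so a query via psql or mysql is still the real check.

```python
import socket

# Default ports (assumption: not changed in the server configuration).
DEFAULT_PORTS = {"postgresql": 5432, "mariadb": 3306, "mysql": 3306}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False


if __name__ == "__main__":
    for name, port in DEFAULT_PORTS.items():
        state = "listening" if port_is_open("localhost", port) else "not reachable"
        print(f"{name:<10} port {port}: {state}")
```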

3. Install an IDE (Eclipse, NetBeans, JetBrains)

No IT employee – regardless of their specialization – can do without a powerful editor to edit or write shell scripts, configuration files and source code.

For projects with manageable complexity (e.g. small web applications), simple code editors are often sufficient, such as:

Visual Studio Code
Atom (discontinued by GitHub at the end of 2022)

Large enterprise projects, however, usually combine many different technologies: SQL databases, XHTML/CSS/JavaScript frontends, frontend frameworks such as Angular or React, Java backends, backend frameworks such as Jakarta EE or .NET, various shell scripts, etc. Such projects are therefore better developed in an integrated development environment (IDE). Some examples are:

Eclipse
NetBeans
JetBrains tools

The aim of the third step is to install a suitable code editor or a suitable development environment on the Linux system. The new IT employee can choose whether to install it from the Linux distribution’s repositories or after downloading it from the respective manufacturer’s website.

4. Check out project from GitHub

Next, install the version control system Git, again via the Linux distribution's package manager. Once the installation is complete, the following exercise project can be "checked out" (in Git terms, "cloned") from GitHub:

https://github.com/openmundi/world.csv

The project can be cloned on the command line with the git tool or directly from a previously installed IDE. Because Git is so widespread, almost all IDEs support it out of the box. The project consists of sample CSV files containing country codes.
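The clone can also be scripted. The following Python sketch calls the git command-line tool, so Git must be installed as described above. It clones the repository only if the target directory does not exist yet, then lists the CSV files in the working copy. The target directory name "world.csv" is an assumption (it is also git's default for this URL). The repository's internal layout is not assumed; the listing searches all subdirectories.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/openmundi/world.csv"


def clone_if_missing(url, target):
    """Clone the repository with the git CLI unless the target directory already exists."""
    target = Path(target)
    if not target.exists():
        subprocess.run(["git", "clone", url, str(target)], check=True)
    return target


def list_csv_files(repo_dir):
    """Return all CSV files in the working copy, sorted by path, skipping the .git folder."""
    return sorted(p for p in Path(repo_dir).rglob("*.csv") if ".git" not in p.parts)
```

Typical usage, with network access: for p in list_csv_files(clone_if_missing(REPO_URL, "world.csv")): print(p).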

The following video shows how to use Git: https://www.youtube.com/watch?v=USjZcfj8yxE

5. Import CSV into SQL database

Once the project has been cloned, the next task is to import one of the CSV files (e.g. countries(249)_alpha3.csv) into the SQL database using a script or program. The programming language for this step can match the IT employee's previous knowledge or future company or project requirements. It is important that the structure of the CSV data is preserved and that the data is imported correctly into a table.

Common and suitable programming languages are Python or Java. The program should first connect to the database, then create the table and finally write the data into it.
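The following Python sketch follows these three steps: connect, create the table from the CSV header, and insert the rows. To keep it runnable without a database server, it assumes SQLite from the standard library as a stand-in. Every column is created as TEXT, which preserves the CSV structure as-is.

With PostgreSQL via a driver such as psycopg2, the structure stays the same, but placeholders are written %s instead of ?. With MySQL/MariaDB, identifiers are quoted with backticks unless ANSI_QUOTES mode is enabled.

```python
import csv
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier; table and column names come from outside the code."""
    return '"' + name.replace('"', '""') + '"'


def import_csv(conn, csv_path, table):
    """Create `table` from the CSV header (all columns TEXT) and insert every data row."""
    # utf-8-sig also strips a byte order mark if the file has one.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    columns = ", ".join(f"{quote_ident(c)} TEXT" for c in header)
    placeholders = ", ".join("?" for _ in header)
    with conn:  # commits on success, rolls back on error
        conn.execute(f"DROP TABLE IF EXISTS {quote_ident(table)}")
        conn.execute(f"CREATE TABLE {quote_ident(table)} ({columns})")
        conn.executemany(f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", rows)
    return len(rows)
```

Usage, with a hypothetical file location: conn = sqlite3.connect("countries.db"), then import_csv(conn, "path/to/countries(249)_alpha3.csv", "countries") returns the number of imported rows.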

6. Sort SQL table and output data as JSON file

Once a CSV file has been successfully imported into the database, the next step is to sort the data in descending alphabetical order, export it and write it back to the file system as a JSON file. This can be implemented as a second, independent program or as an additional function of the first program (e.g. selected via a command-line parameter).

The sorting can be done as a separate intermediate step or directly in the SQL query used for the export (ORDER BY <column> DESC). The task is complete when a valid JSON file has been written and saved.
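The following Python sketch takes the second variant: the sorting happens in the SQL query, and the result is written as a JSON array of objects, one per row. It uses SQLite from the standard library as a stand-in for the server database. The sort column is a parameter because the tutorial does not fix one. Note that SQLite compares TEXT byte-wise by default, so the order is only truly "alphabetical" for plain ASCII values.

```python
import json
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier so table/column names cannot break the query."""
    return '"' + name.replace('"', '""') + '"'


def export_sorted_json(conn, table, sort_column, json_path):
    """Write all rows of `table`, sorted descending by `sort_column`, to `json_path`."""
    cur = conn.execute(
        f"SELECT * FROM {quote_ident(table)} ORDER BY {quote_ident(sort_column)} DESC"
    )
    names = [d[0] for d in cur.description]  # column names from the result set
    records = [dict(zip(names, row)) for row in cur]
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
    return records
```

If the file can be read back with any JSON parser, the "valid JSON file" criterion is met.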

7. Install MongoDB or another NoSQL database

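Now install MongoDB or, alternatively, another NoSQL database with a REST API. Where possible, use the distribution's package sources. Some databases, MongoDB among them, are no longer shipped in current Debian and Ubuntu releases and must be installed from the vendor's own package repository. Note also that the MongoDB server does not offer a built-in REST API; it is accessed through drivers or tools such as mongosh and mongoimport. For the REST calls in the following steps, a database with a native HTTP interface such as Apache CouchDB is the simpler choice.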
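To verify the installation, you can query the server's root URL. The following Python sketch assumes Apache CouchDB, whose API is plain HTTP/JSON, running locally on its default port 5984. A GET on the root returns a small welcome document with the server version. The opener parameter exists only so the function can be tried without a running server.

```python
import json
import urllib.request

# Assumption: Apache CouchDB running locally on its default port.
COUCH_URL = "http://localhost:5984"


def couch_welcome(base_url=COUCH_URL, opener=urllib.request.urlopen):
    """GET the server root; CouchDB answers with a JSON welcome document."""
    with opener(base_url + "/", timeout=5) as resp:
        return json.load(resp)
```

Usage: info = couch_welcome(), then print(info["couchdb"], info["version"]). On the command line, curl http://localhost:5984/ does the same.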

8. Load JSON file into the NoSQL database

The JSON file should then be loaded into the NoSQL database. You can use a programming language of your choice (as for the CSV import) or a tool like cURL. With a programming language, use the database's client library (driver). With cURL, use the database's REST API.
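The following Python sketch shows the REST route, assuming Apache CouchDB. It builds two requests: PUT /countries creates the database, and POST /countries/_bulk_docs inserts all records of the JSON array in one call, each record becoming one document. The admin credentials and the database name are hypothetical. For MongoDB, the equivalent would be the driver (e.g. PyMongo's insert_many) or mongoimport --jsonArray.

```python
import base64
import json
import urllib.request

# Assumptions: local CouchDB, hypothetical admin credentials and database name.
COUCH_URL = "http://localhost:5984"
AUTH = ("admin", "secret")
DB = "countries"


def couch_request(method, path, body=None, base=COUCH_URL, auth=AUTH):
    """Build a JSON request with HTTP basic authentication for the CouchDB API."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(base + path, data=data, method=method)
    token = base64.b64encode(":".join(auth).encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    req.add_header("Content-Type", "application/json")
    return req


def load_requests(json_path, db=DB):
    """Create the database, then insert all records with one _bulk_docs call."""
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)  # the JSON array exported in step 6
    return [
        couch_request("PUT", f"/{db}"),
        couch_request("POST", f"/{db}/_bulk_docs", {"docs": records}),
    ]


def send(requests):
    """Execute the requests in order; urlopen raises HTTPError on 4xx/5xx answers."""
    for req in requests:
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(req.get_method(), req.full_url, resp.status)
```

Running send(load_requests("countries.json")) a second time fails at the PUT, because CouchDB rejects creating an existing database with 412 Precondition Failed.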

9. Access the NoSQL database via REST call

Once the JSON document has been successfully loaded into the NoSQL database, the database's REST API can be called with cURL on the command line or with the graphical tool Postman. Queries should be created that do three things: filter and retrieve data (e.g. only the country codes containing an "a"), sort the data in reverse order, and finally delete the data from the NoSQL database.
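The following Python sketch builds the three kinds of calls for Apache CouchDB:

- Filtering uses a Mango query (POST /countries/_find) with the $regex operator. The character class [Aa] matches both cases, since country codes are usually upper case.
- Sorting descending requires a JSON index on the field (POST /countries/_index), and the sort field must appear in the selector.
- Deleting drops the whole database.

The field name "code" is a hypothetical placeholder; take the real name from the CSV header. The credentials are assumptions as well.

```python
import base64
import json
import urllib.request

# Assumptions: local CouchDB, hypothetical credentials, database "countries",
# and a hypothetical field "code" holding the country code.
COUCH_URL = "http://localhost:5984"
AUTH = ("admin", "secret")
DB, FIELD = "countries", "code"


def couch_request(method, path, body=None):
    """Build an authenticated JSON request for the CouchDB API."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(COUCH_URL + path, data=data, method=method)
    token = base64.b64encode(":".join(AUTH).encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    req.add_header("Content-Type", "application/json")
    return req


def filter_request(db=DB, field=FIELD):
    """Mango query: only documents whose field contains an 'a' (either case)."""
    return couch_request("POST", f"/{db}/_find", {
        "selector": {field: {"$regex": "[Aa]"}},
        "fields": [field],
        "limit": 1000,  # _find returns only 25 documents unless a limit is given
    })


def sorted_requests(db=DB, field=FIELD):
    """Create a JSON index on the field, then fetch all documents sorted descending."""
    return [
        couch_request("POST", f"/{db}/_index", {"index": {"fields": [field]}, "type": "json"}),
        couch_request("POST", f"/{db}/_find", {
            "selector": {field: {"$gt": None}},  # matches every non-null value
            "sort": [{field: "desc"}],
            "limit": 1000,
        }),
    ]


def delete_request(db=DB):
    """Drop the whole database, including all documents."""
    return couch_request("DELETE", f"/{db}")


def send(req):
    """Execute one request and return the decoded JSON answer."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

The matching documents are in the "docs" list of the answer, e.g. send(filter_request())["docs"]. With cURL, the last step reads curl -u admin:secret -X DELETE http://localhost:5984/countries.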

With the completion of this subtask, the technical part of the tutorial is complete.

10. Presentation: TCP/IP – or how the Internet works

A 15-minute presentation (approx. 5-10 slides) should be created on the topic, which explains the basic principles of how TCP/IP and the Internet work (IP addresses, MAC addresses, DNS, network address translation, routers, switches, …). This presentation is given to a member of the development team.
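A few lines with Python's standard ipaddress and socket modules can serve as a live demo for the presentation. They illustrate three topics from the list above: private address ranges (why NAT is needed), subnet membership (hosts in the same subnet reach each other directly, others via a router), and a DNS lookup through the operating system's resolver. The example addresses are arbitrary.

```python
import ipaddress
import socket


def describe(address):
    """Classify an IPv4/IPv6 address for a slide on IP addressing and NAT."""
    ip = ipaddress.ip_address(address)
    if ip.is_loopback:
        kind = "loopback"
    elif ip.is_private:
        kind = "private (not routed on the Internet, typically behind NAT)"
    elif ip.is_global:
        kind = "public"
    else:
        kind = "other special-purpose range"
    return f"{ip} (IPv{ip.version}): {kind}"


def same_subnet(a, b, network):
    """True if both hosts lie in `network`, i.e. they can communicate without a router."""
    net = ipaddress.ip_network(network)
    return ipaddress.ip_address(a) in net and ipaddress.ip_address(b) in net


def resolve(hostname):
    """DNS lookup via the OS resolver; returns the unique addresses as strings."""
    return sorted({info[4][0] for info in socket.getaddrinfo(hostname, None)})
```

For example, describe("192.168.1.10") labels the address as private, and same_subnet("192.168.1.10", "192.168.2.20", "192.168.1.0/24") is False, so that traffic has to pass through a router.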

11. Presentation: Cloud Computing vs. on-premise Data Center

A second presentation in the same format (approx. 15 minutes, 5-10 slides) should explain how cloud computing and on-premise data centers work. What are the differences, risks and opportunities in each case? This presentation will also be given to a member of the development team.

We hope to have provided new IT employees with an informative tutorial. All technologies and tools used here are highly relevant to the everyday work of almost all developers, DevOps specialists, data engineers and data scientists.

In the third part of the series, we will look at onboarding new employees in the area of data engineering, covering ETL concepts, OLTP vs. OLAP databases and the Apache Hadoop framework.