What is a CSV file?

First of all, let’s start with a bit of vocabulary. CSV stands for Comma Separated Values or Character Separated Values. This means that the values are delimited by commas, or sometimes by other characters such as semicolons or tabs. Quotes can enclose a value that itself contains the delimiter. A CSV is a type of text file whose content can be displayed in a spreadsheet like Excel, arranged in columns and rows.
In this article, we explain the structure of a CSV file and how to parse it in Drop to Kibana.
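
To make the idea of a delimiter concrete, here is a minimal Python sketch using the standard csv module. The sample data and the helper name parse are made up for illustration; they do not come from the COVID-19 file.

```python
import csv
import io

# Made-up sample: the same two records written with three different delimiters.
samples = {
    ",": "id,city\n1,Paris\n2,Lyon\n",
    ";": "id;city\n1;Paris\n2;Lyon\n",
    "\t": "id\tcity\n1\tParis\n2\tLyon\n",
}


def parse(text, delimiter):
    """Split CSV text into a list of rows using the given delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


parsed = {delimiter: parse(text, delimiter) for delimiter, text in samples.items()}

# Whatever the delimiter, the parsed rows are identical.
for delimiter, rows in parsed.items():
    print(repr(delimiter), rows)
```

The only thing that changes from one file to the next is the delimiter you tell the parser to use.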

As an example, we use the CSV file of the COVID-19 sampling sites, available as open data. I already used it in my previous article, which explains how to get started with Kibana as a beginner.

The first row, or the headers of the CSV

The first row of the CSV file contains the data labels, also called the headers. Each following row corresponds to one data record: in our case, a COVID-19 testing site.
To clarify, commas (or other delimiters) separate the fields, and each field holds one precise piece of information. A field can hold various types of data, such as a number, a location or a date. To learn more about data types, I recommend this video, from 14min42.
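
As a rough illustration of headers and records, the sketch below reads a small made-up extract with Python's csv.DictReader. The column names are borrowed from our file, but the two records are invented for the example.

```python
import csv
import io

# Made-up extract: a header row followed by two records.
raw = (
    "ID,rs,longitude,latitude,date_modif\n"
    "a1,LAB ONE,5.24,46.20,2020-09-24\n"
    "b2,LAB TWO,2.35,48.85,2020-09-25\n"
)

reader = csv.DictReader(io.StringIO(raw))
headers = reader.fieldnames  # labels taken from the first row
records = list(reader)       # one dictionary per remaining row

print(headers)
for record in records:
    print(record["ID"], record["rs"], record["date_modif"])
```

Each record is a dictionary whose keys are the header labels, so every value stays attached to its field name.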

Let’s take a closer look with our file:

csv: a raw file

The first row, called the header, shows the data labels. The website where you can download the file indicates that the labels correspond to:

  • ID: Identifier
  • id_ej: FINESS number of the legal entity
  • finess: FINESS number of the geographic site
  • rs: Registered name
  • adresse: Address
  • cpl_loc: Additional location details
  • do_prel: Performs RT-PCR tests
  • do_antigenic: Performs antigen tests
  • longitude: Longitude
  • latitude: Latitude
  • mod_prel: Sampling arrangements
  • public: Public accepted
  • horaire: Opening hours
  • horaire_prio: Opening hours for priority patients
  • check_rdv: With or without appointment
  • tel_rdv: Phone number for booking an appointment
  • web_rdv: Website for booking an appointment
  • date_modif: Date of last update

We can find all these fields in Kibana, as you’ll discover in the following steps.

The following rows of the raw CSV file

In the rest of the file, each row corresponds to a test site. In a nutshell, this means that the 3,272 rows of the file describe the 3,272 test sites. Let’s focus on the first one after the header:

csv: detail on header

On a row, all the fields are in the same order as in the header row, from the identifier to the last update date. To go further, let’s display this row as if we had opened the file in a spreadsheet. It would look like this:

ID: HlI2rCJ014Dk4X3Z
id_ej: 010001725
finess: 010001733
RS: BM CROIX BLANCHE BOURG EN B
adresse: 1 AV AMEDEE MERCIER 01000 BOURG EN BRESSE
cpl_loc: (empty)
longitude: 5.24185205182066
latitude: 46.2038511077026
mod_prel: Sur place
public: Tout public
horaire: lundi : 8h00-12h00 et 14h00-19h00 | mardi : 8h00-12h […] samedi : 8h00-12h00 | dimanche : fermé
horaire_prio: /
check_rdv: Sur rendez-vous uniquement
tel_rdv: 0474452636
web_rdv: /
date_modif: 2020-09-24
First row: header detail of the csv file

As you can see, some cells are empty. This means that, in our file, nothing is filled in for that field. How can we see it in the text file? Two commas follow each other, which indicates that the field is empty: there is no value between these two commas.
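
Here is a small sketch of how two consecutive commas show up once the line is parsed. The row is made up for the example (the phone number is a placeholder), and Python's csv module stands in for whatever parser you use.

```python
import csv
import io

# Made-up line in which two fields are left empty (commas follow each other).
raw = "ID,cpl_loc,horaire_prio,tel_rdv\nabc123,,,0400000000\n"

reader = csv.DictReader(io.StringIO(raw))
row = next(reader)

# An empty cell is parsed as an empty string.
empty_fields = [name for name, value in row.items() if value == ""]
print(empty_fields)  # ['cpl_loc', 'horaire_prio']
```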

Kibana, however, indicates this empty field in another way. We will see that in the next part.

What does your CSV file look like in Kibana?

When we load our raw file into Kibana (tutorials available here to get started), we specify the “delimiter” used in the CSV. In our case, the delimiter is the comma. Once it is specified, Drop to Kibana can parse the fields and extract the information correctly. The fields are then properly categorized as text or character strings (string), dates (date), numbers (number), or geolocated coordinates (geo_point).
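
To give an intuition of what categorizing fields means, here is a rough Python sketch. It is only an assumption about how such a guess could work, not how Drop to Kibana or Elasticsearch actually decides; the function guess_type is hypothetical.

```python
from datetime import date


def guess_type(value):
    """Return a rough category for one CSV value: number, date or string."""
    try:
        float(value)
        return "number"
    except ValueError:
        pass
    try:
        date.fromisoformat(value)  # accepts the YYYY-MM-DD form
        return "date"
    except ValueError:
        return "string"


# Values taken from the first record of our file.
row = {
    "rs": "BM CROIX BLANCHE BOURG EN B",
    "longitude": "5.24185205182066",
    "latitude": "46.2038511077026",
    "date_modif": "2020-09-24",
}

types = {name: guess_type(value) for name, value in row.items()}
print(types)

# A latitude/longitude pair can be grouped into a single location object,
# which is one of the shapes Elasticsearch accepts for a geo_point.
location = {"lat": float(row["latitude"]), "lon": float(row["longitude"])}
print(location)
```

Note that such a naive guess would also treat identifiers made only of digits as numbers, which is one reason real tools let you review the field types.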

When our file is in Kibana, we also find the header fields we talked about a few paragraphs above. You can see them in Discover.

 


In Kibana, the number of records is the number of rows in the raw file. For example, in our file on COVID-19 sampling sites, we count the 3,272 test sites mentioned earlier.

 

When we go a bit further with our CSV file, we can display every row with its field details in the Discover menu of Kibana.
Each “_source” corresponds to a row that describes a test site. The difference with the raw file is that Kibana puts the name of the field before each value.
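
The idea of a field name placed before each value can be sketched in Python by turning a CSV row into a JSON document. The record below is made up, and the output only approximates what a “_source” looks like in Discover.

```python
import csv
import io
import json

# Made-up record with three of the columns from our file.
raw = "ID,rs,check_rdv\nabc123,LAB ONE,Sur rendez-vous uniquement\n"

reader = csv.DictReader(io.StringIO(raw))
documents = [dict(row) for row in reader]

# Each value is now preceded by the name of its field.
print(json.dumps(documents[0], indent=2, ensure_ascii=False))
```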

 

Kibana may display the value “NULL” for some fields, and this is completely normal! Earlier in this article, I told you about empty fields in the raw file, shown by two commas in a row. When Kibana processes our file, it puts this famous “NULL” value in place of the empty field, to specify that nothing is filled in for that cell.
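
As a sketch of that behavior, the Python snippet below replaces empty strings with None, which becomes null once written as JSON. The helper blank_to_none is hypothetical and the row is made up; the exact conversion done by Drop to Kibana may differ.

```python
import json


def blank_to_none(row):
    """Replace empty strings with None so they are written as null in JSON."""
    return {name: (value if value != "" else None) for name, value in row.items()}


# Made-up parsed row in which cpl_loc was left empty in the raw file.
row = {"ID": "abc123", "cpl_loc": "", "tel_rdv": "0400000000"}

document = blank_to_none(row)
print(json.dumps(document))
# {"ID": "abc123", "cpl_loc": null, "tel_rdv": "0400000000"}
```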

Now you are an expert in CSV files, ready to use them in Kibana and get insights from your data.

If you want to learn more about Kibana, I invite you to have a look at our previous articles. For the freshest posts and tips, you can follow us on your favorite social network: LinkedIn, Facebook or Twitter.

Test your knowledge on Drop to Kibana now with our Freemium offer, only on Octave.io.