Report Creation on Superset with PostgreSQL
Superset is a data exploration platform designed to be visual, intuitive and interactive. Superset can perform analytics at the Speed of the thought and has rich set of data visualizations. It is an extensible, high granularity security model allows intricate rules on who can access what feature and allows to control how data sources are exposed to user by defining metrics and rules.
This is a new visualization open source tool which is under incubation. My Article deals with the integration of PostgreSQL with Apache Superset to design and visualize data into reports with different use cases
Assuming that the reader of this article has a basic knowledge on data analytics (report visualization tools) and know about docker to initialize the development environment.
Apache Superset is a modern, enterprise ready BI web application which is undergoing incubation at The Apache Software Foundation.
· Group of Data Visualizations
· Easy interface for exploring and visualizing Data
· Create and share Dashboards
· Integration with most SQL-speaking RDBMS through SQLAlchemy
· Deep integration with Druid.io
How to connect?
Superset is cloud-native and has been designed to work well in containers and large distributed environments.
Start with Docker: Docker is an open source platform to create, deploy and manage virtual application containers on a common operating system. Below is the shortcut to initialize development environment in the docker.
After the initialization, then open the browser and enter the URL http://yourlocalhostorIP:8088 which will lead to Login page.
Enter your credentials and Sign In. When you are signed in, you will see the below screen
Click on the “Sources” option present at the top left corner as you can see in above screen shot, which will lead to step for Adding Database.
Adding a Data Source (Database):
Select Databases option to add a new database to Superset.
From the screen, click on the “+” button to add database.
Enter the name of the database as shown in the below screenshot.
Provide the SQLAlchemy connection URL and click on test connection button.
This example shows the connection string of the PostgreSQL database that we have connected. Once you’re successfully connected, you will see a pop-up as below.
Additionally, you will also see the list of tables Superset could read from Schema, that you are connected to at the bottom of the screen.
If all the configuration seems ok, click on the “Save” button at the bottom.
Adding a Dataset (Table):
We can add the specific tables/views from the database to the Superset, for querying or creating the reports.
Click on the “Sources” drop-down as earlier.
You will get the below screen.
We will see the list of tables already added. To add a new table, click on the “+” as marked in the above screenshot.
Enter the details in the fields provided. Select the database from the dropdown, enter the schema name and table name in the appropriate fields as shown in the above screenshot. Click on “Save” after entering the details.
We would notice that the new table is added to the list of tables. We need to configure the table as per our requirements to build the report.
Click on the “Edit Record” option to configure the table. You will get the below screen.
You will see three tabs as marked in the above screenshot. “Detail” tab gives the complete details of the table. You can change the options available as per the requirement and click on the “Save” button at the end to save the changes you have done.Click on the “List Columns” tab to define the specific columns to explore data.
Few Details on the check boxes available in the above screen shot:
Groupable — check this when you want to group the data by specific column. (Group by Clause)
Filterable — check this when you want to filter the data for a specific column. (Where Clause)
Count Distinct — check this when you need distinct count of a specific column (count(distinct))
Sum/Min/Max — check these when you need to sum up a specific column or to find minimum value or maximum value of a specific column respectively.
Is Temporal — This check box will be checked by default for any date or time fields.
And click on the “Save” button after you define the columns as per your requirement.
Additionally, you can create other metrics in the “List Metrics” tab.
Click on the “+” button to add a new metric as shown in the above screenshot.
Enter the details as shown in the above screenshot.
Metric: Name of the metric
Description: Description if required
Verbose Name: Display name in the report
Type: aggregation type or function type
SQL Expression: A valid SQL expression.
Table: select the table on which the metric to be created
Is Restricted: Check this when you wanted to restrict the access of the metric that you have created to users/ roles.
Click on the “Save” button after entering the details.
The new metric is created as shown in the above screenshot. Likewise, you could create the other metrics based on your requirement.
Creating a Report:
Go back to the List of Tables from the “Sources” menu.
Click on the table as shown above. You will get the below screen. Default Visualization type is “Table View” as highlighted.
By Default, last 7 days data will be filtered as shown in the below screenshot.
When you want to filter the data as per your requirements you could select it in the “since” and “Until” options available as shown in the above screenshot.
We could select the values from the calendar as shown in the above screenshot.
Superset supports smart date like “last Sunday” or “last March” in the “Free form” option as shown in the below screenshot.
To get all the data you can use “Clear” option. To get the current data use “now” option.
You can also select date from the “Relative” option.
The metrics section will be there under the “GROUP BY” header as shown below.
We will notice all the metrics available on the table from the drop-down “Metrics”.
Below is the sample report that returns the count of records in a table.
In this example, select “Count(*)” value for Metrics field and then click on the “Run Query” button at the top as highlighted in the below screenshot.
We see that the query returns the total count of records for the table (highlighted in the screen shot above).
Now, as we have the total count of records let us try to group the records based on Item category, to do this select “Item category” value for “Group by” field and run the query.
We see that the query returns the count of records for each set of Item Category.
Now let us try an example to display the Profits gained by each country using Bar Chart.
To achieve this, select Profit as value for Metrics field and Employee Country as value for Group By field.
To change the visualization type, go to the Visualization Type option
Now change the Visualization type to Distribution — Bar Chart from default value Table View.
Click on “Run Query” to view the result, you will notice the visualization type has changed to bar graph.
You can format the report as per your requirements as highlighted in the above screenshot.
When you add any column on the “Breakdowns” drop-down, your result will be categorized based on the column you’ve selected in the breakdowns. This could be better understood by the following example.
In the example, “Item Category” column is added in the breakdowns option, so the result is categorized based on “Item Category”. More precisely, it shows profit on each Item Category in different countries.
Click on the “Save” button to save the report. You will get the below screenshot.
Select the “Overwrite slice….” Option to save. If you want to Save the report with another name, check the “Save As” option. When you want to add this Slice to dashboard, select the appropriate options as shown in the above screenshot.
Click on “Save” button.
In the “Charts” section, you could notice the saved report as shown below.
Select the “Dashboards” option and click on “+” symbol to add a new dashboard as shown below.
The following screen will appear. Enter the details and the relevant charts as shown below.
After entering the details click on “Save” button.
The saved dashboard will appear under the “Dashboards” option as shown below.
Click on “Sales Dashboard” to view the dashboard. Your saved dashboard will appear as shown below. You can edit the dashboard and move the slices and adjust the size by the clicking on “Edit Dashboard”. After making the changes you need to save the dashboard from the “Actions” dropdown as highlighted below
You have “SQL Lab” option in the menu which has a SQL editor to query the data.
You have 3 options from the SQL Lab drop-down. SQL Editor is the editor where you query the data and view results. You could create views and tables in this editor to the selected database.
The “Query Search” option in the “SQL Lab” is used to search all the queries you have run as shown below.
The “Saved Queries” option shows all the list of your saved queries.