Plotly is a well-known open-source library that adds dynamic, interactive capabilities to already attractive graphs. With Plotly, we can make line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple-axes charts, polar charts, bubble charts, and more. In particular, Plotly natively supports many kinds of map charts, which can be extremely useful for visualizing geographical data.
In this article, we will explore how to use some of Plotly's map charts to visualize geographical data on the Covid-19 outbreak, which is currently causing one of the most serious pandemics in world history.
First of all, let's import the necessary libraries into a Jupyter notebook. For the geographical charts, we only need one Plotly module: Plotly Express.
Next, we need to download the latest daily Covid-19 dataset from Kaggle. To pull the data directly into our Jupyter notebook, we use the Kaggle API CLI, so we don't have to log on to the Kaggle website and download it manually.
If you haven't used the Kaggle API CLI before, it is simple to set up; see the documentation at https://www.kaggle.com/docs/api
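A sketch of calling the Kaggle CLI from Python, assuming the `kaggle` package is installed and an API token is in `~/.kaggle/kaggle.json`. The article does not name the dataset, so `DATASET_SLUG` below is a hypothetical placeholder, as are the helper names and the `data` folder.

```python
import subprocess

# Hypothetical placeholder: replace with the "owner/dataset-name" slug
# shown on the Kaggle page of the Covid-19 dataset you want.
DATASET_SLUG = "owner/covid-19-dataset"


def build_download_command(slug: str, dest: str = "data") -> list:
    """Kaggle CLI command that downloads a dataset into `dest` and unzips it."""
    return ["kaggle", "datasets", "download", "-d", slug, "-p", dest, "--unzip"]


def download_dataset(slug: str, dest: str = "data") -> None:
    # Runs the CLI; check=True raises CalledProcessError if the download fails.
    subprocess.run(build_download_command(slug, dest), check=True)
```

In a notebook cell, `download_dataset(DATASET_SLUG)` does the same as the shell command `!kaggle datasets download -d <slug> -p data --unzip`. Building the command in a separate function keeps it easy to inspect without touching the network.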
After the Covid-19 dataset has been downloaded from Kaggle, its zip file is extracted to the local filesystem, and we can load the CSV data file into a Pandas data frame.
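A minimal loading sketch. The CSV file name and the `Date` column name are assumptions (the article does not state them); adjust both to match the extracted files.

```python
import pandas as pd

# Hypothetical file name: use whichever CSV the extracted Kaggle archive contains.
CSV_PATH = "data/covid_19_data.csv"


def load_covid_data(path, date_col: str = "Date") -> pd.DataFrame:
    """Read the CSV and, if present, parse the (assumed) date column."""
    df = pd.read_csv(path)
    if date_col in df.columns:
        # Real datetimes sort chronologically; strings like "3/10/20" would not.
        df[date_col] = pd.to_datetime(df[date_col])
    return df

# In the notebook: df = load_covid_data(CSV_PATH)
```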
But before plotting anything, let's explore some basic information about the dataset, as we would at the start of any data science process.
Let's use the DataFrame's `info()` method to see basic information about the data frame.
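A small inspection helper, sketched here as an illustration; `inspect_frame` is a hypothetical name, and it assumes `df` is the frame loaded from the CSV. Besides `info()`, it prints missing values per column, which matters for map data because rows without a usable location cannot be placed.

```python
import pandas as pd


def inspect_frame(df: pd.DataFrame) -> None:
    """Print dtypes, non-null counts, missing values, and the first rows."""
    df.info()                # one line per column: name, non-null count, dtype
    print(df.isna().sum())   # number of missing values in each column
    print(df.head())         # first five rows, to eyeball the values

# In the notebook: inspect_frame(df)
```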
As we can see, the data frame contains country names and latitude/longitude coordinates, which we can use to plot the data on maps.
Now, we can explore some basic statistics of the data frame.
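A sketch of a quick statistical summary. The column names `Country/Region` and `Date` are assumptions about the CSV header, exposed as parameters so they can be changed.

```python
import pandas as pd


def basic_stats(df: pd.DataFrame, country: str = "Country/Region",
                date: str = "Date") -> pd.DataFrame:
    """Print the number of countries and the date range, then return describe()."""
    print("Countries:", df[country].nunique())
    print("Date range:", df[date].min(), "to", df[date].max())
    # count, mean, std, min, quartiles and max for every numeric column
    return df.describe()
```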
We see that the data is cumulative: the latest row for each country/state already contains the accumulated totals up to that date.
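One way to check the cumulative claim is to verify that the case count never decreases over time within each country/state. This is a sketch, assuming columns named `Country/Region`, `Province/State`, `Date` and `Confirmed` (hypothetical names). Real-world feeds occasionally revise totals downward, so a `False` result may reflect data corrections rather than non-cumulative data.

```python
import pandas as pd


def is_cumulative(df: pd.DataFrame, value: str = "Confirmed",
                  country: str = "Country/Region",
                  state: str = "Province/State", date: str = "Date") -> bool:
    """True if `value` never decreases over time within any country/state."""
    # Countries reported without a state have NaN there; "" keeps them grouped.
    ordered = df.assign(**{state: df[state].fillna("")}).sort_values(date)
    grouped = ordered.groupby([country, state])[value]
    # is_monotonic_increasing is non-strict: equal consecutive values pass
    return bool(grouped.apply(lambda s: s.is_monotonic_increasing).all())
```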
Next, we take the latest row of data for each country/state, because we don't need to plot the historical data here. Then we group by country and sum all cases for each country, because we don't need state-level data.
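A sketch of these two steps in pandas. The column names are assumptions about the CSV header. Sorting by date and keeping the last duplicate per country/state is one way to select the latest rows; because the data is cumulative, summing those rows gives each country's totals.

```python
import pandas as pd


def latest_by_country(df: pd.DataFrame, country: str = "Country/Region",
                      state: str = "Province/State", date: str = "Date",
                      values=("Confirmed", "Deaths")) -> pd.DataFrame:
    """Keep the newest row per country/state, then total each country."""
    latest = (
        df.assign(**{
            state: df[state].fillna(""),     # NaN state -> "" so it can be matched
            date: pd.to_datetime(df[date]),  # sort chronologically, not as text
        })
        .sort_values(date)
        # after sorting, the last duplicate of each country/state is its newest row
        .drop_duplicates(subset=[country, state], keep="last")
    )
    # cumulative data: summing the latest state rows gives the country totals
    return latest.groupby(country, as_index=False)[list(values)].sum()
```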
Now comes the interesting part. We will use Plotly's choropleth chart to plot the data on the world map, matching each data point to a country by its name. We could also use the latitude/longitude data, but we don't need that precision here; the country name is enough.
We can hover the mouse over the map to interactively see each country's name and data value. A nice feature of this chart is that countries are color-coded by their data: the more cases a country has, the darker its color on the map. So if some areas are redder than others, they have more cases, and the situation there is worse.
We can also draw a scatter plot on top of a world map background using Plotly's scatter geo chart.
As with the choropleth chart, we can hover the mouse over data points to interactively see the data for each point.
Finally, we can plot other data from our data frame to explore Covid-19 further.
Conclusion
Plotly is a simple yet powerful interactive charting library. Used effectively, it can add significant value to our data science projects and presentations.
It is another great tool to learn and keep handy in our data science visualization toolbox.
Happy visualizing!