name: inverse
layout: true
class: center, middle, inverse
---
# Python for Data visualization
## Lecture 4
---
layout: false

## Plotting and Visualization

* Visualization is a key component of data analysis: it lets you explore data and communicate results
* Matplotlib is the most popular Python library for creating static, animated, and interactive visualizations
* Pandas plotting is built on top of Matplotlib and provides a high-level interface for plotting
* Seaborn is a Python data visualization library based on Matplotlib; it offers a higher-level interface with more attractive and informative statistical graphics

---
## Matplotlib primer

Matplotlib is a plotting library for Python and its numerical mathematics extension, NumPy

```python
import matplotlib.pyplot as plt
```

The `pyplot` module provides a MATLAB-like interface for making plots

A simple plot example:

```python
mydata = range(1, 10) # the numbers 1 to 9
plt.plot(mydata) # plot the data
```

.cols[
.sixty[
]
.fourty[
.small[Although pandas' built-in plotting functions handle many of the mundane details of making plots, Matplotlib goes further, with powerful and flexible functions for advanced plotting.]
]
]

---
## Basic plotting with Matplotlib
### Line plot

.cols[
.sixty[
```python
import matplotlib.pyplot as plt
```

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Simple Line Plot')
plt.show()
```
]
.white[text]
.fourty[
]
]

---
## Basic plotting with Matplotlib
### Scatter plot

.cols[
.sixty[
```python
import matplotlib.pyplot as plt
```

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.scatter(x, y, color='red', marker='o')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot')
plt.show()
```
]
.white[text]
.fourty[
]
]

---
## Basic plotting with Matplotlib
### Bar plot

.cols[
.sixty[
```python
import matplotlib.pyplot as plt
```

```python
x = ['A', 'B', 'C', 'D', 'E']
y = [10, 20, 15, 25, 30]

plt.bar(x, y, color='green')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Bar Plot')
plt.show()
```
]
.white[text]
.fourty[
]
]

---
## Basic plotting with Matplotlib
### Histogram

.cols[
.sixty[
```python
import matplotlib.pyplot as plt
import random
```

```python
data = random.sample(range(1, 5000), 500)

plt.hist(data, bins=30)
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Histogram')
plt.show()
```
]
.white[text]
.fourty[
]
]

---
## Basic plotting with Matplotlib
### Pie chart

.cols[
.sixty[
```python
import matplotlib.pyplot as plt
```

```python
sizes = [15, 30, 45, 10]
labels = ['A', 'B', 'C', 'D']

plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=140)
plt.axis('equal')
plt.title('Pie Chart')
plt.show()
```
]
.white[text]
.fourty[
]
]

---
## And so many more!

See [plot types](https://matplotlib.org/stable/plot_types/index.html) in the Matplotlib documentation

.cols[
.fifty[
| Plot Command                | Plot Name                 |
|-----------------------------|---------------------------|
| `plt.plot()`                | Line Plot                 |
| `plt.scatter()`             | Scatter Plot              |
| `plt.bar()`                 | Bar Plot                  |
| `plt.barh()`                | Horizontal Bar Plot       |
| `plt.hist()`                | Histogram                 |
| `plt.boxplot()`             | Box Plot                  |
| `plt.violinplot()`          | Violin Plot               |
| `plt.pie()`                 | Pie Chart                 |
| `plt.polar()`               | Polar Plot                |
| `plt.contour()`             | Contour Plot              |
| `plt.contourf()`            | Filled Contour Plot       |
| `plt.imshow()`              | Image Plot                |
| `plt.hexbin()`              | Hexbin Plot               |
| `plt.stem()`                | Stem Plot                 |
]
.fifty[
| Plot Command                | Plot Name                 |
|-----------------------------|---------------------------|
| `plt.quiver()`              | Quiver Plot               |
| `plt.streamplot()`          | Streamplot                |
| `plt.errorbar()`            | Errorbar Plot             |
| `plt.fill_between()`        | Fill Between Plot         |
| `plt.step()`                | Step Plot                 |
| `ax.scatter3D()`            | 3D Scatter Plot           |
| `ax.plot3D()`               | 3D Line Plot              |
| `ax.bar3d()`                | 3D Bar Plot               |
| `ax.contour3D()`            | 3D Contour Plot           |
| `plt.triplot()`             | Triangular Plot           |
| `plt.tricontour()`          | Triangular Contour Plot   |
| `plt.tricontourf()`         | Filled Triangular Contour |
| `plt.spy()`                 | Spy Plot                  |
| `plt.broken_barh()`         | Broken Bar Plot           |
| `plt.matshow()`             | Matrix Plot               |
]
]

The 3D commands are methods of a 3D axes created with `fig.add_subplot(projection='3d')`, not `pyplot` functions.

PS: run `help(plt.command)` to see more details about each plot command

---
## Figures and subplots

Plots in Matplotlib reside within a `Figure` object. You can create a new figure with `plt.figure()` and modify its properties

.cols[
.sixty[
```python
fig = plt.figure(figsize=(6, 4),
                 facecolor='lightskyblue',
                 layout='constrained'
                 )
fig.suptitle('A nice Matplotlib Figure')
ax = fig.add_subplot()
ax.set_title('Axes',
             loc='left',
             fontstyle='oblique',
             fontsize='medium'
             )
```
]
.white[text]
.fourty[
]
]

PS: run `help(plt.figure)` to see all the available options

---
## Figures and subplots

To add subplots to a figure, you can use `add_subplot()`.

For example, an empty Matplotlib figure with three subplots placed on a 2 × 2 grid:

.cols[
.sixty[
```python
fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
```
]
.fourty[
]
]

Besides `add_subplot()`, `plt.subplots()` creates a figure and a whole grid of subplots in one call:

```python
fig, axs = plt.subplots(2, 2, layout='constrained')
```

---
## Testing subplots

Let's first create 4 simple datasets:

```python
import random

mydata1 = random.sample(range(1, 50), 10) # 10 distinct random integers from 1 to 49
mydata2 = random.sample(range(1, 5000), 500) # 500 distinct random integers from 1 to 4999
mydata3 = range(30) # numbers 0 to 29, ordered
# same as mydata3, but each number is multiplied by a random float in [0, 1)
mydata4 = []
for num in mydata3:
    mydata4.append(num * random.random())
```

And then plot them in the subplots:

.cols[
.sixty[
```python
fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)

ax1.plot(mydata1, linestyle="dashed")
ax2.hist(mydata2, bins=50, alpha=0.5)
ax3.scatter(mydata3, mydata4)
```
]
.fourty[
]
]

---
## Matplotlib Pros and Cons

.cols[
.fifty[
### Cons
* Steep learning curve
* Verbose syntax
* Limited high-level plotting
* Limited default aesthetics
* Limited statistical plotting
]
.fifty[
### Pros
* Highly customizable
* Extensive documentation
* Active community
* Wide range of plot types
* Integration with Pandas
]
]

---
## Seaborn
### What is Seaborn?

* Seaborn is a Python data visualization library based on Matplotlib.
* It provides a high-level interface for drawing attractive statistical graphics.
* It integrates closely with Pandas data structures.

### Key Features of Seaborn

* Statistical Visualization: Seaborn simplifies the process of creating statistical plots such as histograms, box plots, and scatter plots.
* Integration with Pandas: Seaborn works directly with Pandas DataFrames, so columns can be plotted by name.
* Beautiful Aesthetics: Seaborn comes with polished default styles and color palettes, enhancing the visual appeal of plots.

---
## Documentation

Great documentation is available for both Matplotlib and Seaborn. Seaborn's documentation is particularly detailed and well organized.

* [Matplotlib Documentation](https://matplotlib.org/stable/contents.html)
* [Seaborn Documentation](https://seaborn.pydata.org/)
---
## Getting started with Seaborn

To use Seaborn, you need to install it first:

```bash
conda install seaborn
```

or

```bash
pip install seaborn
```

Then, you can import it in your Python script:

```python
import seaborn as sns
```

---
## Test datasets

Seaborn comes with built-in datasets that you can use to test the library.

To check the available datasets, you can use:

```python
sns.get_dataset_names()
```

To load a dataset, you can use:

```python
df = sns.load_dataset('iris')
```
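
`sns.load_dataset()` returns a pandas `DataFrame`, so the usual pandas inspection methods apply before plotting. Loading a built-in dataset needs network access, so the sketch below uses a small hand-made stand-in instead: its column names mirror `iris`, but the values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for sns.load_dataset('iris'): same column names,
# made-up values, so the example runs offline.
df = pd.DataFrame({
    "sepal_length": [5.0, 4.8, 5.2, 6.1, 6.4, 5.9],
    "sepal_width": [3.4, 3.0, 3.6, 2.8, 3.1, 2.7],
    "species": ["setosa"] * 3 + ["versicolor"] * 3,
})

print(df.head())    # first rows of the table
print(df.dtypes)    # numeric columns vs. the text column 'species'

# Number of rows per species
species_counts = df["species"].value_counts()
print(species_counts)
```

The same calls work unchanged on the real table returned by `sns.load_dataset('iris')`.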
---
## Seaborn plotting

Example scatter plot with Seaborn:

```python
df = sns.load_dataset('iris')
sns.scatterplot(x='sepal_length', y='sepal_width', data=df)
```
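
`sns.scatterplot()` returns the Matplotlib `Axes` it drew on, so Matplotlib calls can be used to customize the result. A minimal sketch, assuming a hand-made DataFrame (made-up values with `iris`-style column names) and the non-interactive `Agg` backend so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import pandas as pd
import seaborn as sns

# Made-up values; only the column names follow the iris dataset
df = pd.DataFrame({
    "sepal_length": [5.0, 4.8, 5.2, 6.1, 6.4, 5.9],
    "sepal_width": [3.4, 3.0, 3.6, 2.8, 3.1, 2.7],
    "species": ["setosa"] * 3 + ["versicolor"] * 3,
})

# hue colors the points by species and adds a legend
ax = sns.scatterplot(data=df, x="sepal_length", y="sepal_width", hue="species")

# The returned object is a regular Matplotlib Axes
ax.set_title("Sepal length vs. width")
ax.set_xlabel("Sepal length (cm)")
ax.set_ylabel("Sepal width (cm)")
```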
---
## Seaborn plotting

Example box plot with Seaborn:

```python
df = sns.load_dataset('iris')
sns.boxplot(x='species', y='sepal_length', data=df)
```
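
Several Seaborn plots can share one `Axes`. As a sketch (again with a made-up stand-in for `iris` and the `Agg` backend as assumptions), the individual observations are overlaid on the box plot with `sns.stripplot()`:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up values; only the column names follow the iris dataset
df = pd.DataFrame({
    "sepal_length": [5.0, 4.8, 5.2, 6.1, 6.4, 5.9],
    "species": ["setosa"] * 3 + ["versicolor"] * 3,
})

fig, ax = plt.subplots(figsize=(5, 4))

# Both calls draw on the same Axes because ax=ax is passed explicitly
sns.boxplot(data=df, x="species", y="sepal_length", ax=ax)
sns.stripplot(data=df, x="species", y="sepal_length",
              color="black", size=4, ax=ax)

ax.set_ylabel("Sepal length (cm)")
```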
---
## Seaborn plotting

Example pair plot with Seaborn:

.cols[
.sixty[
```python
df = sns.load_dataset('iris')
sns.pairplot(df, hue='species')
```
]
.fourty[
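]
]

---
## Figure level and axes level plots

Seaborn has two types of plotting functions: axes-level functions (e.g. `sns.scatterplot()`, `sns.histplot()`), which draw onto a single Matplotlib `Axes`, and figure-level functions (e.g. `sns.relplot()`, `sns.displot()`, `sns.catplot()`), which create their own figure and can lay out several subplots.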
---
## Seaborn Various Plots

.cols[
.fifty[
### Relational Plots
| | |
|--------------------------------|-----------------------|
| Scatter Plot | `sns.scatterplot()` |
| Line Plot | `sns.lineplot()` |
| Joint Plot | `sns.jointplot()` |
| Pair Plot | `sns.pairplot()` |
| Joint Kernel Density Estimate | `sns.jointplot(kind='kde')` |

### Categorical Plots
| | |
|--------------------------------|-----------------------|
| Bar Plot | `sns.barplot()` |
| Count Plot | `sns.countplot()` |
| Point Plot | `sns.pointplot()` |
| Box Plot | `sns.boxplot()` |
| Violin Plot | `sns.violinplot()` |
| Swarm Plot | `sns.swarmplot()` |
| Categorical Scatter Plot | `sns.catplot(kind='strip')` |
]
.fifty[
### Distribution Plots
| | |
|--------------------------------|-----------------------|
| Histogram | `sns.histplot()` |
| Kernel Density Estimate (KDE) | `sns.kdeplot()` |
| Rug Plot | `sns.rugplot()` |

### Matrix Plots
| | |
|--------------------------------|-----------------------|
| Heatmap | `sns.heatmap()` |
| Clustermap | `sns.clustermap()` |

### Regression Plots
| | |
|--------------------------------|-----------------------|
| Linear Regression Plot | `sns.regplot()` |
| Residual Plot | `sns.residplot()` |
| Lowess Smoothing Plot | `sns.lmplot(lowess=True)` |
]
]

---
## Seaborn Various Plots

### Time Series Plots
| | |
|--------------------------------|-----------------------|
| Time Series Plot | `sns.lineplot()` |
| Time Series Heatmap | `sns.heatmap()` |

### Multi-Plot Grids
| | |
|--------------------------------|-----------------------|
| FacetGrid | `sns.FacetGrid()` |
| PairGrid | `sns.PairGrid()` |
| JointGrid | `sns.JointGrid()` |

### Color Palettes
| | |
|--------------------------------|-----------------------|
| Color Palette | `sns.color_palette()` |
| Color Brewer Palettes | `sns.color_palette('Set2')` |
| Cubehelix Palettes | `sns.cubehelix_palette()` |
| Xkcd Palettes | `sns.xkcd_palette()` |

---
## Seaborn Various Plots

### Plot Aesthetics
| | |
|--------------------------------|-----------------------|
| Set Theme | `sns.set_theme()` |
| Set Context | `sns.set_context()` |
| Set Style | `sns.set_style()` |
| Set Color Codes | `sns.set_color_codes()` |
| Set Palette | `sns.set_palette()` |
| Set Axis Labels | `ax.set(xlabel=..., ylabel=...)` or `g.set_axis_labels()` |
| Set Title | `ax.set_title()` |
| Set Legend | `sns.move_legend()` |
| Set Ticks | `ax.set_xticks()`, `ax.set_yticks()` |
| Set Grid | `sns.set_style('whitegrid')` |
| Set Spines | `sns.despine()` |
| Set Background | `sns.set_style('darkgrid')` |

Here `ax` is the Matplotlib `Axes` returned by an axes-level function and `g` is the grid object returned by a figure-level function.

PS: run `help(sns.command)` to see more details about each plot command

---
name: last-page
template: inverse

## That's all folks (for now)!