Apache Spark course

Apache Spark is a unified analytics engine for large-scale data processing. It is a popular open-source framework that provides a wide range of functionality, including:

  • Data ingestion and transformation: Spark can read data from a variety of sources, including HDFS, Hive, Kafka, and Cassandra. It can also transform data using operations such as filtering, sorting, and joining.
  • Real-time data processing: Spark Streaming enables real-time processing of data streams. This is useful for applications such as fraud detection and anomaly analysis.
  • Machine learning: Spark MLlib provides a collection of machine learning algorithms, including classification, regression, and clustering.
  • Graph processing: Spark GraphX provides a programming model for graph processing. This is useful for analyzing social networks and other complex relationships.

As a general-purpose engine, Spark can be used for a wide variety of applications and is particularly well suited to those that require high performance and scalability.
After completing this Apache Spark course, you will be able to:
  • Understand the fundamentals of big data concepts and technologies
  • Gain proficiency in building and running Spark applications
  • Master the use of Spark core functionalities for data processing and analysis
  • Learn to work with Spark SQL for structured data manipulation
  • Apply Spark Streaming for real-time data analysis and processing
  • Harness Spark MLlib for machine learning tasks
  • Deploy Spark applications in various environments, including clusters and standalone mode

This course is intended for:
  • Data engineers: Data engineers are responsible for building and maintaining data pipelines. Spark can be used to implement data pipelines for a variety of use cases, such as data warehousing and data lakes.
  • Data scientists: Data scientists use data to extract insights and make predictions. Spark can be used to build machine learning models and perform data analysis.
  • Software engineers: Software engineers can use Spark to develop applications that require high performance and scalability.

1. Spark: Origins & Ecosystem for Big Data Scientists, the Scala, Python & R flavor

Big data solutions such as Spark are hard to set up, time-consuming to learn, and obscure for non-technical users. The aim of this video is to give you just enough information to find your way through this complex ecosystem.

Learning Objective: This module will introduce you to the building blocks and fundamental concepts of Spark.

  • Data Visualization
  • Business Intelligence tools
  • Introduction to Tableau
  • Tableau Architecture
  • Tableau Server Architecture
  • VizQL
  • Introduction to Tableau Prep
  • Tableau Prep Builder User Interface
  • Data Preparation techniques using Tableau Prep Builder tool

2. Install Spark on Your Laptop

Learning Objective: This module walks you through the steps to install Spark on your laptop.

  • Docker/Scale
  • Connect to data from File and Database
  • Types of Connections
  • Joins and Unions
  • Data Blending
  • Tableau Desktop User Interface
  • Basic project: Create a workbook and publish it on Tableau Online

3. Apache Zeppelin

Learning Objective: In this course module, you will learn to use Apache Zeppelin, a web-based notebook for Spark, together with matplotlib and ggplot2.

  • Apache Zeppelin
  • matplotlib
  • ggplot2
  • Data Granularity

4. Calculations in Tableau

Learning Objective: In this Tableau online course module, you will understand basic calculations such as Numeric, String Manipulation, Date Function, Logical and Aggregate. You will also be introduced to Table Calculations and Level Of Detail (LOD) expressions.

  • Types of Calculations
  • Built-in Functions (Number, String, Date, Logical and Aggregate)
  • Operators and Syntax Conventions
  • Table Calculations
  • Level Of Detail (LOD) Calculations
  • Using R within Tableau for Calculations

5. Advanced Visual Analytics

Learning Objective: In this Tableau online training module, you will take a deeper, more granular look at Visual Analytics. It covers advanced techniques for analysing data, including Forecasting, Trend Lines, Reference Lines, Clustering, and Parameters.

  • Parameters
  • Tooltips
  • Trend lines
  • Reference lines
  • Forecasting
  • Clustering

6. Level Of Detail (LOD) Expressions in Tableau

Learning Objective: In this course module, you will dive deep into advanced analytical scenarios using Level Of Detail expressions.

  • Use Case I - Count Customer by Order
  • Use Case II - Profit per Business Day
  • Use Case III - Comparative Sales
  • Use Case IV - Profit Vs Target
  • Use Case V - Finding the second order date
  • Use Case VI - Cohort Analysis

7. Geographic Visualizations in Tableau

Learning Objective: In this training module, you will gain an understanding of Geographic Visualizations in Tableau.

  • Introduction to Geographic Visualizations
  • Manually assigning Geographical Locations
  • Types of Maps
  • Spatial Files
  • Custom Geocoding
  • Polygon Maps
  • Web Map Services
  • Background Images

8. Advanced Charts in Tableau

Learning Objective: In this Tableau online course module, you will learn to plot various advanced charts in Tableau Desktop.

  • Box-and-Whisker Plot
  • Bullet Chart
  • Bar in Bar Chart
  • Gantt Chart
  • Waterfall Chart
  • Pareto Chart
  • Control Chart
  • Funnel Chart
  • Bump Chart
  • Step and Jump Lines
  • Word Cloud
  • Donut Chart

9. Dashboards and Stories

Learning Objective: In this course module, you will learn to build Dashboards and Stories within Tableau.

  • Introduction to Dashboards
  • The Dashboard Interface
  • Dashboard Objects
  • Building a Dashboard
  • Dashboard Layouts and Formatting
  • Interactive Dashboards with actions
  • Designing Dashboards for devices
  • Story Points
