The Spark Python API (PySpark) exposes the Spark programming model to Python.

PySpark is the utility the open source community developed for big data processing with Spark from Python. It lets data scientists work with Resilient Distributed Datasets (RDDs) in Apache Spark using Python. Py4J, a popular library integrated within PySpark, lets Python interface dynamically with JVM objects such as RDDs.
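
To make the RDD workflow concrete, here is a minimal sketch of a PySpark job that distributes a small list of numbers as an RDD and sums their squares. It assumes PySpark and a Java runtime are installed locally. The names `square`, `sum_of_squares`, `main`, and the application name `pyspark-intro-sketch` are hypothetical and chosen only for illustration; `SparkContext`, `parallelize`, `map`, and `reduce` are part of the PySpark RDD API.

```python
from operator import add


def square(x):
    """Plain Python function that Spark applies to each element."""
    return x * x


def sum_of_squares(sc, numbers):
    """Distribute `numbers` as an RDD and sum their squares.

    `sc` is expected to be a pyspark.SparkContext.
    """
    rdd = sc.parallelize(numbers)       # build an RDD from a local collection
    return rdd.map(square).reduce(add)  # transform, then aggregate


def main():
    # Imported here so the helpers above can be read and reused
    # without PySpark installed (an illustrative choice, not a requirement).
    from pyspark import SparkContext

    # "local[2]" runs Spark in-process with two worker threads.
    sc = SparkContext("local[2]", "pyspark-intro-sketch")
    try:
        print(sum_of_squares(sc, range(1, 6)))
    finally:
        sc.stop()  # release the underlying JVM resources
```

Calling `main()` starts a local Spark context and prints the result. Behind the scenes, the Python `SparkContext` talks to its JVM counterpart through Py4J, so the script itself stays ordinary Python.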

