2. How to Install

2.1. Install with pip

You can install PySparkAudit from [PyPI](https://pypi.org/project/PySparkAudit):

pip install PySparkAudit
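To confirm that the installation is visible to the interpreter you intend to use, a quick import check can help. The following is a minimal sketch using only the standard library; the helper name `is_importable` is hypothetical and not part of PySparkAudit.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located on the current Python path."""
    # find_spec returns None when no top-level module of that name is found
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "PySparkAudit" is the import name used in test.py
    status = "found" if is_importable("PySparkAudit") else "not found"
    print(f"PySparkAudit: {status}")
```

If the module is reported as not found, the pip command above was probably run for a different Python environment than the one executing this check.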

2.2. Install from Repo

2.2.1. Clone the Repository

git clone https://github.com/runawayhorse001/PySparkAudit.git

2.2.2. Install

cd PySparkAudit
pip install -r requirements.txt
python setup.py install

2.3. Uninstall

pip uninstall PySparkAudit
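After uninstalling, you may want to verify that no copy of the package remains in the active environment. The sketch below queries the installed distribution metadata through the standard library (Python 3.8 or later); the helper name `installed_version` is hypothetical, and the distribution name is assumed to match the one used in the pip install command.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment
        return None


if __name__ == "__main__":
    # Assumption: the distribution is registered under the name "PySparkAudit"
    version = installed_version("PySparkAudit")
    print("not installed" if version is None else f"installed: {version}")
```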

2.4. Test

2.4.1. Run test code

cd PySparkAudit/test
python test.py

test.py

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark regression example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()


# from PySparkAudit import dtypes_class, hist_plot, bar_plot, freq_items,feature_len
# from PySparkAudit import dataset_summary, rates, trend_plot

# path = '/home/feng/Desktop'

# import the PySparkAudit auditing function
from PySparkAudit import auditing

# load dataset (relative path, so run this script from PySparkAudit/test)
data = spark.read.csv(path='Heart.csv',
                      sep=',', encoding='UTF-8', comment=None, header=True, inferSchema=True)

# run the full audit with a single function call
print(auditing(data, display=True))
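Because test.py reads `Heart.csv` through a relative path, the script only finds the file when it is launched from the directory that contains it. One way to make this less fragile is to resolve the dataset against a known directory and fail early with a clear message. This is a minimal sketch, not part of the repository: the helper name `resolve_dataset` is hypothetical, and passing the resulting absolute path to `spark.read.csv` assumes a local Spark session.

```python
from pathlib import Path


def resolve_dataset(file_name: str, base_dir) -> str:
    """Return the absolute path of file_name inside base_dir, or raise if it is missing."""
    candidate = Path(base_dir) / file_name
    if not candidate.is_file():
        raise FileNotFoundError(f"{file_name} not found in {base_dir}")
    return str(candidate.resolve())


# Possible usage inside test.py (assumes Heart.csv sits next to the script):
#   csv_path = resolve_dataset("Heart.csv", Path(__file__).resolve().parent)
#   data = spark.read.csv(path=csv_path, sep=',', encoding='UTF-8',
#                         comment=None, header=True, inferSchema=True)
```

Checking the path before starting the read turns a late Spark error into an immediate `FileNotFoundError` that names the missing file.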

2.4.2. Audited Results

[Figure: _images/t_folder.png]