2. How to Install
2.1. Install with pip
You can install PySparkAudit from [PyPI](https://pypi.org/project/PySparkAudit):
pip install PySparkAudit
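To confirm that the install succeeded, one option is to query the installed distribution metadata from Python. The following is a minimal sketch that uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical and not part of PySparkAudit.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is a hypothetical helper for illustration only.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string if the package is installed, otherwise None.
    print(installed_version("PySparkAudit"))
```

The same check can be repeated after uninstalling, in which case the helper should return `None`.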
2.2. Install from Repo
2.2.1. Clone the Repository
git clone https://github.com/runawayhorse001/PySparkAudit.git
2.2.2. Install
cd PySparkAudit
pip install -r requirements.txt
python setup.py install
2.3. Uninstall
pip uninstall PySparkAudit
2.4. Test
2.4.1. Run test code
cd PySparkAudit/test
python test.py
test.py
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark regression example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

# from PySparkAudit import dtypes_class, hist_plot, bar_plot, freq_items, feature_len
# from PySparkAudit import dataset_summary, rates, trend_plot
# path = '/home/feng/Desktop'

# import PySpark Audit function
from PySparkAudit import auditing

# load dataset
data = spark.read.csv(path='Heart.csv',
                      sep=',', encoding='UTF-8', comment=None,
                      header=True, inferSchema=True)

# auditing in one function
print(auditing(data, display=True))
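Because `test.py` reads `Heart.csv` through a relative path, the script appears to depend on being run from the directory that contains the file. A minimal sketch of a pre-flight check that fails early with a readable message is shown here; the helper name `resolve_dataset` and its `base_dir` parameter are hypothetical and not part of PySparkAudit or PySpark.

```python
from pathlib import Path


def resolve_dataset(name, base_dir=None):
    """Return the absolute path of a dataset file, or raise if it is missing.

    Hypothetical helper: `base_dir` defaults to the current working directory.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    candidate = (base / name).resolve()
    if not candidate.is_file():
        raise FileNotFoundError(
            f"Dataset {name!r} not found under {base}; "
            "run the script from the directory that contains it."
        )
    return str(candidate)


# Possible use in test.py (assumes `spark` was created as in the listing above):
# data = spark.read.csv(path=resolve_dataset('Heart.csv'),
#                       sep=',', encoding='UTF-8', comment=None,
#                       header=True, inferSchema=True)
```

Checking the path before calling `spark.read.csv` keeps a missing file from surfacing as a longer Spark stack trace.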