PySparkAudit: PySpark Data Audit

Welcome to the PySparkAudit: PySpark Data Audit Library API documentation! A PDF version can be downloaded from HERE.

You can install PySparkAudit from [PyPI](https://pypi.org/project/PySparkAudit):

pip install PySparkAudit
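After installing, one quick check is to confirm that the current Python interpreter can see the package by asking for its installed version. The sketch below relies only on the standard-library `importlib.metadata` module (Python 3.8+). The helper name `installed_version` is hypothetical and is not part of PySparkAudit.

```python
# Sketch: confirm that `pip install PySparkAudit` succeeded by reading the
# installed distribution's version from standard-library package metadata.
# `installed_version` is a hypothetical helper, not a PySparkAudit function.
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    version = installed_version("PySparkAudit")
    if version is None:
        print("PySparkAudit is not installed; run: pip install PySparkAudit")
    else:
        print(f"PySparkAudit {version} is installed")
```

If this reports that the package is missing, `pip` most likely installed it into a different environment than the interpreter you are running.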

Contents

  • 1. Preface
    • 1.1. About
      • 1.1.1. About this API
      • 1.1.2. About the author
    • 1.2. Acknowledgement
    • 1.3. Feedback and suggestions
  • 2. How to Install
    • 2.1. Install with pip
    • 2.2. Install from Repo
      • 2.2.1. Clone the Repository
      • 2.2.2. Install
    • 2.3. Uninstall
    • 2.4. Test
      • 2.4.1. Run test code
      • 2.4.2. Audited Results
  • 3. PySpark Data Audit Functions
    • 3.1. Basic Functions
      • 3.1.1. mkdir
      • 3.1.2. mkdir_clean
      • 3.1.3. df_merge
      • 3.1.4. data_types
      • 3.1.5. dtypes_class
      • 3.1.6. counts
      • 3.1.7. describe
      • 3.1.8. percentiles
      • 3.1.9. feature_len
      • 3.1.10. freq_items
      • 3.1.11. rates
      • 3.1.12. corr_matrix
    • 3.2. Plot Functions
      • 3.2.1. hist_plot
      • 3.2.2. bar_plot
      • 3.2.3. trend_plot
    • 3.3. Summary Functions
      • 3.3.1. dataset_summary
      • 3.3.2. numeric_summary
      • 3.3.3. category_summary
    • 3.4. Auditing Function
      • 3.4.1. auditing
    • 3.5. Plotting Function
      • 3.5.1. fig_plots
  • 4. Auditing Demos
    • 4.1. Auditing function by function
    • 4.2. Auditing in one function
      • 4.2.1. print in bash
      • 4.2.2. Audited results folder
    • 4.3. Auditing on Big Dataset
      • 4.3.1. print in bash
      • 4.3.2. Audited results folder
  • 5. Main Reference

© Copyright 2019, Wenqiang Feng and Yiming Xu. Last updated on Jul 03, 2019.