2. How to Install

2.1. Install with pip

You can install PyAudit from [PyPI](https://pypi.org/project/PyAudit):

pip install PyAudit
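
To confirm that the installation succeeded, the installed version can be queried from Python. The following is a minimal sketch that uses only the standard library (`importlib.metadata`, available in Python 3.8 and later). The helper name `installed_version` is hypothetical and not part of PyAudit, and the distribution name `PyAudit` is assumed to match the PyPI project name above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "PyAudit" is assumed to be the distribution name registered on PyPI.
    version = installed_version("PyAudit")
    if version is None:
        print("PyAudit is not installed in this environment")
    else:
        print(f"PyAudit {version} is installed")
```

Returning `None` instead of raising keeps the check usable in scripts that need to branch on whether the package is present.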

2.2. Install from Repo

2.2.1. Clone the Repository

git clone https://github.com/runawayhorse001/PyAudit.git

2.2.2. Install

cd PyAudit
pip install -r requirements.txt
python setup.py install

2.2.3. Uninstall

pip uninstall PyAudit

2.2.4. Test

cd PyAudit/test
python test.py

test.py

from PyAudit.basics import missing_rate, zero_rate, dtypes_class
from PyAudit.basics import feature_variance, freq_items_df, feature_len
from PyAudit.basics import corr_matrix, numeric_summary, category_summary
import pandas as pd
import os, sys

# output directory passed to the summary functions below
output = os.path.join(os.path.abspath(sys.path[0]), 'output')
print(output)
d = {'A': [1, 0, None, 3],
     'B': [1, 0, 0, 0],
     'C': ['a', None, 'c', 'd']}

# create DataFrame
df = pd.DataFrame(d)
print(missing_rate(df))
print(zero_rate(df))
print(feature_variance(df))
print(df)
print(feature_len(df))
print(numeric_summary(df, output))
print(category_summary(df, output))
print(corr_matrix(df, output))


d = {
    'num': list('1223334444'),
    'cat': list('wxxyyyzzzz')
}
df = pd.DataFrame(d)
df = df.astype({"num": int, "cat": object})
print(freq_items_df(df, top_n=4))

# load Heart.csv, reading the Sex column as boolean
df = pd.read_csv('Heart.csv', dtype={'Sex': bool})
print(df.head(5))
(num_fields, cat_fields, bool_fields, data_types, type_class) = dtypes_class(df)

print(num_fields)
print(cat_fields)
print(bool_fields)
print(data_types)
print(type_class)
print(missing_rate(df))
print(zero_rate(df))

print(freq_items_df(df, top_n=4))
print(feature_len(df))
print(numeric_summary(df, output))
print(category_summary(df, output))
print(corr_matrix(df, output))

Results:

  feature  missing_rate
0       A          0.25
1       B          0.00
2       C          0.25
  feature  zero_rate
0       A   0.333333
1       B   0.750000
2       C   0.000000
  feature  feature_variance
0       A               1.0
1       B               0.5
2       C               1.0
   Age    Sex     ChestPain  RestBP  Chol  ...  Oldpeak  Slope   Ca        Thal  AHD
0   63   True       typical     145   233  ...      2.3      3  0.0       fixed   No
1   67   True  asymptomatic     160   286  ...      1.5      2  3.0      normal  Yes
2   67   True  asymptomatic     120   229  ...      2.6      2  2.0  reversable  Yes
3   37   True    nonanginal     130   250  ...      3.5      3  0.0      normal   No
4   41  False    nontypical     130   204  ...      1.4      1  0.0      normal   No

[5 rows x 14 columns]
['Age', 'RestBP', 'Chol', 'Fbs', 'RestECG', 'MaxHR', 'ExAng', 'Oldpeak', 'Slope', 'Ca']
['ChestPain', 'Thal', 'AHD']
['Sex']
      feature   dtypes
0         Age    int64
1         Sex     bool
2   ChestPain   object
3      RestBP    int64
4        Chol    int64
5         Fbs    int64
6     RestECG    int64
7       MaxHR    int64
8       ExAng    int64
9     Oldpeak  float64
10      Slope    int64
11         Ca  float64
12       Thal   object
13        AHD   object
      feature   dtypes     class
0         Age    int64   numeric
1         Sex     bool      bool
2   ChestPain   object  category
3      RestBP    int64   numeric
4        Chol    int64   numeric
5         Fbs    int64   numeric
6     RestECG    int64   numeric
7       MaxHR    int64   numeric
8       ExAng    int64   numeric
9     Oldpeak  float64   numeric
10      Slope    int64   numeric
11         Ca  float64   numeric
12       Thal   object  category
13        AHD   object  category
      feature  missing_rate
0         Age      0.000000
1         Sex      0.000000
2   ChestPain      0.000000
3      RestBP      0.000000
4        Chol      0.000000
5         Fbs      0.000000
6     RestECG      0.000000
7       MaxHR      0.000000
8       ExAng      0.000000
9     Oldpeak      0.000000
10      Slope      0.000000
11         Ca      0.013201
12       Thal      0.006601
13        AHD      0.000000

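
For orientation, the first two tables in the results above can be reproduced with plain pandas. This is only a sketch inferred from the printed output, not PyAudit's actual implementation. The helper names `missing_rate_sketch` and `zero_rate_sketch` are hypothetical, and the assumption that the zero rate is taken over non-null entries is a guess based on the value 0.333333 reported for column A (one zero among three non-null values).

```python
import pandas as pd

# Same toy data as in test.py.
df = pd.DataFrame({
    "A": [1, 0, None, 3],
    "B": [1, 0, 0, 0],
    "C": ["a", None, "c", "d"],
})


def missing_rate_sketch(frame: pd.DataFrame) -> pd.DataFrame:
    """Share of null entries in each column (hypothetical helper)."""
    rate = frame.isna().mean()
    return rate.rename_axis("feature").reset_index(name="missing_rate")


def zero_rate_sketch(frame: pd.DataFrame) -> pd.DataFrame:
    """Share of zeros among the non-null entries of each column.

    Assumption: nulls are excluded from the denominator.
    """
    rows = []
    for name in frame.columns:
        values = frame[name].dropna()
        zeros = sum(1 for v in values if v == 0)
        rate = zeros / len(values) if len(values) else 0.0
        rows.append({"feature": name, "zero_rate": rate})
    return pd.DataFrame(rows)


if __name__ == "__main__":
    print(missing_rate_sketch(df))
    print(zero_rate_sketch(df))
```

On the toy data this yields missing rates of 0.25, 0.0 and 0.25 and zero rates of 1/3, 0.75 and 0.0 for columns A, B and C, which matches the values listed above.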