21. Wrap PySpark Package¶
It’s super easy to wrap your own package in Python. I packed some functions which I frequently used in my daily work. You can download and install it from My PySpark Package. The hierarchical structure and the directory structure of this package are as follows.
21.1. Package Wrapper¶
21.1.1. Hierarchical Structure¶
|-- build
| |-- bdist.linux-x86_64
| |-- lib.linux-x86_64-2.7
| |-- PySparkTools
| |-- __init__.py
| |-- Manipulation
| | |-- DataManipulation.py
| | |-- __init__.py
| |── Visualization
| |-- __init__.py
│ |-- PyPlots.py
|-- dist
│ |-- PySParkTools-1.0-py2.7.egg
|-- __init__.py
|-- PySparkTools
| |-- __init__.py
| |-- Manipulation
| | |-- DataManipulation.py
| | |-- __init__.py
| |-- Visualization
| |-- __init__.py
| |-- PyPlots.py
│ |-- PyPlots.pyc
|-- PySParkTools.egg-info
| |-- dependency_links.txt
| |-- PKG-INFO
| |-- requires.txt
| |-- SOURCES.txt
| |-- top_level.txt
|-- README.md
|-- requirements.txt
|-- setup.py
|-- test
|-- spark-warehouse
|-- test1.py
|-- test2.py
From the above hierarchical structure, you will find that you have to have __init__.py
in each directory. I will explain the __init__.py
file with the example below:
21.1.2. Set Up¶
from setuptools import setup, find_packages
try:
with open("README.md") as f:
long_description = f.read()
except IOError:
long_description = ""
try:
with open("requirements.txt") as f:
requirements = [x.strip() for x in f.read().splitlines() if x.strip()]
except IOError:
requirements = []
setup(name='PySParkTools',
install_requires=requirements,
version='1.0',
description='Python Spark Tools',
author='Wenqiang Feng',
author_email='von198@gmail.com',
url='https://github.com/runawayhorse001/PySparkTools',
packages=find_packages(),
long_description=long_description
)
21.1.3. ReadMe¶
# PySparkTools
This is my PySpark Tools. If you want to colne and install it, you can use
- clone
```{bash}
git clone git@github.com:runawayhorse001/PySparkTools.git
```
- install
```{bash}
cd PySparkTools
pip install -r requirements.txt
python setup.py install
```
- test
```{bash}
cd PySparkTools/test
python test1.py
```
21.2. Pacakge Publishing on PyPI¶
21.2.1. Install twine
¶
pip install twine
21.2.2. Build Your Package¶
python setup.py sdist bdist_wheel
Then you will get a new folder dist
:
.
├── PySparkAudit-1.0.0-py2.7.egg
├── PySparkAudit-1.0.0-py2-none-any.whl
└── PySparkAudit-1.0.0.tar.gz
21.2.3. Upload Your Package¶
twine upload dist/*
During the uploading processing, you need to provide your PyPI account username
and password
:
Enter your username: runawayhorse001
Enter your password: ***************
21.2.4. Package at PyPI¶
Here is my PySparkAudit
package at [PyPI](https://pypi.org/project/PySparkAudit). You can install PySparkAudit
using:
pip install PySparkAudit