Refactor: V2.1.0

PSNAppZ 2022-08-28 16:22:05 +05:30
parent 64238d6d3c
commit 8479ba5ac2
38 changed files with 371 additions and 945 deletions

3
.gitignore vendored

@ -21,7 +21,7 @@ __pycache*
__pycache__/
# Misc
torBot
.*.swp
.ropeproject/
.idea/
@ -33,3 +33,4 @@ venv/
.DS_Store
.env
data/*.csv
torbot/modules/nlp/training_data/


@ -2,6 +2,37 @@
--------------------
All notable changes to this project will be documented in this file.
## 2.1.0
### Added
* GoTor API - A Golang implementation of Core TorBot functionality.
* Phone number extractor - Extracts phone numbers from URLs.
* Integrated NLP module with TorBot
* Major code refactoring
### Removed
* No longer using the tree module
* Poetry Implementation removed
## 2.0.0
### Added
* Fix data collection and add progress indicator by @KingAkeem in #192
* convert port to integer by @KingAkeem in #193
* Use hiddenwiki.org as default URL for collecting data by @KingAkeem in #194
* Bump jinja2 from 2.11.2 to 2.11.3 in /src/api by @dependabot in #200
* Simplify LinkNode and add new display by @KingAkeem in #202
* Remove live flag by @KingAkeem in #203
* Poetry Implementation by @NeoLight1010 in #206
* Delete .DS_Store by @stefins in #204
* Fix the basic functionality of tree features by @KingAkeem in #214
* Save results as json by @KingAkeem in #215
* Organize data file location by @KingAkeem in #216
* Add CodeTriage link and image by @KingAkeem in #213
* Add website classification by @KingAkeem in #218
* Use GoTor HTTP service by @KingAkeem in #219
## 1.4.0 | Present
### Added

40
CITATION.cff Normal file

@ -0,0 +1,40 @@
# @InProceedings{10.1007/978-981-15-0146-3_19,
# author="Narayanan, P. S.
# and Ani, R.
# and King, Akeem T. L.",
# editor="Ranganathan, G.
# and Chen, Joy
# and Rocha, {\'A}lvaro",
# title="TorBot: Open Source Intelligence Tool for Dark Web",
# booktitle="Inventive Communication and Computational Technologies",
# year="2020",
# publisher="Springer Singapore",
# address="Singapore",
# pages="187--195",
# abstract="The dark web has turned into a dominant source of illegal activities. With several volunteered networks, it is becoming more difficult to track down these services. Open source intelligence (OSINT) is a technique used to gather intelligence on targets by harvesting publicly available data. Performing OSINT on the Tor network makes it a challenge for both researchers and developers because of the complexity and anonymity of the network. This paper presents a tool which shows OSINT in the dark web. With the use of this tool, researchers and Law Enforcement Agencies can automate their task of crawling and identifying different services in the Tor network. This tool has several features which can help extract different intelligence.",
# isbn="978-981-15-0146-3"
# }
cff-version: 1.2.0
message: "If you use this software, please cite the following paper:"
authors:
- family-names: P. S.
  given-names: Narayanan
  affiliation: Department of Computer Science and Applications, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, India
- family-names: Akeem T. L.
  given-names: King
  affiliation: USPA Technologies
- family-names: R
  given-names: Ani
  affiliation: Department of Computer Science and Applications, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, India
keywords:
- tor
- research
- osint
identifiers:
- type: doi
  value: 10.1007/978-981-15-0146-3_19
license: GNU Public License
repository-code: https://github.com/DedSecInside/TorBot
title: TorBot - Open Source Intelligence Tool for Dark Web
date-released: 2020-01-30


@ -45,6 +45,7 @@ If its a new module, it should be put inside the modules directory.
The branch name should be your new feature name in the format <Feature_featurename_version(optional)>. For example, <i>Feature_FasterCrawl_1.0</i>.
Contributor names will be added to the list below. 😀
<br>
<b> NOTE: PRs should be made only to the `dev` branch of TorBot. </b>
### OS Dependencies
@ -54,20 +55,37 @@ Contributor name will be updated to the below list. 😀
### Python Dependencies
(see pyproject.toml for more detail)
- beautifulsoup4
- pyinstaller
- PySocks
- termcolor
- requests
- requests_mock
- yattag
- numpy
(see requirements.txt for more details)
altgraph==0.17.2
beautifulsoup4==4.11.1
certifi==2022.5.18.1
charset-normalizer==2.0.12
decorator==5.1.1
ete3==3.1.2
idna==3.3
macholib==1.16
numpy==1.22.4
progress==1.6
pyinstaller==5.1
pyinstaller-hooks-contrib==2022.7
PySocks==1.7.1
python-dotenv==0.20.0
requests==2.28.0
requests-mock==1.9.3
six==1.16.0
soupsieve==2.3.2.post1
termcolor==1.1.0
threadsafe==1.0.0
urllib3==1.26.9
validators==0.20.0
yattag==1.14.0
pyqt5==5.15.6 (Install using apt/brew if pip installation fails.)
### Golang Dependencies
- https://github.com/KingAkeem/gotor (This service needs to be run in tandem with TorBot)
## Basic setup
## Installation
### From source
Before you run TorBot, make sure the following steps are done:
* Run tor service
@ -75,22 +93,14 @@ Before you run the torBot make sure the following things are done properly:
* Make sure that your torrc is configured to SOCKS_PORT localhost:9050
* Install [Poetry](https://python-poetry.org/docs/)
* Open a new terminal and run `cd gotor && go run main.go -server`
* Disable Poetry virtualenvs (not required)
`poetry config settings.virtualenvs.create false`
* Install TorBot Python requirements using
`pip install -r requirements.txt`
* Install TorBot Python requirements
`poetry install`
Finally run the following command
On Linux platforms, you can make an executable for TorBot by using the install.sh script.
You will need to give the script the correct permissions using `chmod +x install.sh`
Now you can run `./install.sh` to create the torBot binary.
Run `./torBot` to execute the program.
An alternative way of running torBot is shown below, along with help instructions.
`python3 torBot.py or use the -h/--help argument`
`python3 run.py -h`
<pre>
usage: torBot.py [-h] [-v] [--update] [-q] [-u URL] [-s] [-m] [-e EXTENSION]
[-i]
@ -113,11 +123,7 @@ optional arguments:
Read more about torrc here : [Torrc](https://github.com/DedSecInside/TorBoT/blob/master/Tor.md)
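
The setup steps above expect a Tor SOCKS listener on localhost:9050. A minimal sketch of a pre-flight check is shown here; it is not part of TorBot, the function name `tor_socks_port_open` is hypothetical, and it only confirms that something accepts TCP connections on that port, not that the listener is actually Tor.

```python
import socket


def tor_socks_port_open(host: str = "localhost", port: int = 9050,
                        timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper: the defaults mirror the localhost:9050 value
    from the setup steps, but nothing here is taken from TorBot's code.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False


if __name__ == "__main__":
    if tor_socks_port_open():
        print("localhost:9050 is accepting connections")
    else:
        print("Nothing is listening on localhost:9050; is the tor service running?")
```

Running a check like this before `python3 run.py` can separate "tor is not running" from problems inside TorBot itself.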
#### Using the GUI
#### Using Docker
### Using Docker
- Ensure that you have a tor container running on port 9050.
- Build the image using the following command (in the root directory):
@ -127,6 +133,14 @@ Read more about torrc here : [Torrc](https://github.com/DedSecInside/TorBoT/blob
`docker run --link tor:tor --rm -ti dedsecinside/torbot`
### Using executable (Linux Only)
On Linux platforms, you can make an executable for TorBot by using the install.sh script.
You will need to give the script the correct permissions using `chmod +x install.sh`
Now you can run `./install.sh` to create the torBot binary.
Run `./torBot` to execute the program.
## TO-DO
- [x] Visualization Module
- [x] Implement BFS Search for webcrawler
@ -140,27 +154,8 @@ Read more about torrc here : [Torrc](https://github.com/DedSecInside/TorBoT/blob
- [x] Increase efficiency
### Have ideas?
If you have new ideas that are worth implementing, mention them by starting a new issue with the title [FEATURE_REQUEST].
If the idea is worth implementing, congratz, you are now a contributor.
If you have new ideas that are worth implementing, mention them by creating a new issue with the title [FEATURE_REQUEST].
### Cite this [paper](https://link.springer.com/chapter/10.1007/978-981-15-0146-3_19)
@InProceedings{10.1007/978-981-15-0146-3_19,
author="Narayanan, P. S.
and Ani, R.
and King, Akeem T. L.",
editor="Ranganathan, G.
and Chen, Joy
and Rocha, {\'A}lvaro",
title="TorBot: Open Source Intelligence Tool for Dark Web",
booktitle="Inventive Communication and Computational Technologies",
year="2020",
publisher="Springer Singapore",
address="Singapore",
pages="187--195",
abstract="The dark web has turned into a dominant source of illegal activities. With several volunteered networks, it is becoming more difficult to track down these services. Open source intelligence (OSINT) is a technique used to gather intelligence on targets by harvesting publicly available data. Performing OSINT on the Tor network makes it a challenge for both researchers and developers because of the complexity and anonymity of the network. This paper presents a tool which shows OSINT in the dark web. With the use of this tool, researchers and Law Enforcement Agencies can automate their task of crawling and identifying different services in the Tor network. This tool has several features which can help extract different intelligence.",
isbn="978-981-15-0146-3"
}
### References
@ -208,4 +203,6 @@ GNU Public License
- [X] [SubaruSama](https://github.com/SubaruSama) - New Contributor
- [X] [robly78746](https://github.com/robly78746) - New Contributor
... see all contributors here (https://github.com/DedSecInside/TorBot/graphs/contributors)


@ -1,9 +1,10 @@
FROM python:3
FROM python:3.9
LABEL maintainer="dedsec_inside"
# Install PyQt5
RUN apt-get update \
&& apt-get install -y --no-install-recommends python3-pyqt5 \
&& apt-get install -y virtualenv \
&& apt-get install -y tor \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
@ -11,11 +12,13 @@ WORKDIR /app
COPY . .
RUN pip install --no-cache-dir poetry
RUN poetry config virtualenvs.create false
RUN python -m poetry install --no-dev
# Create virtual env
RUN virtualenv venv --python=python3.9
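# Put the venv first on PATH; `RUN source ...` is a bash builtin and would not persist across layers
ENV PATH="/app/venv/bin:$PATH"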
RUN pip install -r requirements.txt
RUN chmod +x install.sh
RUN bash install.sh
ENTRYPOINT ["./torBot", "--ip", "tor"]
ENTRYPOINT ["./run.py", "--ip", "tor"]

2
gotor

@ -1 +1 @@
Subproject commit 846ca59b808381a118f2a4bf82670e50ea86e335
Subproject commit d1239470b9847f22dcfde5eeb377e7af2f9ad244


@ -7,7 +7,7 @@ mkdir -p tmp_dist
pip install pyinstaller
# Creates executable file and sends dependencies to the recently created directories
pyinstaller --onefile --workpath ./tmp_build --distpath ./tmp_dist --paths=src src/torBot.py
pyinstaller --onefile --workpath ./tmp_build --distpath ./tmp_dist --paths=src torbot/main.py
# Puts the executable in the current directory
mv tmp_dist/torBot .

718
poetry.lock generated

@ -1,718 +0,0 @@
[[package]]
name = "altgraph"
version = "0.17"
description = "Python graph (network) package"
category = "main"
optional = false
python-versions = "*"
[[package]]
name = "atomicwrites"
version = "1.4.0"
description = "Atomic file writes."
category = "dev"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*"
[[package]]
name = "attrs"
version = "20.3.0"
description = "Classes Without Boilerplate"
category = "dev"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*"
[package.extras]
dev = ["coverage[toml] (>=5.0.2)", "hypothesis", "pympler", "pytest (>=4.3.0)", "six", "zope.interface", "furo", "sphinx", "pre-commit"]
docs = ["furo", "sphinx", "zope.interface"]
tests = ["coverage[toml] (>=5.0.2)", "hypothesis", "pympler", "pytest (>=4.3.0)", "six", "zope.interface"]
tests_no_zope = ["coverage[toml] (>=5.0.2)", "hypothesis", "pympler", "pytest (>=4.3.0)", "six"]
[[package]]
name = "beautifulsoup4"
version = "4.9.3"
description = "Screen-scraping library"
category = "main"
optional = false
python-versions = "*"
[package.dependencies]
soupsieve = {version = ">1.2", markers = "python_version >= \"3.0\""}
[package.extras]
html5lib = ["html5lib"]
lxml = ["lxml"]
[[package]]
name = "certifi"
version = "2020.12.5"
description = "Python package for providing Mozilla's CA Bundle."
category = "main"
optional = false
python-versions = "*"
[[package]]
name = "chardet"
version = "4.0.0"
description = "Universal encoding detector for Python 2 and 3"
category = "main"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*"
[[package]]
name = "colorama"
version = "0.4.4"
description = "Cross-platform colored terminal text."
category = "dev"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*"
[[package]]
name = "decorator"
version = "5.0.5"
description = "Decorators for Humans"
category = "main"
optional = false
python-versions = ">=3.5"
[[package]]
name = "dis3"
version = "0.1.3"
description = "Python 2.7 backport of the \"dis\" module from Python 3.5+"
category = "main"
optional = false
python-versions = "*"
[[package]]
name = "ete3"
version = "3.1.2"
description = "A Python Environment for (phylogenetic) Tree Exploration"
category = "main"
optional = false
python-versions = "*"
[[package]]
name = "idna"
version = "2.10"
description = "Internationalized Domain Names in Applications (IDNA)"
category = "main"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*"
[[package]]
name = "importlib-metadata"
version = "3.10.0"
description = "Read metadata from Python packages"
category = "dev"
optional = false
python-versions = ">=3.6"
[package.dependencies]
typing-extensions = {version = ">=3.6.4", markers = "python_version < \"3.8\""}
zipp = ">=0.5"
[package.extras]
docs = ["sphinx", "jaraco.packaging (>=8.2)", "rst.linker (>=1.9)"]
testing = ["pytest (>=4.6)", "pytest-checkdocs (>=2.4)", "pytest-flake8", "pytest-cov", "pytest-enabler (>=1.0.1)", "packaging", "pep517", "pyfakefs", "flufl.flake8", "pytest-black (>=0.3.7)", "pytest-mypy", "importlib-resources (>=1.3)"]
[[package]]
name = "iniconfig"
version = "1.1.1"
description = "iniconfig: brain-dead simple config-ini parsing"
category = "dev"
optional = false
python-versions = "*"
[[package]]
name = "joblib"
version = "1.0.1"
description = "Lightweight pipelining with Python functions"
category = "main"
optional = false
python-versions = ">=3.6"
[[package]]
name = "numpy"
version = "1.20.2"
description = "NumPy is the fundamental package for array computing with Python."
category = "main"
optional = false
python-versions = ">=3.7"
[[package]]
name = "packaging"
version = "20.9"
description = "Core utilities for Python packages"
category = "dev"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*"
[package.dependencies]
pyparsing = ">=2.0.2"
[[package]]
name = "pluggy"
version = "0.13.1"
description = "plugin and hook calling mechanisms for python"
category = "dev"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*"
[package.dependencies]
importlib-metadata = {version = ">=0.12", markers = "python_version < \"3.8\""}
[package.extras]
dev = ["pre-commit", "tox"]
[[package]]
name = "progress"
version = "1.5"
description = "Easy to use progress bars"
category = "main"
optional = false
python-versions = "*"
[[package]]
name = "py"
version = "1.10.0"
description = "library with cross-python path, ini-parsing, io, code, log facilities"
category = "dev"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*"
[[package]]
name = "pyinstaller"
version = "3.6"
description = "PyInstaller bundles a Python application and all its dependencies into a single package."
category = "main"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*"
[package.dependencies]
altgraph = "*"
dis3 = "*"
[[package]]
name = "pyparsing"
version = "2.4.7"
description = "Python parsing module"
category = "dev"
optional = false
python-versions = ">=2.6, !=3.0.*, !=3.1.*, !=3.2.*"
[[package]]
name = "pyqt5"
version = "5.15.4"
description = "Python bindings for the Qt cross platform application toolkit"
category = "main"
optional = false
python-versions = ">=3.6"
[package.dependencies]
PyQt5-Qt5 = ">=5.15"
PyQt5-sip = ">=12.8,<13"
[[package]]
name = "pyqt5-qt5"
version = "5.15.2"
description = "The subset of a Qt installation needed by PyQt5."
category = "main"
optional = false
python-versions = "*"
[[package]]
name = "pyqt5-sip"
version = "12.8.1"
description = "The sip module support for PyQt5"
category = "main"
optional = false
python-versions = ">=3.5"
[[package]]
name = "pysocks"
version = "1.7.1"
description = "A Python SOCKS client module. See https://github.com/Anorov/PySocks for more information."
category = "main"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*"
[[package]]
name = "pytest"
version = "6.2.3"
description = "pytest: simple powerful testing with Python"
category = "dev"
optional = false
python-versions = ">=3.6"
[package.dependencies]
atomicwrites = {version = ">=1.0", markers = "sys_platform == \"win32\""}
attrs = ">=19.2.0"
colorama = {version = "*", markers = "sys_platform == \"win32\""}
importlib-metadata = {version = ">=0.12", markers = "python_version < \"3.8\""}
iniconfig = "*"
packaging = "*"
pluggy = ">=0.12,<1.0.0a1"
py = ">=1.8.2"
toml = "*"
[package.extras]
testing = ["argcomplete", "hypothesis (>=3.56)", "mock", "nose", "requests", "xmlschema"]
[[package]]
name = "python-dotenv"
version = "0.10.5"
description = "Add .env support to your django/flask apps in development and deployments"
category = "main"
optional = false
python-versions = "*"
[package.extras]
cli = ["click (>=5.0)"]
[[package]]
name = "requests"
version = "2.25.1"
description = "Python HTTP for Humans."
category = "main"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*"
[package.dependencies]
certifi = ">=2017.4.17"
chardet = ">=3.0.2,<5"
idna = ">=2.5,<3"
urllib3 = ">=1.21.1,<1.27"
[package.extras]
security = ["pyOpenSSL (>=0.14)", "cryptography (>=1.3.4)"]
socks = ["PySocks (>=1.5.6,!=1.5.7)", "win-inet-pton"]
[[package]]
name = "requests-mock"
version = "1.8.0"
description = "Mock out responses from the requests package"
category = "main"
optional = false
python-versions = "*"
[package.dependencies]
requests = ">=2.3,<3"
six = "*"
[package.extras]
fixture = ["fixtures"]
test = ["fixtures", "mock", "purl", "pytest", "sphinx", "testrepository (>=0.0.18)", "testtools"]
[[package]]
name = "scikit-learn"
version = "0.24.2"
description = "A set of python modules for machine learning and data mining"
category = "main"
optional = false
python-versions = ">=3.6"
[package.dependencies]
joblib = ">=0.11"
numpy = ">=1.13.3"
scipy = ">=0.19.1"
threadpoolctl = ">=2.0.0"
[package.extras]
benchmark = ["matplotlib (>=2.1.1)", "pandas (>=0.25.0)", "memory-profiler (>=0.57.0)"]
docs = ["matplotlib (>=2.1.1)", "scikit-image (>=0.13)", "pandas (>=0.25.0)", "seaborn (>=0.9.0)", "memory-profiler (>=0.57.0)", "sphinx (>=3.2.0)", "sphinx-gallery (>=0.7.0)", "numpydoc (>=1.0.0)", "Pillow (>=7.1.2)", "sphinx-prompt (>=1.3.0)"]
examples = ["matplotlib (>=2.1.1)", "scikit-image (>=0.13)", "pandas (>=0.25.0)", "seaborn (>=0.9.0)"]
tests = ["matplotlib (>=2.1.1)", "scikit-image (>=0.13)", "pandas (>=0.25.0)", "pytest (>=5.0.1)", "pytest-cov (>=2.9.0)", "flake8 (>=3.8.2)", "mypy (>=0.770)", "pyamg (>=4.0.0)"]
[[package]]
name = "scipy"
version = "1.6.1"
description = "SciPy: Scientific Library for Python"
category = "main"
optional = false
python-versions = ">=3.7"
[package.dependencies]
numpy = ">=1.16.5"
[[package]]
name = "six"
version = "1.15.0"
description = "Python 2 and 3 compatibility utilities"
category = "main"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*"
[[package]]
name = "soupsieve"
version = "2.2.1"
description = "A modern CSS selector implementation for Beautiful Soup."
category = "main"
optional = false
python-versions = ">=3.6"
[[package]]
name = "termcolor"
version = "1.1.0"
description = "ANSII Color formatting for output in terminal."
category = "main"
optional = false
python-versions = "*"
[[package]]
name = "threadpoolctl"
version = "2.2.0"
description = "threadpoolctl"
category = "main"
optional = false
python-versions = ">=3.6"
[[package]]
name = "threadsafe"
version = "1.0.0"
description = "Thread-safe data structures"
category = "main"
optional = false
python-versions = "*"
[[package]]
name = "toml"
version = "0.10.2"
description = "Python Library for Tom's Obvious, Minimal Language"
category = "dev"
optional = false
python-versions = ">=2.6, !=3.0.*, !=3.1.*, !=3.2.*"
[[package]]
name = "typing-extensions"
version = "3.7.4.3"
description = "Backported and Experimental Type Hints for Python 3.5+"
category = "dev"
optional = false
python-versions = "*"
[[package]]
name = "urllib3"
version = "1.26.4"
description = "HTTP library with thread-safe connection pooling, file post, and more."
category = "main"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, <4"
[package.extras]
secure = ["pyOpenSSL (>=0.14)", "cryptography (>=1.3.4)", "idna (>=2.0.0)", "certifi", "ipaddress"]
socks = ["PySocks (>=1.5.6,!=1.5.7,<2.0)"]
brotli = ["brotlipy (>=0.6.0)"]
[[package]]
name = "validators"
version = "0.12.6"
description = "Python Data Validation for Humans™."
category = "main"
optional = false
python-versions = "*"
[package.dependencies]
decorator = ">=3.4.0"
six = ">=1.4.0"
[package.extras]
test = ["pytest (>=2.2.3)", "flake8 (>=2.4.0)", "isort (>=4.2.2)"]
[[package]]
name = "yapf"
version = "0.31.0"
description = "A formatter for Python code."
category = "main"
optional = false
python-versions = "*"
[[package]]
name = "yattag"
version = "1.14.0"
description = "Generate HTML or XML in a pythonic way. Pure python alternative to web template engines.Can fill HTML forms with default values and error messages."
category = "main"
optional = false
python-versions = "*"
[[package]]
name = "zipp"
version = "3.4.1"
description = "Backport of pathlib-compatible object wrapper for zip files"
category = "dev"
optional = false
python-versions = ">=3.6"
[package.extras]
docs = ["sphinx", "jaraco.packaging (>=8.2)", "rst.linker (>=1.9)"]
testing = ["pytest (>=4.6)", "pytest-checkdocs (>=1.2.3)", "pytest-flake8", "pytest-cov", "pytest-enabler", "jaraco.itertools", "func-timeout", "pytest-black (>=0.3.7)", "pytest-mypy"]
[metadata]
lock-version = "1.1"
python-versions = "^3.7"
content-hash = "bb2b9eb9d7a7bfdc3555e76f8ad44fa6d4b9d7c5009de6a2a4b72120a2b405bc"
[metadata.files]
altgraph = [
{file = "altgraph-0.17-py2.py3-none-any.whl", hash = "sha256:c623e5f3408ca61d4016f23a681b9adb100802ca3e3da5e718915a9e4052cebe"},
{file = "altgraph-0.17.tar.gz", hash = "sha256:1f05a47122542f97028caf78775a095fbe6a2699b5089de8477eb583167d69aa"},
]
atomicwrites = [
{file = "atomicwrites-1.4.0-py2.py3-none-any.whl", hash = "sha256:6d1784dea7c0c8d4a5172b6c620f40b6e4cbfdf96d783691f2e1302a7b88e197"},
{file = "atomicwrites-1.4.0.tar.gz", hash = "sha256:ae70396ad1a434f9c7046fd2dd196fc04b12f9e91ffb859164193be8b6168a7a"},
]
attrs = [
{file = "attrs-20.3.0-py2.py3-none-any.whl", hash = "sha256:31b2eced602aa8423c2aea9c76a724617ed67cf9513173fd3a4f03e3a929c7e6"},
{file = "attrs-20.3.0.tar.gz", hash = "sha256:832aa3cde19744e49938b91fea06d69ecb9e649c93ba974535d08ad92164f700"},
]
beautifulsoup4 = [
{file = "beautifulsoup4-4.9.3-py2-none-any.whl", hash = "sha256:4c98143716ef1cb40bf7f39a8e3eec8f8b009509e74904ba3a7b315431577e35"},
{file = "beautifulsoup4-4.9.3-py3-none-any.whl", hash = "sha256:fff47e031e34ec82bf17e00da8f592fe7de69aeea38be00523c04623c04fb666"},
{file = "beautifulsoup4-4.9.3.tar.gz", hash = "sha256:84729e322ad1d5b4d25f805bfa05b902dd96450f43842c4e99067d5e1369eb25"},
]
certifi = [
{file = "certifi-2020.12.5-py2.py3-none-any.whl", hash = "sha256:719a74fb9e33b9bd44cc7f3a8d94bc35e4049deebe19ba7d8e108280cfd59830"},
{file = "certifi-2020.12.5.tar.gz", hash = "sha256:1a4995114262bffbc2413b159f2a1a480c969de6e6eb13ee966d470af86af59c"},
]
chardet = [
{file = "chardet-4.0.0-py2.py3-none-any.whl", hash = "sha256:f864054d66fd9118f2e67044ac8981a54775ec5b67aed0441892edb553d21da5"},
{file = "chardet-4.0.0.tar.gz", hash = "sha256:0d6f53a15db4120f2b08c94f11e7d93d2c911ee118b6b30a04ec3ee8310179fa"},
]
colorama = [
{file = "colorama-0.4.4-py2.py3-none-any.whl", hash = "sha256:9f47eda37229f68eee03b24b9748937c7dc3868f906e8ba69fbcbdd3bc5dc3e2"},
{file = "colorama-0.4.4.tar.gz", hash = "sha256:5941b2b48a20143d2267e95b1c2a7603ce057ee39fd88e7329b0c292aa16869b"},
]
decorator = [
{file = "decorator-5.0.5-py3-none-any.whl", hash = "sha256:b7157d62ea3c2c0c57b81a05e4569853e976a3dda5dd7a1cb86be78978c3c5f8"},
{file = "decorator-5.0.5.tar.gz", hash = "sha256:acda948ffcfe4bd0c4a57834b74ad968b91925b8201b740ca9d46fb8c5c618ce"},
]
dis3 = [
{file = "dis3-0.1.3-py2-none-any.whl", hash = "sha256:61f7720dd0d8749d23fda3d7227ce74d73da11c2fade993a67ab2f9852451b14"},
{file = "dis3-0.1.3-py3-none-any.whl", hash = "sha256:30b6412d33d738663e8ded781b138f4b01116437f0872aa56aa3adba6aeff218"},
{file = "dis3-0.1.3.tar.gz", hash = "sha256:9259b881fc1df02ed12ac25f82d4a85b44241854330b1a651e40e0c675cb2d1e"},
]
ete3 = [
{file = "ete3-3.1.2.tar.gz", hash = "sha256:4fc987b8c529889d6608fab1101f1455cb5cbd42722788de6aea9c7d0a8e59e9"},
]
idna = [
{file = "idna-2.10-py2.py3-none-any.whl", hash = "sha256:b97d804b1e9b523befed77c48dacec60e6dcb0b5391d57af6a65a312a90648c0"},
{file = "idna-2.10.tar.gz", hash = "sha256:b307872f855b18632ce0c21c5e45be78c0ea7ae4c15c828c20788b26921eb3f6"},
]
importlib-metadata = [
{file = "importlib_metadata-3.10.0-py3-none-any.whl", hash = "sha256:d2d46ef77ffc85cbf7dac7e81dd663fde71c45326131bea8033b9bad42268ebe"},
{file = "importlib_metadata-3.10.0.tar.gz", hash = "sha256:c9db46394197244adf2f0b08ec5bc3cf16757e9590b02af1fca085c16c0d600a"},
]
iniconfig = [
{file = "iniconfig-1.1.1-py2.py3-none-any.whl", hash = "sha256:011e24c64b7f47f6ebd835bb12a743f2fbe9a26d4cecaa7f53bc4f35ee9da8b3"},
{file = "iniconfig-1.1.1.tar.gz", hash = "sha256:bc3af051d7d14b2ee5ef9969666def0cd1a000e121eaea580d4a313df4b37f32"},
]
joblib = [
{file = "joblib-1.0.1-py3-none-any.whl", hash = "sha256:feeb1ec69c4d45129954f1b7034954241eedfd6ba39b5e9e4b6883be3332d5e5"},
{file = "joblib-1.0.1.tar.gz", hash = "sha256:9c17567692206d2f3fb9ecf5e991084254fe631665c450b443761c4186a613f7"},
]
numpy = [
{file = "numpy-1.20.2-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:e9459f40244bb02b2f14f6af0cd0732791d72232bbb0dc4bab57ef88e75f6935"},
{file = "numpy-1.20.2-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:a8e6859913ec8eeef3dbe9aed3bf475347642d1cdd6217c30f28dee8903528e6"},
{file = "numpy-1.20.2-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:9cab23439eb1ebfed1aaec9cd42b7dc50fc96d5cd3147da348d9161f0501ada5"},
{file = "numpy-1.20.2-cp37-cp37m-manylinux2010_i686.whl", hash = "sha256:9c0fab855ae790ca74b27e55240fe4f2a36a364a3f1ebcfd1fb5ac4088f1cec3"},
{file = "numpy-1.20.2-cp37-cp37m-manylinux2010_x86_64.whl", hash = "sha256:61d5b4cf73622e4d0c6b83408a16631b670fc045afd6540679aa35591a17fe6d"},
{file = "numpy-1.20.2-cp37-cp37m-manylinux2014_aarch64.whl", hash = "sha256:d15007f857d6995db15195217afdbddfcd203dfaa0ba6878a2f580eaf810ecd6"},
{file = "numpy-1.20.2-cp37-cp37m-win32.whl", hash = "sha256:d76061ae5cab49b83a8cf3feacefc2053fac672728802ac137dd8c4123397677"},
{file = "numpy-1.20.2-cp37-cp37m-win_amd64.whl", hash = "sha256:bad70051de2c50b1a6259a6df1daaafe8c480ca98132da98976d8591c412e737"},
{file = "numpy-1.20.2-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:719656636c48be22c23641859ff2419b27b6bdf844b36a2447cb39caceb00935"},
{file = "numpy-1.20.2-cp38-cp38-manylinux1_i686.whl", hash = "sha256:aa046527c04688af680217fffac61eec2350ef3f3d7320c07fd33f5c6e7b4d5f"},
{file = "numpy-1.20.2-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:2428b109306075d89d21135bdd6b785f132a1f5a3260c371cee1fae427e12727"},
{file = "numpy-1.20.2-cp38-cp38-manylinux2010_i686.whl", hash = "sha256:e8e4fbbb7e7634f263c5b0150a629342cc19b47c5eba8d1cd4363ab3455ab576"},
{file = "numpy-1.20.2-cp38-cp38-manylinux2010_x86_64.whl", hash = "sha256:edb1f041a9146dcf02cd7df7187db46ab524b9af2515f392f337c7cbbf5b52cd"},
{file = "numpy-1.20.2-cp38-cp38-manylinux2014_aarch64.whl", hash = "sha256:c73a7975d77f15f7f68dacfb2bca3d3f479f158313642e8ea9058eea06637931"},
{file = "numpy-1.20.2-cp38-cp38-win32.whl", hash = "sha256:6c915ee7dba1071554e70a3664a839fbc033e1d6528199d4621eeaaa5487ccd2"},
{file = "numpy-1.20.2-cp38-cp38-win_amd64.whl", hash = "sha256:471c0571d0895c68da309dacee4e95a0811d0a9f9f532a48dc1bea5f3b7ad2b7"},
{file = "numpy-1.20.2-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:4703b9e937df83f5b6b7447ca5912b5f5f297aba45f91dbbbc63ff9278c7aa98"},
{file = "numpy-1.20.2-cp39-cp39-manylinux2010_i686.whl", hash = "sha256:abc81829c4039e7e4c30f7897938fa5d4916a09c2c7eb9b244b7a35ddc9656f4"},
{file = "numpy-1.20.2-cp39-cp39-manylinux2010_x86_64.whl", hash = "sha256:377751954da04d4a6950191b20539066b4e19e3b559d4695399c5e8e3e683bf6"},
{file = "numpy-1.20.2-cp39-cp39-manylinux2014_aarch64.whl", hash = "sha256:6e51e417d9ae2e7848314994e6fc3832c9d426abce9328cf7571eefceb43e6c9"},
{file = "numpy-1.20.2-cp39-cp39-win32.whl", hash = "sha256:780ae5284cb770ade51d4b4a7dce4faa554eb1d88a56d0e8b9f35fca9b0270ff"},
{file = "numpy-1.20.2-cp39-cp39-win_amd64.whl", hash = "sha256:924dc3f83de20437de95a73516f36e09918e9c9c18d5eac520062c49191025fb"},
{file = "numpy-1.20.2-pp37-pypy37_pp73-manylinux2010_x86_64.whl", hash = "sha256:97ce8b8ace7d3b9288d88177e66ee75480fb79b9cf745e91ecfe65d91a856042"},
{file = "numpy-1.20.2.zip", hash = "sha256:878922bf5ad7550aa044aa9301d417e2d3ae50f0f577de92051d739ac6096cee"},
]
packaging = [
{file = "packaging-20.9-py2.py3-none-any.whl", hash = "sha256:67714da7f7bc052e064859c05c595155bd1ee9f69f76557e21f051443c20947a"},
{file = "packaging-20.9.tar.gz", hash = "sha256:5b327ac1320dc863dca72f4514ecc086f31186744b84a230374cc1fd776feae5"},
]
pluggy = [
{file = "pluggy-0.13.1-py2.py3-none-any.whl", hash = "sha256:966c145cd83c96502c3c3868f50408687b38434af77734af1e9ca461a4081d2d"},
{file = "pluggy-0.13.1.tar.gz", hash = "sha256:15b2acde666561e1298d71b523007ed7364de07029219b604cf808bfa1c765b0"},
]
progress = [
{file = "progress-1.5.tar.gz", hash = "sha256:69ecedd1d1bbe71bf6313d88d1e6c4d2957b7f1d4f71312c211257f7dae64372"},
]
py = [
{file = "py-1.10.0-py2.py3-none-any.whl", hash = "sha256:3b80836aa6d1feeaa108e046da6423ab8f6ceda6468545ae8d02d9d58d18818a"},
{file = "py-1.10.0.tar.gz", hash = "sha256:21b81bda15b66ef5e1a777a21c4dcd9c20ad3efd0b3f817e7a809035269e1bd3"},
]
pyinstaller = [
{file = "PyInstaller-3.6.tar.gz", hash = "sha256:3730fa80d088f8bb7084d32480eb87cbb4ddb64123363763cf8f2a1378c1c4b7"},
]
pyparsing = [
{file = "pyparsing-2.4.7-py2.py3-none-any.whl", hash = "sha256:ef9d7589ef3c200abe66653d3f1ab1033c3c419ae9b9bdb1240a85b024efc88b"},
{file = "pyparsing-2.4.7.tar.gz", hash = "sha256:c203ec8783bf771a155b207279b9bccb8dea02d8f0c9e5f8ead507bc3246ecc1"},
]
pyqt5 = [
{file = "PyQt5-5.15.4-cp36.cp37.cp38.cp39-abi3-macosx_10_13_intel.whl", hash = "sha256:8c0848ba790a895801d5bfd171da31cad3e551dbcc4e59677a3b622de2ceca98"},
{file = "PyQt5-5.15.4-cp36.cp37.cp38.cp39-abi3-manylinux2014_x86_64.whl", hash = "sha256:883a549382fc22d29a0568f3ef20b38c8e7ab633a59498ac4eb63a3bf36d3fd3"},
{file = "PyQt5-5.15.4-cp36.cp37.cp38.cp39-none-win32.whl", hash = "sha256:a88526a271e846e44779bb9ad7a738c6d3c4a9d01e15a128ecfc6dd4696393b7"},
{file = "PyQt5-5.15.4-cp36.cp37.cp38.cp39-none-win_amd64.whl", hash = "sha256:213bebd51821ed89b4d5b35bb10dbe67564228b3568f463a351a08e8b1677025"},
{file = "PyQt5-5.15.4.tar.gz", hash = "sha256:2a69597e0dd11caabe75fae133feca66387819fc9bc050f547e5551bce97e5be"},
]
pyqt5-qt5 = [
{file = "PyQt5_Qt5-5.15.2-py3-none-macosx_10_13_intel.whl", hash = "sha256:76980cd3d7ae87e3c7a33bfebfaee84448fd650bad6840471d6cae199b56e154"},
{file = "PyQt5_Qt5-5.15.2-py3-none-manylinux2014_x86_64.whl", hash = "sha256:1988f364ec8caf87a6ee5d5a3a5210d57539988bf8e84714c7d60972692e2f4a"},
{file = "PyQt5_Qt5-5.15.2-py3-none-win32.whl", hash = "sha256:9cc7a768b1921f4b982ebc00a318ccb38578e44e45316c7a4a850e953e1dd327"},
{file = "PyQt5_Qt5-5.15.2-py3-none-win_amd64.whl", hash = "sha256:750b78e4dba6bdf1607febedc08738e318ea09e9b10aea9ff0d73073f11f6962"},
]
pyqt5-sip = [
{file = "PyQt5_sip-12.8.1-cp35-cp35m-macosx_10_6_intel.whl", hash = "sha256:bb5a87b66fc1445915104ee97f7a20a69decb42f52803e3b0795fa17ff88226c"},
{file = "PyQt5_sip-12.8.1-cp35-cp35m-manylinux1_x86_64.whl", hash = "sha256:a29e2ac399429d3b7738f73e9081e50783e61ac5d29344e0802d0dcd6056c5a2"},
{file = "PyQt5_sip-12.8.1-cp35-cp35m-win32.whl", hash = "sha256:0304ca9114b9817a270f67f421355075b78ff9fc25ac58ffd72c2601109d2194"},
{file = "PyQt5_sip-12.8.1-cp35-cp35m-win_amd64.whl", hash = "sha256:84ba7746762bd223bed22428e8561aa267a229c28344c2d28c5d5d3f8970cffb"},
{file = "PyQt5_sip-12.8.1-cp36-cp36m-macosx_10_6_intel.whl", hash = "sha256:7b81382ce188d63890a0e35abe0f9bb946cabc873a31873b73583b0fc84ac115"},
{file = "PyQt5_sip-12.8.1-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:b6d42250baec52a5f77de64e2951d001c5501c3a2df2179f625b241cbaec3369"},
{file = "PyQt5_sip-12.8.1-cp36-cp36m-win32.whl", hash = "sha256:6c1ebee60f1d2b3c70aff866b7933d8d8d7646011f7c32f9321ee88c290aa4f9"},
{file = "PyQt5_sip-12.8.1-cp36-cp36m-win_amd64.whl", hash = "sha256:34dcd29be47553d5f016ff86e89e24cbc5eebae92eb2f96fb32d2d7ba028c43c"},
{file = "PyQt5_sip-12.8.1-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:ed897c58acf4a3cdca61469daa31fe6e44c33c6c06a37c3f21fab31780b3b86a"},
{file = "PyQt5_sip-12.8.1-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:a1b8ef013086e224b8e86c93f880f776d01b59195bdfa2a8e0b23f0480678fec"},
{file = "PyQt5_sip-12.8.1-cp37-cp37m-win32.whl", hash = "sha256:0cd969be528c27bbd4755bd323dff4a79a8fdda28215364e6ce3e069cb56c2a9"},
{file = "PyQt5_sip-12.8.1-cp37-cp37m-win_amd64.whl", hash = "sha256:c9800729badcb247765e4ffe2241549d02da1fa435b9db224845bc37c3e99cb0"},
{file = "PyQt5_sip-12.8.1-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:9312ec47cac4e33c11503bc1cbeeb0bdae619620472f38e2078c5a51020a930f"},
{file = "PyQt5_sip-12.8.1-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:2f35e82fd7ec1e1f6716e9154721c7594956a4f5bd4f826d8c6a6453833cc2f0"},
{file = "PyQt5_sip-12.8.1-cp38-cp38-win32.whl", hash = "sha256:da9c9f1e65b9d09e73bd75befc82961b6b61b5a3b9d0a7c832168e1415f163c6"},
{file = "PyQt5_sip-12.8.1-cp38-cp38-win_amd64.whl", hash = "sha256:832fd60a264de4134c2824d393320838f3ab648180c9c357ec58a74524d24507"},
{file = "PyQt5_sip-12.8.1-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:c317ab1263e6417c498b81f5c970a9b1af7acefab1f80b4cc0f2f8e661f29fc5"},
{file = "PyQt5_sip-12.8.1-cp39-cp39-manylinux1_x86_64.whl", hash = "sha256:c9d6d448c29dc6606bb7974696608f81f4316c8234f7c7216396ed110075e777"},
{file = "PyQt5_sip-12.8.1-cp39-cp39-win32.whl", hash = "sha256:5a011aeff89660622a6d5c3388d55a9d76932f3b82c95e82fc31abd8b1d2990d"},
{file = "PyQt5_sip-12.8.1-cp39-cp39-win_amd64.whl", hash = "sha256:f168f0a7f32b81bfeffdf003c36f25d81c97dee5eb67072a5183e761fe250f13"},
{file = "PyQt5_sip-12.8.1.tar.gz", hash = "sha256:30e944db9abee9cc757aea16906d4198129558533eb7fadbe48c5da2bd18e0bd"},
]
pysocks = [
{file = "PySocks-1.7.1-py27-none-any.whl", hash = "sha256:08e69f092cc6dbe92a0fdd16eeb9b9ffbc13cadfe5ca4c7bd92ffb078b293299"},
{file = "PySocks-1.7.1-py3-none-any.whl", hash = "sha256:2725bd0a9925919b9b51739eea5f9e2bae91e83288108a9ad338b2e3a4435ee5"},
{file = "PySocks-1.7.1.tar.gz", hash = "sha256:3f8804571ebe159c380ac6de37643bb4685970655d3bba243530d6558b799aa0"},
]
pytest = [
{file = "pytest-6.2.3-py3-none-any.whl", hash = "sha256:6ad9c7bdf517a808242b998ac20063c41532a570d088d77eec1ee12b0b5574bc"},
{file = "pytest-6.2.3.tar.gz", hash = "sha256:671238a46e4df0f3498d1c3270e5deb9b32d25134c99b7d75370a68cfbe9b634"},
]
python-dotenv = [
{file = "python-dotenv-0.10.5.tar.gz", hash = "sha256:f254bfd0c970d64ccbb6c9ebef3667ab301a71473569c991253a481f1c98dddc"},
{file = "python_dotenv-0.10.5-py2.py3-none-any.whl", hash = "sha256:440c7c23d53b7d352f9c94d6f70860242c2f071cf5c029dd661ccb22d64ae42b"},
]
requests = [
{file = "requests-2.25.1-py2.py3-none-any.whl", hash = "sha256:c210084e36a42ae6b9219e00e48287def368a26d03a048ddad7bfee44f75871e"},
{file = "requests-2.25.1.tar.gz", hash = "sha256:27973dd4a904a4f13b263a19c866c13b92a39ed1c964655f025f3f8d3d75b804"},
]
requests-mock = [
{file = "requests-mock-1.8.0.tar.gz", hash = "sha256:e68f46844e4cee9d447150343c9ae875f99fa8037c6dcf5f15bf1fe9ab43d226"},
{file = "requests_mock-1.8.0-py2.py3-none-any.whl", hash = "sha256:11215c6f4df72702aa357f205cf1e537cffd7392b3e787b58239bde5fb3db53b"},
]
scikit-learn = [
{file = "scikit-learn-0.24.2.tar.gz", hash = "sha256:d14701a12417930392cd3898e9646cf5670c190b933625ebe7511b1f7d7b8736"},
{file = "scikit_learn-0.24.2-cp36-cp36m-macosx_10_13_x86_64.whl", hash = "sha256:d5bf9c863ba4717b3917b5227463ee06860fc43931dc9026747de416c0a10fee"},
{file = "scikit_learn-0.24.2-cp36-cp36m-manylinux1_i686.whl", hash = "sha256:5beaeb091071625e83f5905192d8aecde65ba2f26f8b6719845bbf586f7a04a1"},
{file = "scikit_learn-0.24.2-cp36-cp36m-manylinux1_x86_64.whl", hash = "sha256:06ffdcaaf81e2a3b1b50c3ac6842cfb13df2d8b737d61f64643ed61da7389cde"},
{file = "scikit_learn-0.24.2-cp36-cp36m-manylinux2010_i686.whl", hash = "sha256:fec42690a2eb646b384eafb021c425fab48991587edb412d4db77acc358b27ce"},
{file = "scikit_learn-0.24.2-cp36-cp36m-manylinux2010_x86_64.whl", hash = "sha256:5ff3e4e4cf7592d36541edec434e09fb8ab9ba6b47608c4ffe30c9038d301897"},
{file = "scikit_learn-0.24.2-cp36-cp36m-manylinux2014_aarch64.whl", hash = "sha256:3cbd734e1aefc7c5080e6b6973fe062f97c26a1cdf1a991037ca196ce1c8f427"},
{file = "scikit_learn-0.24.2-cp36-cp36m-win32.whl", hash = "sha256:f74429a07fedb36a03c159332b914e6de757176064f9fed94b5f79ebac07d913"},
{file = "scikit_learn-0.24.2-cp36-cp36m-win_amd64.whl", hash = "sha256:dd968a174aa82f3341a615a033fa6a8169e9320cbb46130686562db132d7f1f0"},
{file = "scikit_learn-0.24.2-cp37-cp37m-macosx_10_13_x86_64.whl", hash = "sha256:49ec0b1361da328da9bb7f1a162836028e72556356adeb53342f8fae6b450d47"},
{file = "scikit_learn-0.24.2-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:f18c3ed484eeeaa43a0d45dc2efb4d00fc6542ccdcfa2c45d7b635096a2ae534"},
{file = "scikit_learn-0.24.2-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:cdf24c1b9bbeb4936456b42ac5bd32c60bb194a344951acb6bfb0cddee5439a4"},
{file = "scikit_learn-0.24.2-cp37-cp37m-manylinux2010_i686.whl", hash = "sha256:d177fe1ff47cc235942d628d41ee5b1c6930d8f009f1a451c39b5411e8d0d4cf"},
{file = "scikit_learn-0.24.2-cp37-cp37m-manylinux2010_x86_64.whl", hash = "sha256:f3ec00f023d84526381ad0c0f2cff982852d035c921bbf8ceb994f4886c00c64"},
{file = "scikit_learn-0.24.2-cp37-cp37m-manylinux2014_aarch64.whl", hash = "sha256:ae19ac105cf7ce8c205a46166992fdec88081d6e783ab6e38ecfbe45729f3c39"},
{file = "scikit_learn-0.24.2-cp37-cp37m-win32.whl", hash = "sha256:f0ed4483c258fb23150e31b91ea7d25ff8495dba108aea0b0d4206a777705350"},
{file = "scikit_learn-0.24.2-cp37-cp37m-win_amd64.whl", hash = "sha256:39b7e3b71bcb1fe46397185d6c1a5db1c441e71c23c91a31e7ad8cc3f7305f9a"},
{file = "scikit_learn-0.24.2-cp38-cp38-macosx_10_13_x86_64.whl", hash = "sha256:90a297330f608adeb4d2e9786c6fda395d3150739deb3d42a86d9a4c2d15bc1d"},
{file = "scikit_learn-0.24.2-cp38-cp38-manylinux1_i686.whl", hash = "sha256:f1d2108e770907540b5248977e4cff9ffaf0f73d0d13445ee938df06ca7579c6"},
{file = "scikit_learn-0.24.2-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:1eec963fe9ffc827442c2e9333227c4d49749a44e592f305398c1db5c1563393"},
{file = "scikit_learn-0.24.2-cp38-cp38-manylinux2010_i686.whl", hash = "sha256:2db429090b98045d71218a9ba913cc9b3fe78e0ba0b6b647d8748bc6d5a44080"},
{file = "scikit_learn-0.24.2-cp38-cp38-manylinux2010_x86_64.whl", hash = "sha256:62214d2954377fcf3f31ec867dd4e436df80121e7a32947a0b3244f58f45e455"},
{file = "scikit_learn-0.24.2-cp38-cp38-manylinux2014_aarch64.whl", hash = "sha256:8fac72b9688176922f9f54fda1ba5f7ffd28cbeb9aad282760186e8ceba9139a"},
{file = "scikit_learn-0.24.2-cp38-cp38-win32.whl", hash = "sha256:ae426e3a52842c6b6d77d00f906b6031c8c2cfdfabd6af7511bb4bc9a68d720e"},
{file = "scikit_learn-0.24.2-cp38-cp38-win_amd64.whl", hash = "sha256:038f4e9d6ef10e1f3fe82addc3a14735c299866eb10f2c77c090410904828312"},
{file = "scikit_learn-0.24.2-cp39-cp39-macosx_10_13_x86_64.whl", hash = "sha256:48f273836e19901ba2beecd919f7b352f09310ce67c762f6e53bc6b81cacf1f0"},
{file = "scikit_learn-0.24.2-cp39-cp39-manylinux1_i686.whl", hash = "sha256:a2a47449093dcf70babc930beba2ca0423cb7df2fa5fd76be5260703d67fa574"},
{file = "scikit_learn-0.24.2-cp39-cp39-manylinux1_x86_64.whl", hash = "sha256:0e71ce9c7cbc20f6f8b860107ce15114da26e8675238b4b82b7e7cd37ca0c087"},
{file = "scikit_learn-0.24.2-cp39-cp39-manylinux2010_i686.whl", hash = "sha256:2754c85b2287333f9719db7f23fb7e357f436deed512db3417a02bf6f2830aa5"},
{file = "scikit_learn-0.24.2-cp39-cp39-manylinux2010_x86_64.whl", hash = "sha256:7be1b88c23cfac46e06404582215a917017cd2edaa2e4d40abe6aaff5458f24b"},
{file = "scikit_learn-0.24.2-cp39-cp39-manylinux2014_aarch64.whl", hash = "sha256:4e6198675a6f9d333774671bd536668680eea78e2e81c0b19e57224f58d17f37"},
{file = "scikit_learn-0.24.2-cp39-cp39-win32.whl", hash = "sha256:cbdb0b3db99dd1d5f69d31b4234367d55475add31df4d84a3bd690ef017b55e2"},
{file = "scikit_learn-0.24.2-cp39-cp39-win_amd64.whl", hash = "sha256:40556bea1ef26ef54bc678d00cf138a63069144a0b5f3a436eecd8f3468b903e"},
]
scipy = [
{file = "scipy-1.6.1-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:a15a1f3fc0abff33e792d6049161b7795909b40b97c6cc2934ed54384017ab76"},
{file = "scipy-1.6.1-cp37-cp37m-manylinux1_i686.whl", hash = "sha256:e79570979ccdc3d165456dd62041d9556fb9733b86b4b6d818af7a0afc15f092"},
{file = "scipy-1.6.1-cp37-cp37m-manylinux1_x86_64.whl", hash = "sha256:a423533c55fec61456dedee7b6ee7dce0bb6bfa395424ea374d25afa262be261"},
{file = "scipy-1.6.1-cp37-cp37m-manylinux2014_aarch64.whl", hash = "sha256:33d6b7df40d197bdd3049d64e8e680227151673465e5d85723b3b8f6b15a6ced"},
{file = "scipy-1.6.1-cp37-cp37m-win32.whl", hash = "sha256:6725e3fbb47da428794f243864f2297462e9ee448297c93ed1dcbc44335feb78"},
{file = "scipy-1.6.1-cp37-cp37m-win_amd64.whl", hash = "sha256:5fa9c6530b1661f1370bcd332a1e62ca7881785cc0f80c0d559b636567fab63c"},
{file = "scipy-1.6.1-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:bd50daf727f7c195e26f27467c85ce653d41df4358a25b32434a50d8870fc519"},
{file = "scipy-1.6.1-cp38-cp38-manylinux1_i686.whl", hash = "sha256:f46dd15335e8a320b0fb4685f58b7471702234cba8bb3442b69a3e1dc329c345"},
{file = "scipy-1.6.1-cp38-cp38-manylinux1_x86_64.whl", hash = "sha256:0e5b0ccf63155d90da576edd2768b66fb276446c371b73841e3503be1d63fb5d"},
{file = "scipy-1.6.1-cp38-cp38-manylinux2014_aarch64.whl", hash = "sha256:2481efbb3740977e3c831edfd0bd9867be26387cacf24eb5e366a6a374d3d00d"},
{file = "scipy-1.6.1-cp38-cp38-win32.whl", hash = "sha256:68cb4c424112cd4be886b4d979c5497fba190714085f46b8ae67a5e4416c32b4"},
{file = "scipy-1.6.1-cp38-cp38-win_amd64.whl", hash = "sha256:5f331eeed0297232d2e6eea51b54e8278ed8bb10b099f69c44e2558c090d06bf"},
{file = "scipy-1.6.1-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:0c8a51d33556bf70367452d4d601d1742c0e806cd0194785914daf19775f0e67"},
{file = "scipy-1.6.1-cp39-cp39-manylinux1_i686.whl", hash = "sha256:83bf7c16245c15bc58ee76c5418e46ea1811edcc2e2b03041b804e46084ab627"},
{file = "scipy-1.6.1-cp39-cp39-manylinux1_x86_64.whl", hash = "sha256:794e768cc5f779736593046c9714e0f3a5940bc6dcc1dba885ad64cbfb28e9f0"},
{file = "scipy-1.6.1-cp39-cp39-manylinux2014_aarch64.whl", hash = "sha256:5da5471aed911fe7e52b86bf9ea32fb55ae93e2f0fac66c32e58897cfb02fa07"},
{file = "scipy-1.6.1-cp39-cp39-win32.whl", hash = "sha256:8e403a337749ed40af60e537cc4d4c03febddcc56cd26e774c9b1b600a70d3e4"},
{file = "scipy-1.6.1-cp39-cp39-win_amd64.whl", hash = "sha256:a5193a098ae9f29af283dcf0041f762601faf2e595c0db1da929875b7570353f"},
{file = "scipy-1.6.1.tar.gz", hash = "sha256:c4fceb864890b6168e79b0e714c585dbe2fd4222768ee90bc1aa0f8218691b11"},
]
six = [
{file = "six-1.15.0-py2.py3-none-any.whl", hash = "sha256:8b74bedcbbbaca38ff6d7491d76f2b06b3592611af620f8426e82dddb04a5ced"},
{file = "six-1.15.0.tar.gz", hash = "sha256:30639c035cdb23534cd4aa2dd52c3bf48f06e5f4a941509c8bafd8ce11080259"},
]
soupsieve = [
{file = "soupsieve-2.2.1-py3-none-any.whl", hash = "sha256:c2c1c2d44f158cdbddab7824a9af8c4f83c76b1e23e049479aa432feb6c4c23b"},
{file = "soupsieve-2.2.1.tar.gz", hash = "sha256:052774848f448cf19c7e959adf5566904d525f33a3f8b6ba6f6f8f26ec7de0cc"},
]
termcolor = [
{file = "termcolor-1.1.0.tar.gz", hash = "sha256:1d6d69ce66211143803fbc56652b41d73b4a400a2891d7bf7a1cdf4c02de613b"},
]
threadpoolctl = [
{file = "threadpoolctl-2.2.0-py3-none-any.whl", hash = "sha256:e5a995e3ffae202758fa8a90082e35783b9370699627ae2733cd1c3a73553616"},
{file = "threadpoolctl-2.2.0.tar.gz", hash = "sha256:86d4b6801456d780e94681d155779058759eaef3c3564758b17b6c99db5f81cb"},
]
threadsafe = [
{file = "threadsafe-1.0.0-py3-none-any.whl", hash = "sha256:acbd59278ca8221dc3a8051443fe24c647ee9ac81808058e280ef6f75dd4387b"},
{file = "threadsafe-1.0.0.tar.gz", hash = "sha256:7c61f9fdd0b3cd6c07b427de355dafcd337578d30871634cb1e8985ee4955edc"},
]
toml = [
{file = "toml-0.10.2-py2.py3-none-any.whl", hash = "sha256:806143ae5bfb6a3c6e736a764057db0e6a0e05e338b5630894a5f779cabb4f9b"},
{file = "toml-0.10.2.tar.gz", hash = "sha256:b3bda1d108d5dd99f4a20d24d9c348e91c4db7ab1b749200bded2f839ccbe68f"},
]
typing-extensions = [
{file = "typing_extensions-3.7.4.3-py2-none-any.whl", hash = "sha256:dafc7639cde7f1b6e1acc0f457842a83e722ccca8eef5270af2d74792619a89f"},
{file = "typing_extensions-3.7.4.3-py3-none-any.whl", hash = "sha256:7cb407020f00f7bfc3cb3e7881628838e69d8f3fcab2f64742a5e76b2f841918"},
{file = "typing_extensions-3.7.4.3.tar.gz", hash = "sha256:99d4073b617d30288f569d3f13d2bd7548c3a7e4c8de87db09a9d29bb3a4a60c"},
]
urllib3 = [
{file = "urllib3-1.26.4-py2.py3-none-any.whl", hash = "sha256:2f4da4594db7e1e110a944bb1b551fdf4e6c136ad42e4234131391e21eb5b0df"},
{file = "urllib3-1.26.4.tar.gz", hash = "sha256:e7b021f7241115872f92f43c6508082facffbd1c048e3c6e2bb9c2a157e28937"},
]
validators = [
{file = "validators-0.12.6.tar.gz", hash = "sha256:f6aca085caf9e13d5a0fd8ddb3afbea2541c0ca9477b1fb8098c797dd812ff64"},
]
yapf = [
{file = "yapf-0.31.0-py2.py3-none-any.whl", hash = "sha256:e3a234ba8455fe201eaa649cdac872d590089a18b661e39bbac7020978dd9c2e"},
{file = "yapf-0.31.0.tar.gz", hash = "sha256:408fb9a2b254c302f49db83c59f9aa0b4b0fd0ec25be3a5c51181327922ff63d"},
]
yattag = [
{file = "yattag-1.14.0.tar.gz", hash = "sha256:5731a31cb7452c0c6930dd1a284e0170b39eee959851a2aceb8d6af4134a5fa8"},
]
zipp = [
{file = "zipp-3.4.1-py3-none-any.whl", hash = "sha256:51cb66cc54621609dd593d1787f286ee42a5c0adbb4b29abea5a63edc3e03098"},
{file = "zipp-3.4.1.tar.gz", hash = "sha256:3607921face881ba3e026887d8150cca609d517579abe052ac81fc5aeffdbd76"},
]

View File

@ -1,32 +0,0 @@
[tool.poetry]
name = "TorBot"
version = "1.3.0"
description = "Dark Web OSINT Tool."
authors = ["DedSecInside"]
license = "GNU"
[tool.poetry.dependencies]
python = "^3.7"
beautifulsoup4 = "^4.9.1"
PySocks = "^1.7.1"
termcolor = "^1.1.0"
requests = "^2.20.0"
requests_mock = "^1.4.0"
yattag = "^1.10.0"
pyinstaller = "^3.6.0"
ete3 = "^3.1.1"
PyQt5 = "^5.11.3"
validators = "^0.12.6"
python-dotenv = "^0.10.2"
threadsafe = "^1.0.0"
progress = "^1.5.0"
numpy = "^1.20.2"
scikit-learn = "^0.24.2"
yapf = "^0.31.0"
[tool.poetry.dev-dependencies]
pytest = "^6.2.3"
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

30
requirements.txt Normal file
View File

@ -0,0 +1,30 @@
altgraph==0.17.2
beautifulsoup4==4.11.1
certifi==2022.5.18.1
charset-normalizer==2.0.12
decorator==5.1.1
ete3==3.1.2
idna==3.3
igraph==0.9.11
joblib==1.1.0
macholib==1.16
numpy==1.22.4
progress==1.6
pyinstaller==5.1
pyinstaller-hooks-contrib==2022.7
PySocks==1.7.1
python-dotenv==0.20.0
requests==2.28.0
requests-mock==1.9.3
scikit-learn==1.1.2
scipy==1.9.1
six==1.16.0
sklearn==0.0
soupsieve==2.3.2.post1
termcolor==1.1.0
texttable==1.6.4
threadpoolctl==3.1.0
threadsafe==1.0.0
urllib3==1.26.9
validators==0.20.0
yattag==1.14.0

10
run.py Normal file
View File

@ -0,0 +1,10 @@
#!/usr/bin/env python3
from torbot import main
if __name__ == '__main__':
try:
args = main.get_args()
torbot = main.TorBot(args)
torbot.perform_action()
except KeyboardInterrupt:
print("Interrupt received! Exiting cleanly...")

View File

@ -1,44 +0,0 @@
import argparse
import requests
import numpy as np
from bs4 import BeautifulSoup
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.linear_model import SGDClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.datasets import load_files
# get html for site
parser = argparse.ArgumentParser(description='Classify Website')
parser.add_argument('-website', type=str, help='Website to categorize')
parser.add_argument('-accuracy', type=bool, help='Print accuracy')
args = parser.parse_args()
soup = BeautifulSoup(requests.get(args.website).text, features='html.parser')
html = soup.get_text()
# create classifier
clf = Pipeline([
('vect', CountVectorizer()),
('tfidf', TfidfTransformer()),
('clf', SGDClassifier())
])
dataset = load_files('training_data')
x_train, x_test, y_train, y_test = train_test_split(
dataset.data,
dataset.target
)
clf.fit(x_train, y_train)
website = 'Unknown'
if soup.title:
website = soup.title.text
# returns an array of target_name values
predicted = clf.predict([html])
print(f'The category of {website} is {dataset.target_names[predicted[0]]}')
if args.accuracy:
accuracy = np.mean(predicted == y_test)
print(f'Accuracy: {accuracy}%')

20
torbot/__init__.py Normal file
View File

@ -0,0 +1,20 @@
"""
Torbot Config.
"""
import os
from dotenv import load_dotenv
from .modules import link_io
# from .modules.linktree import LinkTree
from .modules.color import color
from .modules.updater import updateTor
from .modules.savefile import saveJson
from .modules.info import execute_all
from .modules.collect_data import collect_data
load_dotenv() # Loads environment variables from .env file
__version__ = '2.1.0'
HOST = os.getenv('HOST')
PORT = os.getenv('PORT')

10
torbot/__main__.py Normal file
View File

@ -0,0 +1,10 @@
#!/usr/bin/env python3
from torbot import main
if __name__ == '__main__':
try:
args = main.get_args()
torbot = main.TorBot(args)
torbot.perform_action()
except KeyboardInterrupt:
print("Interrupt received! Exiting cleanly...")

View File

@ -1,20 +1,21 @@
"""
MAIN MODULE
Core
"""
import argparse
import sys
from requests.exceptions import HTTPError
import config
from modules import link_io
from modules.linktree import LinkTree
from modules.color import color
from modules.updater import updateTor
from modules.savefile import saveJson
from modules.info import execute_all
from modules.collect_data import collect_data
from .modules import link_io
# from .modules.linktree import LinkTree
from .modules.color import color
from .modules.updater import updateTor
from .modules.savefile import saveJson
from .modules.info import execute_all
from .modules.collect_data import collect_data
from .modules.nlp import main
from . import config
# TorBot CLI class
class TorBot:
@ -67,22 +68,22 @@ class TorBot:
node_json = link_io.print_json(args.url, args.depth)
saveJson("Links", node_json)
def handle_tree_args(self, args):
"""
Outputs tree visual for data
"""
tree = LinkTree(args.url, args.depth)
# -v/--visualize
if args.visualize:
tree.show()
# def handle_tree_args(self, args):
# """
# Outputs tree visual for data
# """
# tree = LinkTree(args.url, args.depth)
# # -v/--visualize
# if args.visualize:
# tree.show()
# -d/--download
if args.download:
file_name = str(input("File Name (.pdf/.png/.svg): "))
tree.save(file_name)
# # -d/--download
# if args.download:
# file_name = str(input("File Name (.pdf/.png/.svg): "))
# tree.save(file_name)
def perform_action(self):
args = get_args()
args = self.args
if args.gather:
collect_data(args.url)
return
@ -97,15 +98,17 @@ class TorBot:
sys.exit()
if not args.quiet:
self.get_header()
# If url flag is set then check for accompanying flag set. Only one
# additional flag can be set with -u/--url flag
if not args.url:
print("usage: See torBot.py -h for possible arguments.")
print("usage: See torBot.py -h for possible arguments.")
link_io.print_tor_ip_address()
if args.classify:
result = main.classify(args.url)
print ("Website Classification: " + result[0], "| Accuracy: " + str(result[1]))
if args.visualize or args.download:
self.handle_tree_args(args)
# self.handle_tree_args(args)
raise NotImplementedError("Tree visualization and download is not available yet.")
elif args.save or args.mail or args.phone:
self.handle_json_args(args)
# -i/--info
@ -113,7 +116,7 @@ class TorBot:
execute_all(args.url)
else:
if args.url:
link_io.print_tree(args.url, args.depth)
link_io.print_tree(args.url, args.depth, args.classifyAll)
print("\n\n")
@ -141,6 +144,8 @@ def get_args():
default=[],
help=' '.join(("Specify additional website", "extensions to the list(.com , .org, .etc)"))
)
parser.add_argument("-c", "--classify", action="store_true", help="Classify the webpage using NLP module")
parser.add_argument("-cAll", "--classifyAll", action="store_true", help="Classify all the obtained webpages using NLP module")
parser.add_argument(
"-i", "--info", action="store_true", help=' '.join(("Info displays basic info of the scanned site"))
)
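
Reviewer note: the two flags added in this hunk are plain `store_true` switches. A minimal sketch of how they parse, reproducing only the two `add_argument` calls from the diff plus the `-u/--url` option mentioned in the comments above. The `type=str` on `--url`, the `prog` name, and the helper name `build_parser` are assumptions, since the full parser is not shown in this diff.

```python
import argparse


def build_parser():
    """Hypothetical reduced parser: only the options discussed in this hunk."""
    parser = argparse.ArgumentParser(prog='torbot')
    # Assumed shape of the existing -u/--url option (not shown in the diff).
    parser.add_argument('-u', '--url', type=str, help='URL to crawl')
    # The two switches added by this commit; both default to False.
    parser.add_argument('-c', '--classify', action='store_true',
                        help='Classify the webpage using NLP module')
    parser.add_argument('-cAll', '--classifyAll', action='store_true',
                        help='Classify all the obtained webpages using NLP module')
    return parser


if __name__ == '__main__':
    args = build_parser().parse_args(['--url', 'http://example.onion', '--classifyAll'])
    # argparse derives the attribute names from the first long option string.
    print(args.classify, args.classifyAll)
```

The attribute `args.classifyAll` is what `perform_action` passes on to `link_io.print_tree` in the hunk above.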

View File

@ -66,3 +66,17 @@ class GoTor:
url = f'http://{address}:{port}/phone?link={link}'
resp = requests.get(url)
return resp.json()
@staticmethod
def get_web_content(link, address='localhost', port='8081'):
"""
Returns the HTML content of the page.
Args:
link (str): the page to pull the content from.
address (str): network address
port (str): network port
"""
url = f'http://{address}:{port}/content?link={link}'
resp = requests.get(url)
return resp.text
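
Reviewer note: `get_web_content` interpolates `link` straight into the query string, so a link that itself contains `?` or `&` could be split into several parameters on the server side. A minimal sketch of one way to percent-encode it, assuming the GoTor `/content` endpoint reads a single `link` query parameter as the f-string above suggests. `build_content_url` is a hypothetical helper and not part of TorBot.

```python
from urllib.parse import urlencode


def build_content_url(link, address='localhost', port='8081'):
    """Hypothetical helper: build the /content URL with `link` percent-encoded."""
    # urlencode escapes ':', '/', '?', '&' and '=' inside the link value.
    query = urlencode({'link': link})
    return f'http://{address}:{port}/content?{query}'


if __name__ == '__main__':
    print(build_content_url('http://example.onion/page?id=2&x=1'))
```

An alternative with the same effect would be to pass `params={'link': link}` to `requests.get`, which encodes the query for you.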

View File

@ -2,15 +2,35 @@
Module that contains methods for collecting all relevant data from links,
and saving data to file.
"""
import requests
import re
from urllib.parse import urlsplit
from bs4 import BeautifulSoup
from termcolor import cprint
from re import search, findall
from requests.exceptions import HTTPError
import requests
import re
from .api import GoTor
keys = set() # high entropy strings, probably secret keys
files = set() # pdf, css, png etc.
intel = set() # emails, website accounts, aws buckets etc.
robots = set() # entries of robots.txt
custom = set() # string extracted by custom regex pattern
failed = set() # urls that photon failed to crawl
scripts = set() # javascript files
external = set() # urls that don't belong to the target i.e. out-of-scope
fuzzable = set() # urls that have get params in them e.g. example.com/page.php?id=2
endpoints = set() # urls found from javascript files
processed = set() # urls that have been crawled
everything = []
bad_intel = set() # unclean intel urls
bad_scripts = set() # unclean javascript file urls
datasets = [files, intel, robots, custom, failed, scripts, external, fuzzable, endpoints, keys]
dataset_names = [
'files', 'intel', 'robots', 'custom', 'failed', 'scripts', 'external', 'fuzzable', 'endpoints', 'keys'
]
def execute_all(link, *, display_status=False):
"""Initialise datasets and functions to retrieve data, and execute
@ -21,28 +41,10 @@ def execute_all(link, *, display_status=False):
display_status (bool, optional): Whether to print connection
attempts to terminal.
"""
keys = set() # high entropy strings, probably secret keys
files = set() # pdf, css, png etc.
intel = set() # emails, website accounts, aws buckets etc.
robots = set() # entries of robots.txt
custom = set() # string extracted by custom regex pattern
failed = set() # urls that photon failed to crawl
scripts = set() # javascript files
external = set() # urls that don't belong to the target i.e. out-of-scope
fuzzable = set() # urls that have get params in them e.g. example.com/page.php?id=2
endpoints = set() # urls found from javascript files
processed = set() # urls that have been crawled
everything = []
bad_intel = set() # unclean intel urls
bad_scripts = set() # unclean javascript file urls
datasets = [files, intel, robots, custom, failed, scripts, external, fuzzable, endpoints, keys]
dataset_names = [
'files', 'intel', 'robots', 'custom', 'failed', 'scripts', 'external', 'fuzzable', 'endpoints', 'keys'
]
response = requests.get(link)
soup = BeautifulSoup(response.text, 'html.parser')
validation_functions = [get_robots_txt, get_dot_git, get_dot_svn, get_dot_git, get_intel, get_bitcoin_address]
response = GoTor.get_web_content(link)
soup = BeautifulSoup(response, 'html.parser')
validation_functions = [get_robots_txt, get_dot_git, get_dot_svn, get_dot_git, get_intel, get_dot_htaccess, get_bitcoin_address]
for validate_func in validation_functions:
try:
validate_func(link, response)
@ -77,13 +79,13 @@ def get_robots_txt(target, response):
cprint("[*]Checking for Robots.txt", 'yellow')
url = target
target = "{0.scheme}://{0.netloc}/".format(urlsplit(url))
requests.get(target + "robots.txt")
GoTor.get_web_content(target + "robots.txt")
print(target + "robots.txt")
matches = findall(r'Allow: (.*)|Disallow: (.*)', response.text)
matches = re.findall(r'Allow: (.*)|Disallow: (.*)', response)
for match in matches:
match = ''.join(match)
if '*' not in match:
url = main_url + match
url = target + match
robots.add(url)
cprint("Robots.txt found", 'blue')
print(robots)
@ -99,7 +101,7 @@ def get_intel(link, response):
"""
intel = set()
regex = r'''([\w\.-]+s[\w\.-]+\.amazonaws\.com)|([\w\.-]+@[\w\.-]+\.[\.\w]+)'''
matches = findall(regex, response.text)
matches = re.findall(regex, response)
print("Intel\n--------\n\n")
for match in matches:
intel.add(match)
@ -115,9 +117,8 @@ def get_dot_git(target, response):
cprint("[*]Checking for .git folder", 'yellow')
url = target
target = "{0.scheme}://{0.netloc}/".format(urlsplit(url))
req = requests.get(target + "/.git/")
status = req.status_code
if status == 200:
resp = GoTor.get_web_content(target + "/.git/config")
if not resp.__contains__("404"):
cprint("Alert!", 'red')
cprint(".git folder exposed publicly", 'red')
else:
@ -131,7 +132,7 @@ def get_bitcoin_address(target, response):
target (str): URL to be checked.
response (object): Response object containing data to check.
"""
bitcoins = re.findall(r'^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$', response.text)
bitcoins = re.findall(r'^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$', response)
print("BTC FOUND: ", len(bitcoins))
for bitcoin in bitcoins:
print("BTC: ", bitcoin)
@ -147,9 +148,8 @@ def get_dot_svn(target, response):
cprint("[*]Checking for .svn folder", 'yellow')
url = target
target = "{0.scheme}://{0.netloc}/".format(urlsplit(url))
req = requests.get(target + "/.svn/entries")
status = req.status_code
if status == 200:
resp = GoTor.get_web_content(target + "/.svn/entries")
if not resp.__contains__("404"):
cprint("Alert!", 'red')
cprint(".SVN folder exposed publicly", 'red')
else:
@ -166,16 +166,15 @@ def get_dot_htaccess(target, response):
cprint("[*]Checking for .htaccess", 'yellow')
url = target
target = "{0.scheme}://{0.netloc}/".format(urlsplit(url))
req = requests.get(target + "/.htaccess")
statcode = req.status_code
if statcode == 403:
resp = GoTor.get_web_content(target + "/.htaccess")
if resp.__contains__("403"):
cprint("403 Forbidden", 'blue')
elif statcode == 200:
elif not resp.__contains__("404") or resp.__contains__("500"):
cprint("Alert!!", 'blue')
cprint(".htaccess file found!", 'blue')
else:
cprint("Status code", 'blue')
cprint(statcode)
cprint("Response", 'blue')
cprint(resp, 'blue')
def display_webpage_description(soup):
@ -201,12 +200,12 @@ def writer(datasets, dataset_names, output_dir):
for dataset, dataset_name in zip(datasets, dataset_names):
if dataset:
filepath = output_dir + '/' + dataset_name + '.txt'
if python3:
with open(filepath, 'w+', encoding='utf8') as f:
f.write(str('\n'.join(dataset)))
f.write('\n')
else:
with open(filepath, 'w+') as f:
joined = '\n'.join(dataset)
f.write(str(joined.encode('utf-8')))
f.write('\n')
with open(filepath, 'w+', encoding='utf8') as f:
f.write(str('\n'.join(dataset)))
f.write('\n')
# else:
# with open(filepath, 'w+') as f:
# joined = '\n'.join(dataset)
# f.write(str(joined.encode('utf-8')))
# f.write('\n')
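
Reviewer note: the pattern in `get_bitcoin_address` is anchored with `^` and `$`, and `re.findall` is called without `re.MULTILINE`, so it can only match when the whole response body is a single address. A minimal sketch that keeps the same character class but uses word boundaries instead, on the assumption that the intent is to find addresses embedded anywhere in the page text. `find_bitcoin_addresses` is a hypothetical name, and the sample string is a dummy that only has the right shape.

```python
import re

# Same character class as the pattern in the diff, bounded by \b instead of ^...$.
BTC_PATTERN = re.compile(r'\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b')


def find_bitcoin_addresses(text):
    """Return every substring of `text` that looks like a legacy BTC address."""
    return BTC_PATTERN.findall(text)


if __name__ == '__main__':
    dummy = '1' + 'A' * 30  # placeholder with a valid shape, not a real address
    page = f'<p>Donate here: {dummy} thank you</p>'
    print(find_bitcoin_addresses(page))
```

This only checks the textual shape. It does not validate the Base58 checksum, so false positives remain possible.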

View File

@ -4,10 +4,12 @@ objects or url strings
"""
import requests
from bs4 import BeautifulSoup
from pprint import pprint
from .api import GoTor
from .color import color
from pprint import pprint
from .nlp.main import classify
def print_tor_ip_address():
@ -20,7 +22,7 @@ def print_tor_ip_address():
print(f'Tor IP Address: {ip_string}')
def print_node(node):
def print_node(node, classify_page):
"""
Prints the status of a link based on its connection status
Args:
@ -29,6 +31,9 @@ def print_node(node):
try:
title = node['url']
status_text = f"{node['status_code']} {node['status']}"
if classify_page:
classification = classify(GoTor.get_web_content(node['url']))
status_text += f" {classification}"
if node['status_code'] >= 200 and node['status_code'] < 300:
status = color(status_text, 'green')
elif node['status_code'] >= 300 and node['status_code'] < 400:
@ -43,14 +48,14 @@ def print_node(node):
print(status_msg)
def cascade(node, work):
work(node)
def cascade(node, work, classify_page):
work(node, classify_page)
if node['children']:
for child in node['children']:
cascade(child, work)
cascade(child, work, classify_page)
def print_tree(url, depth=1):
def print_tree(url, depth=1, classify_page=False):
"""
Prints the entire tree in a user friendly fashion
Args:
@ -58,7 +63,7 @@ def print_tree(url, depth=1):
depth (int): the depth to build the tree
"""
root = GoTor.get_node(url, depth)
cascade(root, print_node)
cascade(root, print_node, classify_page)
def print_json(url, depth=1):
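
Reviewer note: `cascade` now threads `classify_page` through a depth-first walk of the node tree. A minimal sketch of that traversal on an in-memory tree, assuming nodes are dicts with `url` and an optional `children` list, which is the shape the code above indexes into. The sample tree and the `record` callback are made up for illustration, and `.get` is used so that a node without a `children` key does not raise.

```python
def cascade(node, work, classify_page=False):
    """Apply `work` to `node`, then to each descendant, depth first."""
    work(node, classify_page)
    # Treat a missing or None 'children' entry as "no children".
    for child in node.get('children') or []:
        cascade(child, work, classify_page)


if __name__ == '__main__':
    visited = []

    def record(node, classify_page):
        # Stand-in for print_node: just remember the visit order.
        visited.append(node['url'])

    tree = {
        'url': 'root',
        'children': [
            {'url': 'a', 'children': [{'url': 'a1', 'children': None}]},
            {'url': 'b'},
        ],
    }
    cascade(tree, record)
    print(visited)
```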

View File

@ -8,5 +8,4 @@ To test gathering data use:
* This will generate the data necessary to train the classification model
To predict the classification of a website use:
`python3 main.py -website https://www.github.com`
* Add `-accuracy` argument, to view the accuracy of the prediction
`classify.py` and provide the url

View File

@ -1,6 +1,9 @@
import csv
import os
from pathlib import Path
os.chdir(Path(__file__).parent)
def write_data():
"""
@ -23,10 +26,10 @@ def write_data():
[id, website, content, category] = row
if category != 'category':
category = category.replace('/', '+')
dir_name = f"training_data/{category}"
Path(dir_name).mkdir(parents=True, exist_ok=True)
with open(f'{dir_name}/{id}.txt', mode='w+') as txtfile:
txtfile.write(content)
dir_name = f"training_data/{category}"
Path(dir_name).mkdir(parents=True, exist_ok=True)
with open(f'{dir_name}/{id}.txt', mode='w+') as txtfile:
txtfile.write(content)
if __name__ == "__main__":

View File

@ -0,0 +1,53 @@
import requests
import numpy as np
import os
from pathlib import Path
from bs4 import BeautifulSoup
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.linear_model import SGDClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.datasets import load_files
def classify(data):
"""
Classify the page content (HTML) passed in as `data`
"""
soup = BeautifulSoup(data, features='html.parser')
html = soup.get_text()
# create classifier
clf = Pipeline([
('vect', CountVectorizer()),
('tfidf', TfidfTransformer()),
('clf', SGDClassifier())
])
try:
os.chdir(Path(__file__).parent)
dataset = load_files('training_data')
except FileNotFoundError:
print("Training data not found. Obtaining training data...")
print("This may take a while...")
from .gather_data import write_data
write_data()
print("Training data obtained.")
dataset = load_files('training_data')
pass
x_train, x_test, y_train, y_test = train_test_split(
dataset.data,
dataset.target
)
clf.fit(x_train, y_train)
website = 'Unknown'
if soup.title:
website = soup.title.text
# returns an array of target_name values
predicted = clf.predict([html])
accuracy = np.mean(predicted == y_test)
return [dataset.target_names[predicted[0]] , accuracy]
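
Reviewer note: in `classify`, `predicted` holds a single label for the input page, so `np.mean(predicted == y_test)` broadcasts that one label against every held-out label. The result is the share of test samples carrying the predicted label, which is not the same as the classifier's accuracy. A minimal sketch of held-out accuracy in plain Python, assuming one prediction per test sample is available; with the pipeline above those would typically come from `clf.predict(x_test)`. `held_out_accuracy` is a hypothetical helper.

```python
def held_out_accuracy(predicted_labels, true_labels):
    """Fraction of test samples whose predicted label equals the true label."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError('need exactly one prediction per test sample')
    if not true_labels:
        return 0.0
    correct = sum(1 for p, t in zip(predicted_labels, true_labels) if p == t)
    return correct / len(true_labels)


if __name__ == '__main__':
    # Toy labels only; real values would be integer class indices from load_files.
    print(held_out_accuracy([0, 1, 1, 2], [0, 1, 2, 2]))
```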


View File

@ -38,7 +38,7 @@ def join_local_path(file_name=""):
if file_name == "":
return
dev_file = find_file("torbot_dev.env", "../")
dev_file = find_file("dev.env", "../")
if not dev_file:
raise FileNotFoundError
load_dotenv(dotenv_path=dev_file)