Welcome to FolderAnalyse’s documentation!

Installation

FolderAnalyse requires Python 3.6 or above, and has been tested on Linux and macOS.

If Python is not already installed, download it from python.org, or try the Anaconda distribution.

Once Python is installed, install the package with pip, the Python package manager, by running on the command line:

pip install FolderAnalyse

Alternatively, you can install the development version from GitHub via:

pip install git+https://github.com/rpep/FolderAnalyse

Command Line Use

This section shows how to use FolderAnalyse from the command line.

To generate statistics about a particular file:

echo "The quick brown fox jumped over the lazy dog" > test.txt
FolderAnalyse test.txt

You should see some output like:

File "test.txt" Top 10 Word Frequencies
---------------------------------------
1. The, 1
2. quick, 1
3. brown, 1
4. fox, 1
5. jumped, 1
6. over, 1
7. the, 1
8. lazy, 1
9. dog, 1
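
For readers who want to see how a listing like this can be produced, here is a minimal sketch using only the Python standard library. It is not FolderAnalyse's own implementation: the function name top_word_frequencies and the word-splitting rule are assumptions made for illustration.

```python
import re
from collections import Counter


def top_word_frequencies(text, n=10, case_sensitive=True):
    """Return up to n (word, count) pairs, most frequent first."""
    if not case_sensitive:
        text = text.lower()
    # Assumption: a "word" is a run of letters, digits or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Words with equal counts keep the order in which they first appear.
    return Counter(words).most_common(n)


sentence = "The quick brown fox jumped over the lazy dog"
for rank, (word, count) in enumerate(top_word_frequencies(sentence), start=1):
    print(f"{rank}. {word}, {count}")
```

With case sensitivity on, "The" and "the" are counted separately, which is why the listing has nine entries with a count of 1 each.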

To generate statistics about all files in a folder:

FolderAnalyse /path/to/a/folder

To generate statistics about all “*.md” files in a folder:

FolderAnalyse /path/to/a/folder -t ".md"
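
The idea behind filtering by file type can be sketched in plain Python as below. This is only an illustration under stated assumptions: the helper files_with_extension is hypothetical, and it looks only at files directly inside the folder, which may differ from how FolderAnalyse walks a directory.

```python
import tempfile
from pathlib import Path


def files_with_extension(folder, extension):
    """Return the files directly inside folder whose suffix equals extension."""
    return sorted(
        path for path in Path(folder).iterdir()
        if path.is_file() and path.suffix == extension
    )


# Demonstrate on a throwaway folder holding two Markdown files and one text file.
with tempfile.TemporaryDirectory() as folder:
    for name in ("notes.md", "readme.md", "data.txt"):
        Path(folder, name).write_text("example text")
    matched_names = [path.name for path in files_with_extension(folder, ".md")]

print(matched_names)
```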

To save the output as a text report:

FolderAnalyse /path/to/a/folder -s report.txt

The tests for the project can be run from the command line with:

FolderAnalyse . -r

For the test cases I used out-of-copyright Project Gutenberg books as reference texts. These are included in the tests/example_docs folder.

API

In general FolderAnalyse is designed to be used from the command line, but here I’ll show how you can use the functions in your own projects.

The bulk of the interesting code is in FolderAnalyse.process, in the two functions process_file() and process_dir().

To process a file and get its word-frequency dictionary:

>>> import FolderAnalyse.process as p

>>> f1 = open('test1.txt', 'w')
>>> f1.write("The quick brown fox jumped over the lazy dog")
>>> f1.close()

>>> stats_text, frequency_dict, top_freqs = p.process_file('test1.txt',
                                                           N=5,
                                                           case_sensitive=False)
>>> print(top_freqs['the'])
2

If we create another file, we can use directory processing:

>>> f2 = open('test2.txt', 'w')
>>> f2.write("Writing words to the second file")
>>> f2.close()

# See the API documentation for more details:
>>> text, dics, top_dic, cdic, top_cdic = p.process_dir('.')

>>> print(top_cdic['the'])
3
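
The combined count of 3 is the sum of the per-file counts for the two files created above (2 in test1.txt, 1 in test2.txt). A rough standard-library sketch of that kind of merge is shown here. It does not call FolderAnalyse, and the variable names are illustrative assumptions rather than the library's own.

```python
from collections import Counter

# Lower-cased contents of the two example files, counted per file.
per_file_counts = {
    "test1.txt": Counter("the quick brown fox jumped over the lazy dog".split()),
    "test2.txt": Counter("writing words to the second file".split()),
}

# Adding Counters sums the counts word by word across all files.
combined_counts = sum(per_file_counts.values(), Counter())

print(combined_counts["the"])
```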

If only the word counts are required, the parse() function in FolderAnalyse.fileparser is enough:

>>> import FolderAnalyse.fileparser as fp
>>> print(fp.parse('test2.txt', sort=True))
{'writing': 1, 'words': 1, 'to': 1, 'the': 1, 'second': 1, 'file': 1}

The tests for the project can be run directly from the Python interpreter with:

>>> import FolderAnalyse
>>> FolderAnalyse.runtests()
