FolderAnalyse package

Submodules

FolderAnalyse.fileparser module

Ryan Pepper (2018)

fileparser.py

This file contains a parser function that processes a text file.

FolderAnalyse.fileparser.combine_dicts(dicts_list)[source]

Combines dictionaries by summing the numerical values of any keys which are shared between them.

Inputs:
dicts_list, list
A list containing dictionaries with integer values.
Outputs:
dict:
A combined dictionary with summed values.

Example:

>>> a = {'a': 2, 'b': 1, 'c': 5}
>>> b = {'b': 4, 'c': 12, 'e': 4}
>>> FolderAnalyse.fileparser.combine_dicts([a, b])
{'a': 2, 'b': 5, 'c': 17, 'e': 4}
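
The summing behaviour described above can be sketched with collections.Counter. This is a minimal illustration, not the package's actual implementation; the name combine_dicts_sketch is hypothetical.

```python
from collections import Counter


def combine_dicts_sketch(dicts_list):
    """Sum the values of shared keys across a list of dicts (illustrative only)."""
    total = Counter()
    for d in dicts_list:
        # Counter.update adds to existing counts instead of overwriting them.
        total.update(d)
    return dict(total)


a = {'a': 2, 'b': 1, 'c': 5}
b = {'b': 4, 'c': 12, 'e': 4}
print(combine_dicts_sketch([a, b]))  # {'a': 2, 'b': 5, 'c': 17, 'e': 4}
```

Keys that appear in only one input are carried through unchanged, and an empty list yields an empty dict.
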
FolderAnalyse.fileparser.parse(filename, case_sensitive=False, sort=False)[source]

Opens a file, and reads it line by line, returning a dictionary containing key-value pairs of words and their frequency in the file. Note: newline characters are always removed from the file.

Inputs:
filename, str:
The file for which word frequencies are calculated.
case_sensitive, bool:
Whether processing should be case sensitive or not, i.e. if ‘the’ is the same as ‘The’ for counting word frequencies.
sort, bool:
Setting True enables sorting dict by word frequency.
Outputs:
dict:
Dictionary containing key-value pairs of words and their frequency in the file.

Examples:

>>> text = "The quick brown fox jumped over the lazy dog."
>>> f = open('example.txt', 'w')
>>> f.write(text)
>>> f.close()
>>> FolderAnalyse.fileparser.parse('example.txt', case_sensitive=False)
{'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'jumped': 1,
 'over': 1, 'lazy': 1, 'dog.': 1}
>>> FolderAnalyse.fileparser.parse('example.txt', case_sensitive=True)
{'The': 1, 'quick': 1, 'brown': 1, 'fox': 1, 'jumped': 1,
 'over': 1, 'the': 1, 'lazy': 1, 'dog.': 1}
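
The line-by-line counting described above might look roughly like the following. This is a sketch under the assumption that words are split on whitespace and punctuation is kept (as 'dog.' in the example suggests); parse_sketch is a hypothetical name, not the package function.

```python
import os
import tempfile


def parse_sketch(filename, case_sensitive=False):
    """Count whitespace-separated words in a file (illustrative only)."""
    freq = {}
    with open(filename) as f:
        for line in f:
            # split() with no arguments splits on any whitespace,
            # which also discards the trailing newline.
            for word in line.split():
                if not case_sensitive:
                    word = word.lower()
                freq[word] = freq.get(word, 0) + 1
    return freq


# Write a throwaway file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.txt')
    with open(path, 'w') as f:
        f.write("The quick brown fox jumped over the lazy dog.")
    insensitive = parse_sketch(path)
    sensitive = parse_sketch(path, case_sensitive=True)

print(insensitive['the'])                  # 2
print(sensitive['The'], sensitive['the'])  # 1 1
```
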
FolderAnalyse.fileparser.sort_dict(dictionary)[source]

Sorts a dictionary based on the numerical values stored in the dict.

Inputs:
dictionary, dict:
Dictionary with integer values.
Outputs:
dictionary, dict:
Dictionary sorted by value.

Examples:

>>> a = {'a': 2, 'b': 1, 'c': 5}
>>> FolderAnalyse.fileparser.sort_dict(a, reverse=True)
{'c': 5, 'a': 2, 'b': 1}
>>> FolderAnalyse.fileparser.sort_dict(a, reverse=False)
{'b': 1, 'a': 2, 'c': 5}

Notes:

Dictionaries preserve insertion order in Python 3.7 and above (and in CPython 3.6 as an implementation detail). In older versions you must use collections.OrderedDict.
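
Sorting by value can be sketched as follows, relying on the insertion-order guarantee mentioned in the note. The function name sort_dict_sketch is hypothetical, and the reverse keyword is assumed from the examples above rather than from the documented signature.

```python
def sort_dict_sketch(dictionary, reverse=True):
    """Return a new dict whose items are ordered by value (illustrative only)."""
    # sorted() yields (key, value) pairs ordered by the value;
    # dict() keeps that order on Python 3.7+.
    ordered = sorted(dictionary.items(), key=lambda kv: kv[1], reverse=reverse)
    return dict(ordered)


a = {'a': 2, 'b': 1, 'c': 5}
print(sort_dict_sketch(a, reverse=True))   # {'c': 5, 'a': 2, 'b': 1}
print(sort_dict_sketch(a, reverse=False))  # {'b': 1, 'a': 2, 'c': 5}
```
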

FolderAnalyse.process module

Ryan Pepper (2018)

process.py

Module containing functions that process word frequency dicts into a report format.

FolderAnalyse.process.process_dir(dirname, extension, N=10, case_sensitive=False)[source]

Processes all files with the given extension in the given directory by calling process_file on each of them. It then returns a report along with the data used to construct it.

Inputs:
dirname, str:
Directory to be processed.
extension, str:
File extension to process in the directory.
N, int:
How many top frequencies should be calculated.
case_sensitive, bool:
Whether processing should be case sensitive or not, i.e. if ‘the’ is the same as ‘The’ for counting word frequencies.
Outputs:
str:
Text report detailing the word frequencies for displaying.
list of dicts:
The full word frequency dicts for each file.
list of dicts:
The reduced top frequency dicts with N entries.
dict:
The combined frequency dict across all files.
dict:
The top N word frequencies across all files.

Example:

>>> f1 = open('test1.txt', 'w')
>>> f1.write("The quick brown fox jumped over the lazy dog.")
>>> f1.close()
>>> f2 = open('test2.txt', 'w')
>>> f2.write("This is a second file, the most common word will "
             "still be the word the")
>>> f2.close()
>>> text, freq_dicts, top_dicts, combined, top_combined = FolderAnalyse.process.process_dir(".", "txt")
>>> print(combined['the'])
5
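
The directory-level aggregation can be sketched as below. This is not the package's code: count_words and combined_counts are hypothetical helpers, counting is case insensitive, and the extension is assumed to be passed without a leading dot.

```python
import glob
import os
import tempfile
from collections import Counter


def count_words(path):
    """Case-insensitive whitespace word count for one file."""
    with open(path) as f:
        return Counter(f.read().lower().split())


def combined_counts(dirname, extension):
    """Count words per file, then sum the counts across all matching files."""
    # Assumption: extension is given without the leading dot, e.g. 'txt'.
    pattern = os.path.join(dirname, '*.' + extension)
    per_file = [count_words(p) for p in sorted(glob.glob(pattern))]
    combined = sum(per_file, Counter())
    return per_file, combined


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'test1.txt'), 'w') as f:
        f.write("The quick brown fox jumped over the lazy dog.")
    with open(os.path.join(tmp, 'test2.txt'), 'w') as f:
        f.write("This is a second file, the most common word will "
                "still be the word the")
    per_file, combined = combined_counts(tmp, 'txt')

print(combined['the'])          # 5
print(combined.most_common(1))  # [('the', 5)]
```

Counter.most_common(N) gives the top N entries directly, which is one way the reduced top-frequency dicts could be produced.
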
FolderAnalyse.process.process_file(filename, N=10, case_sensitive=False)[source]

Process a file and return some text giving the top N words in the file and the original frequency dictionary.

Inputs:
filename, str:
File to be processed.
N, int:
Number of top frequencies to add to the report.
case_sensitive, bool:
Whether processing should be case sensitive or not, i.e. if ‘the’ is the same as ‘The’ for counting word frequencies.
Outputs:
str:
Textual report about word frequency in files.
dict:
Total word frequency dict.
dict:
Reduced word frequency dict with N terms.

Example:

>>> f = open('test.txt', 'w')
>>> f.write("The quick brown fox jumped over the lazy dog.")
>>> f.close()
>>> text, freq_dict, top_dict = FolderAnalyse.process.process_file("test.txt")
>>> print(freq_dict['the'])
2

FolderAnalyse.process.top_frequencies(freq_dict, name, nterms)[source]

Returns the first nterms entries of the dictionary.

Input:
freq_dict, dict:
Dictionary of word frequencies.
nterms, int:
Number of word frequencies in returned dictionary.
Output:
dict:
The reduced size dictionary.

Note: This wrapper is needed only to handle files with fewer than 10 words.
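
One way to take the first nterms entries while tolerating short dictionaries is itertools.islice, which simply stops when the input runs out. The sketch below assumes the input dict is already sorted by frequency; first_terms is a hypothetical name and omits the name argument of the real function.

```python
from itertools import islice


def first_terms(freq_dict, nterms):
    """Return at most nterms leading entries of freq_dict (illustrative only)."""
    # islice stops early if the dict has fewer than nterms entries,
    # so no special case is needed for short files.
    return dict(islice(freq_dict.items(), nterms))


sorted_freqs = {'the': 3, 'word': 2, 'file': 1}
print(first_terms(sorted_freqs, 2))   # {'the': 3, 'word': 2}
print(first_terms(sorted_freqs, 10))  # {'the': 3, 'word': 2, 'file': 1}
```
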

FolderAnalyse.process.underline(title)[source]

Returns the title followed by a second line of dashes matching its length, as in reStructuredText headings.

Inputs:
title, str:
Title to be underlined.
Outputs:
str:
Multiline string with underlining.

Example:

>>> print(FolderAnalyse.process.underline('Hello'))
Hello
-----
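
A minimal sketch of the same idea, assuming a dash is used as the underline character as in the example output; underline_sketch and its char parameter are hypothetical.

```python
def underline_sketch(title, char='-'):
    """Return title plus a line of `char` with the same length (illustrative only)."""
    return title + '\n' + char * len(title)


print(underline_sketch('Hello'))
# Hello
# -----
```
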

FolderAnalyse.script module

Ryan Pepper (2018)

script.py

This script contains the main entrypoint to the folder-analyse application.

FolderAnalyse.script.main()[source]

Main function which is the entrypoint to the application.

Gets the parsed arguments and constructs a report based on them, which is printed to the screen. To see how the arguments affect the output, look at FolderAnalyse.script.get_parser().

Module contents

Ryan Pepper (2018)

__init__.py

Module base

FolderAnalyse.runtests()[source]

Run the test suite for FolderAnalyse

Notes

Adapted from: https://docs.pytest.org/en/latest/usage.html#calling-pytest-from-python-code