CSV-based Database
- class datatoolbox.database.Database
CSV-based database that uses git as its distributed version control system. Each table is saved locally as a CSV file and identified by a unique ID. The CSV files are grouped by source, with one folder per source. Each source comes with its own git repository and can be shared with others.
- add_to_inventory(datatable)
Method to add a table to the global inventory file.
Input: datatable
- clearLogTables()
Clears the list of logged tables. This also happens automatically whenever the package is newly loaded.
- commitTable(dataTable, message, sourceMetaDict=None)
Adds a table permanently to the underlying database. For the first table of a new source, the metadata for the source needs to be provided as well.
Input
- dataTable : Datatable
- message : str
- sourceMetaDict : dict, optional
- commitTables(dataTables, message, sourceMetaDict=None, append_data=False, update=False, overwrite=False, cleanTables=True)
Adds multiple tables permanently to the underlying database. For the first table of a new source, the metadata for the source needs to be provided as well.
Input
- dataTables : list of Datatable
- message : str
- sourceMetaDict : dict, optional
- append_data : bool, optional
Choose whether new data is added to the existing table (new data does not overwrite old data).
- update : bool, optional
Choose whether the existing data is updated.
- overwrite : bool, optional
Choose whether data is overwritten (new data overwrites old data).
- cleanTables : bool, optional
Choose whether tables are cleaned before the commit. The default is True.
TODO: Check flags
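The difference between append_data and overwrite described above can be illustrated with a small standalone sketch. This is not datatoolbox code: `merge_rows` is a hypothetical helper that works on plain dictionaries keyed by (region, year), and it only models the two flags whose behaviour the text states. The `update` flag is left out because its exact semantics are not specified here.

```python
def merge_rows(existing, new, overwrite=False):
    """Merge two {(region, year): value} mappings (hypothetical helper).

    overwrite=False mimics the documented append_data behaviour:
    new data is added, but never replaces an existing value.
    overwrite=True mimics the documented overwrite behaviour:
    new data replaces existing values on conflict.
    """
    merged = dict(existing)  # work on a copy, keep the input untouched
    for key, value in new.items():
        if overwrite or key not in merged:
            merged[key] = value
    return merged


existing = {("DEU", 2019): 1.0, ("DEU", 2020): 2.0}
new = {("DEU", 2020): 9.9, ("DEU", 2021): 3.0}

appended = merge_rows(existing, new, overwrite=False)
overwritten = merge_rows(existing, new, overwrite=True)

# Only the conflicting cell ("DEU", 2020) differs between the two results.
print(appended[("DEU", 2020)], overwritten[("DEU", 2020)])
```

In both modes the non-conflicting 2021 value is added; the modes differ only in which value wins where old and new data overlap.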
- create_empty_datashelf(modulePath, pathToDataself)
Method to create the required files for an empty CSV-based database. (Equivalent to the functions in admin.py)
- exportSourceToRemote(sourceID)
This function exports a new local dataset to the defined remote database.
Input is a local sourceID as a str.
- findc(**kwargs)
Method to search through the inventory. kwargs can be any of the inventory entries (see config.INVENTORY_FIELDS).
- findp(level=None, regex=False, **filters)
Future default find method that allows more sophisticated filter syntax.
Usage:
- filters : Union[str, Iterable[str]]
One or multiple patterns, which are OR’d together.
- regex : bool, optional
Accept plain regex syntax instead of shell-style patterns. The default is False.
Returns
matches : pd.Series
Mask for selecting matched rows.
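The filter behaviour described above (several patterns OR'd together, shell-style by default, plain regex on request) can be sketched with the standard library alone. This is an illustration of the idea, not the datatoolbox implementation: `match_mask` is a hypothetical helper, it returns a plain list of booleans instead of a pd.Series, and requiring a full match of each value is an assumption.

```python
import fnmatch
import re


def match_mask(values, patterns, regex=False):
    """Return one boolean per value: True if any pattern matches it.

    Hypothetical helper. Patterns are shell-style ('*', '?') unless
    regex=True, in which case they are used as regular expressions.
    Each pattern has to match the whole value (assumption).
    """
    if isinstance(patterns, str):
        patterns = [patterns]  # accept a single pattern as well
    if regex:
        compiled = [re.compile(p) for p in patterns]
    else:
        # fnmatch.translate converts a shell-style pattern into a regex
        compiled = [re.compile(fnmatch.translate(p)) for p in patterns]
    return [any(c.fullmatch(v) for c in compiled) for v in values]


variables = ["Emissions|CO2", "Emissions|CH4", "Population"]

shell_mask = match_mask(variables, "Emissions|*")
or_mask = match_mask(variables, ["*CO2", "Pop*"])
regex_mask = match_mask(variables, r"Emissions\|CO\d", regex=True)

print(shell_mask, or_mask, regex_mask)
```

Note that `|` is a literal character in shell-style patterns but has to be escaped in regex mode, which is one reason shell-style matching is a convenient default for variable names of this form.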
- getTable(ID, native_regions=False)
Method to load the datatable for the given table ID.
Input
- ID : str
ID of the table to load.
- native_regions : bool, optional
Load native region definitions if available. The default is False.
Returns
table : Datatable
- getTables(iterIDs, native_regions=False, disable_progress=None)
Method to return multiple datatables at once as a dictionary-like set of tables.
Input
- iterIDs : list of str
List of IDs to load.
- native_regions : bool, optional
Load native region definitions if available. The default is False.
- disable_progress : bool, optional
Disable the progress bar. The default, None, hides the progress bar on non-tty outputs.
Returns
tables : TableSet
- importSourceFromRemote(remoteName)
This function imports (git clone) a remote dataset and creates a local copy of it.
Input is an existing sourceID.
- isConsistentTable(datatable)
Checks whether the table meets the following requirements:
- numeric data
- spatial identifiers are known to the database
- columns are proper years
- index is not duplicated
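The four requirements listed above can be sketched as independent checks on a very simple table representation. This is a standalone illustration, not the datatoolbox implementation: `check_table`, `KNOWN_REGIONS`, and the rule that a proper year is a four-digit integer are all assumptions made for the example.

```python
import numbers

# Hypothetical set of spatial identifiers known to the database
KNOWN_REGIONS = {"DEU", "FRA", "USA"}


def check_table(index, columns, rows, known_regions=KNOWN_REGIONS):
    """Return a list of problems found; an empty list means consistent.

    index   : list of region identifiers, one per row
    columns : list of year labels, one per column
    rows    : list of lists holding the data values
    """
    problems = []

    # 1. numeric data (booleans are not treated as numbers here)
    for row in rows:
        for value in row:
            if isinstance(value, bool) or not isinstance(value, numbers.Real):
                problems.append("non-numeric data")
                break
        else:
            continue
        break

    # 2. spatial identifiers are known
    unknown = sorted({region for region in index if region not in known_regions})
    if unknown:
        problems.append(f"unknown regions: {unknown}")

    # 3. columns are proper years (assumed: four-digit integers)
    if not all(isinstance(c, int) and 1000 <= c <= 9999 for c in columns):
        problems.append("invalid year columns")

    # 4. index is not duplicated
    if len(set(index)) != len(index):
        problems.append("duplicated index")

    return problems


good = check_table(["DEU", "FRA"], [2019, 2020], [[1.0, 2.0], [3.0, 4.0]])
bad = check_table(["DEU", "DEU", "XXX"], [2020, "20x1"],
                  [[1.0, "n/a"], [2.0, 3.0], [4.0, 5.0]])
print(good)
print(bad)
```

Returning a list of problems rather than a single boolean is a design choice of this sketch; it makes it easier to see which requirement a table violates.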
- pull_update_from_remote(repoName)
Updates the local data repository to the newest version available in the remote repository.
Parameters
- repoName : str
Source ID string to identify which source repository should be updated.
Returns
None.
- removeTables(IDList)
Method to remove tables from the database.
Input
IDList : list of str
- remove_from_inventory(tableID)
Method to remove a table from the global inventory.
Input: tableID
- save_logged_tables(folder='data')
Creates a local data directory that can be used to run the logged analysis independently.
Parameters
- folder : str, optional
Name of the local data directory to create. The default is 'data'.
Returns
None.
- sourceExists(source)
Function to check whether a source is properly registered in the database.
Input: SourceID
- startLogTables()
Starts the logging of loaded datatables. This is useful for collecting all tables required by a given analysis in order to create a data package for offline usage.
- stopLogTables()
Stops the logging of datatables and returns the list of loaded table IDs for further processing.
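The start, stop, and clear workflow for logged tables can be pictured with a small standalone sketch. It does not use datatoolbox: `TableLogger` and its `record` method are hypothetical names, the table IDs are made up, and recording each ID only once is an assumption.

```python
class TableLogger:
    """Minimal stand-in for the table logging described above (hypothetical)."""

    def __init__(self):
        self._active = False
        self._logged = []

    def start(self):
        """Begin recording the IDs of loaded tables."""
        self._active = True

    def record(self, table_id):
        """Called whenever a table is loaded; ignored while logging is off."""
        if self._active and table_id not in self._logged:
            self._logged.append(table_id)

    def stop(self):
        """Stop recording and return the IDs collected so far."""
        self._active = False
        return list(self._logged)

    def clear(self):
        """Forget all recorded IDs."""
        self._logged.clear()


logger = TableLogger()
logger.record("table_before_start")  # not logged, logging is off
logger.start()
for table_id in ["table_a", "table_b", "table_a"]:
    logger.record(table_id)
logged_ids = logger.stop()
logger.record("table_after_stop")    # not logged, logging is off again
after_stop = logger.stop()
logger.clear()
after_clear = logger.stop()
print(logged_ids)
```

The returned ID list is what a later step would use to decide which tables belong in an offline data directory.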
- updateExcelInput(fileName)
This function updates all data values that are defined in the input sheet of the given Excel file.
- updateTable(oldTableID, newDataTable, message)
Specific method to update the data of an existing table.
Input
- oldTableID : str
- newDataTable : Datatable
- message : str
Commit message to describe the added data.
- updateTables(oldTableIDs, newDataTables, message)
Equivalent to updateTable, but for multiple tables at once.
Input
- oldTableIDs : list of str
- newDataTables : list of Datatable
- message : str
Commit message to describe the added data.
- updateTablesAvailable(private_access_token)
Method to update the module data folder with the latest datashelf contents on GitLab.
Requirements:
private_access_token: generated on GitLab