Preparing a New Dataset
Helper for preparing a new dataset.
antinex_utils.prepare_dataset_tools.find_all_headers(use_log_id=None, pipeline_files=[], label_rules=None)
Parameters:
- use_log_id – label for debugging in logs
- pipeline_files – list of files to prep
- label_rules – dict of rules to apply
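
The idea of scanning a list of files for their headers can be illustrated with a minimal standard-library sketch. This is not the implementation of find_all_headers: it assumes the pipeline files are CSV files whose first row holds the column names, and the name collect_headers_sketch is hypothetical.

```python
import csv


def collect_headers_sketch(pipeline_files):
    """Return the sorted union of column names found in the given CSV files.

    Illustration only: assumes each file's first row is its header row.
    """
    headers = set()
    for path in pipeline_files:
        with open(path, newline="") as handle:
            # Read only the first row; an empty file contributes no headers.
            first_row = next(csv.reader(handle), [])
            headers.update(first_row)
    return sorted(headers)
```

Calling collect_headers_sketch with two files whose header rows are `src,dst` and `dst,label` would return the three distinct column names in sorted order.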
antinex_utils.prepare_dataset_tools.build_csv(pipeline_files=[], fulldata_file=None, clean_file=None, post_proc_rules=None, label_rules=None, use_log_id=None, meta_suffix='metadata.json')
Parameters:
- pipeline_files – list of files to process
- fulldata_file – output file for the merged, unedited data
- clean_file – output file for the cleaned CSV, which should be ready for training
- post_proc_rules – rules to apply during post-processing (cleaning)
- label_rules – labeling rules to apply (classification only)
- use_log_id – label for tracking the job in the logs
- meta_suffix – file suffix for metadata files (default 'metadata.json')
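
A call to build_csv might be assembled as in the sketch below, using only the keyword arguments listed in the signature. The file paths and the log label are hypothetical placeholders, the rule dictionaries are left as None because their structure is not documented here, and the return value is passed through without interpretation. The import is guarded so the snippet loads even where antinex_utils is not installed.

```python
# Keyword arguments taken from the documented signature.
# All paths and the log label below are hypothetical placeholders.
build_csv_kwargs = {
    "pipeline_files": ["/tmp/captures/part-1.csv", "/tmp/captures/part-2.csv"],
    "fulldata_file": "/tmp/fulldata.csv",  # merged, unedited output
    "clean_file": "/tmp/cleaned.csv",      # cleaned output intended for training
    "post_proc_rules": None,               # rule format not documented here
    "label_rules": None,                   # rule format not documented here
    "use_log_id": "prep-example",
    "meta_suffix": "metadata.json",        # documented default
}

try:
    from antinex_utils.prepare_dataset_tools import build_csv
except ImportError:
    # antinex_utils is not installed; the wrapper below becomes a no-op.
    build_csv = None


def run_build_csv(kwargs):
    """Call build_csv with the given keyword arguments if it is available."""
    if build_csv is None:
        return None
    return build_csv(**kwargs)
```

With the library installed and real input files in place, run_build_csv(build_csv_kwargs) would forward the arguments unchanged to build_csv.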