class documentation
Generate dataset for training.
| Static Method | final | Filter the translated data based on checktype. |
| Static Method | remove | Remove quotes in text. |
| Method | __init__ | Initialize DatasetGenerator. |
| Method | filter | Validate the data of hospitals. Keep the valid data and remove the invalid data. |
| Method | generate | Generate a file path for each image. |
| Method | generate | Generate merged CSV files for each hospital. |
| Method | make | Convert the translated info to an aligned dataset for training the MiniGPT-4 model. |
| Method | match | Pair check data with image data. |
| Method | merge | Merge all data with checktype into a single JSON file. |
| Method | merge | Merge Excel files into one dataframe. |
| Method | refine | Refine the caption. |
| Method | reorganize | Reorganize the data structure. |
| Instance Variable | EXCEL | Upper limit of Excel rows. |
| Instance Variable | MIN | Minimum length of a caption. |
| Instance Variable | NON | Non-valid print ID. |
| Instance Variable | root | Dataset root path. |
| Instance Variable | save | Root path of the processed dataset. |
| Method | _defint | Define constant variables. |
| Method | _filter | Check if the metadata is valid. |
| Method | _get | Given a matched dataframe, get a list of image info for each patient. |
| Method | _get | Get valid images for a patient. |
| Method | _refine | Delete unnecessary keys and rename keys to English. |
Validate the data of hospitals. Keep the valid data and remove the invalid data.
Check if the check date in the metadata is the same as the date in the folder's name.
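The date comparison can be sketched as follows. This is only an illustration under stated assumptions, not the class's actual code: the folder name is assumed to start with a YYYYMMDD date followed by a hyphen, the metadata is assumed to be a dict with a hypothetical `check_date` key in YYYY-MM-DD form, and the helper names are invented for the example.

```python
from datetime import date, datetime
from pathlib import Path


def folder_date(folder: Path) -> date:
    """Parse the leading date from a folder name such as '20230115-P001'."""
    # Assumption: the name starts with YYYYMMDD, followed by a hyphen.
    prefix = folder.name.split("-", 1)[0]
    return datetime.strptime(prefix, "%Y%m%d").date()


def date_matches(metadata: dict, folder: Path) -> bool:
    """Return True when the metadata check date equals the folder's date."""
    try:
        # Assumption: hypothetical 'check_date' key, formatted as YYYY-MM-DD.
        meta_date = datetime.strptime(metadata["check_date"], "%Y-%m-%d").date()
        return meta_date == folder_date(folder)
    except (KeyError, ValueError):
        # Missing or unparsable dates are treated as invalid.
        return False
```

Treating unparsable dates as invalid keeps the filter conservative: a record is kept only when both dates can be read and agree.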
Convert the translated info to an aligned dataset for training the MiniGPT-4 model.
See: https://github.com/Vision-CAIR/MiniGPT-4/blob/main/dataset/README_2_STAGE.md
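As a rough sketch of the target layout, assuming the second-stage format is a `filter_cap.json` file holding an `annotations` list of `image_id`/`caption` entries next to an `image/` folder of `<image_id>.jpg` files (check the linked README before relying on this). The function name, the sequential ids, and the sample record are hypothetical.

```python
import json
from pathlib import Path


def build_annotations(records):
    """Return (annotation payload, copy plan) for (image_path, caption) records."""
    annotations = []
    copy_plan = []  # (source path, target path relative to the dataset root)
    for index, (image_path, caption) in enumerate(records):
        # Assumption: ids are assigned sequentially; any unique id would do.
        image_id = str(index)
        annotations.append({"image_id": image_id, "caption": caption})
        # Each image is expected at image/<image_id>.jpg next to filter_cap.json.
        copy_plan.append((Path(image_path), Path("image") / f"{image_id}.jpg"))
    return {"annotations": annotations}, copy_plan


# Hypothetical sample record, for illustration only.
records = [("hospital_a/20230115-P001/0.jpg", "An example caption.")]
payload, copy_plan = build_annotations(records)
# Text that would be written to filter_cap.json.
filter_cap_json = json.dumps(payload, ensure_ascii=False, indent=2)
```

The sketch only plans the file copies instead of performing them, so it can be inspected without touching the file system.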
Pair check data with image data.
The output is saved to self.save_dir / f"{hospital}-图像检查表.csv" (图像检查表 means "image check table").
| Parameters | |
| hospital: str | Hospital name. |
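A minimal sketch of the pairing step and the output path, using plain dicts rather than the dataframes the class works with. The join key `pid`, the row fields, and both function names are assumptions made for illustration; only the file name pattern comes from the description above.

```python
from pathlib import Path


def pair_check_with_images(check_rows, image_rows, key="pid"):
    """Inner-join two lists of dicts on a shared key (hypothetical: 'pid')."""
    images_by_key = {}
    for row in image_rows:
        images_by_key.setdefault(row[key], []).append(row)

    paired = []
    for check in check_rows:
        # Checks without any image, and images without a check, are dropped.
        for image in images_by_key.get(check[key], []):
            # Check fields win on name clashes; this is an arbitrary choice.
            paired.append({**image, **check})
    return paired


def output_path(save_dir, hospital):
    """Build the CSV path from the documented naming pattern."""
    return Path(save_dir) / f"{hospital}-图像检查表.csv"
```

For example, `output_path("processed", "hospital_a")` points at `processed/hospital_a-图像检查表.csv`.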
Reorganize the data structure.
Save images using "{hospital}/{check_date}-{pid}/{sequence}.jpg" template.
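The path template can be illustrated with a short sketch. The function name, the `root` argument, and the sample values (a YYYYMMDD date, an integer sequence number) are assumptions; only the template string itself is taken from the description above.

```python
from pathlib import Path

# Template as documented above.
TEMPLATE = "{hospital}/{check_date}-{pid}/{sequence}.jpg"


def image_save_path(root, hospital, check_date, pid, sequence):
    """Fill the template and join it onto a root directory."""
    relative = TEMPLATE.format(
        hospital=hospital, check_date=check_date, pid=pid, sequence=sequence
    )
    return Path(root) / relative


# Hypothetical values, for illustration only.
example = image_save_path("processed", "hospital_a", "20230115", "P001", 3)
```

With these sample values, `example.as_posix()` is `processed/hospital_a/20230115-P001/3.jpg`.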