
updated dir

Aziz Ketari 4 years ago
parent
commit
19c4ad5bfb

BIN
.DS_Store


+ 9 - 7
README.md

@@ -49,39 +49,41 @@ will automatically download a model for you and install it.
 `cd ~/covid19_ISMIR`

 - **Step 1:** Modify the values of each variable in the env_variables.sh file, then run
-
+> Assumption: You have already created/downloaded the JSON key for your Google Cloud service account. Useful [link](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#iam-service-account-keys-create-python)
 ```
 ./env_variables.sh
 ```

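The assumption note above references a service-account JSON key. A minimal, stdlib-only sanity check that the key is in place before running anything against Google Cloud (the helper name and the environment-variable lookup are illustrative assumptions, not part of this repo):

```python
import json
import os

def check_service_account_key(path: str) -> bool:
    """Hypothetical helper: return True if `path` exists and looks like a
    Google Cloud service-account key (a "type" field plus a private key)."""
    if not os.path.isfile(path):
        return False
    with open(path) as f:
        key = json.load(f)
    return key.get("type") == "service_account" and "private_key" in key

# The key path would typically come from env_variables.sh, e.g. via the
# conventional GOOGLE_APPLICATION_CREDENTIALS environment variable.
```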
 - **Step 2:** Download the required files to your bucket and load the required model locally  
 (this step will take ~10 min)
-
+> Optional: If you have already downloaded the scispacy model, modify ./content/download_content.sh so it does not repeat that step.
 ```
-sh ~/data/download_content.sh
+sh ~/content/download_content.sh
 pip install -U ./scispacy_models/en_core_sci_lg-0.2.4.tar.gz
 ```

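Before moving on, it can help to confirm the model package from the pip install actually registered; `importlib.util.find_spec` checks importability without loading the (large) model. The helper name here is an illustrative assumption:

```python
import importlib.util

def model_installed(package: str = "en_core_sci_lg") -> bool:
    """Return True if the pip-installed spaCy model package is importable."""
    return importlib.util.find_spec(package) is not None

print(model_installed())  # True once the pip install above has succeeded
```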
 - **Step 3:** Start the extraction of text from the PDF documents  

-`python3 extraction.py`
+`python3 ./scripts/extraction.py`

 ## Pre-processing data
 Following the extraction of text, it's time to translate it from Italian to English and curate it.

-`python3 preprocessing.py`
+`python3 ./scripts/preprocessing.py`
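The curation rules aren't spelled out here, but the preprocessing script imports `re`; a hedged sketch of the kind of regex cleanup such a step might perform on OCR'd PDF text (the rules below are assumptions, not the repo's actual logic):

```python
import re

def curate(text: str) -> str:
    """Hypothetical cleanup pass: re-join words hyphenated across PDF line
    breaks and collapse runs of whitespace (assumed rules, for illustration)."""
    text = re.sub(r"-\s*\n\s*", "", text)  # join words split across lines
    text = re.sub(r"\s+", " ", text)       # collapse whitespace runs
    return text.strip()

print(curate("corona-\nvirus   patient\ndata"))  # → coronavirus patient data
```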

 ## Storing data
 Following the pre-processing, it's time to store the data in a more searchable format: a data warehouse - 
 [BigQuery](https://cloud.google.com/bigquery) - for the text, and a No-SQL database - 
 [Datastore](https://cloud.google.com/datastore) - for the (UMLS) medical entities. 

-`python3 storing.py`
+`python3 ./scripts/storing.py`

 ## Test
 Last but not least, you can query your databases using this script.

-`python3 retrieving.py`
+`python3 ./scripts/retrieving.py`
+
+---

 ## Contributing
 > To get started...

BIN
data/.DS_Store → content/.DS_Store


+ 0 - 0
data/UMLS_tuis.csv → content/UMLS_tuis.csv


+ 0 - 0
data/download_content.sh → content/download_content.sh


+ 0 - 0
data/images/.DS_Store → content/images/.DS_Store


+ 0 - 0
data/images/bq_snapshot.gif → content/images/bq_snapshot.gif


+ 0 - 0
data/images/covid19_repo_architecture_3_24_2020.png → content/images/covid19_repo_architecture_3_24_2020.png


+ 0 - 0
data/images/datastore_snapshot.gif → content/images/datastore_snapshot.gif


+ 0 - 0
scripts/__init__.py


+ 1 - 1
extraction.py → scripts/extraction.py

@@ -1,6 +1,6 @@
 from google.cloud import storage, vision
 from google.oauth2 import service_account
-from utils.preprocessing_fcn import async_detect_document, read_json_result, upload_blob
+from covid19_ISMIR.utils.preprocessing_fcn import async_detect_document, read_json_result, upload_blob

 import logging
 import time
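The import change above (from `utils.…` to `covid19_ISMIR.utils.…`) means the scripts now resolve modules through the repo's package name, so the directory *containing* the checkout must be importable. A sketch, assuming the `~/covid19_ISMIR` location from the README's cd step:

```python
import pathlib
import sys

# Assumption: the repo lives at ~/covid19_ISMIR, as the README implies.
# Package-qualified imports such as covid19_ISMIR.utils.preprocessing_fcn
# only resolve if the checkout's parent directory is on sys.path
# (or exported via PYTHONPATH).
repo_parent = pathlib.Path.home()
if str(repo_parent) not in sys.path:
    sys.path.insert(0, str(repo_parent))
```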

+ 1 - 1
preprocessing.py → scripts/preprocessing.py

@@ -1,6 +1,6 @@
 from google.cloud import storage
 from google.oauth2 import service_account
-from utils.preprocessing_fcn import batch_translate_text, upload_blob
+from covid19_ISMIR.utils.preprocessing_fcn import batch_translate_text, upload_blob
 import logging

 import re

+ 2 - 2
retrieving.py → scripts/retrieving.py

@@ -1,7 +1,7 @@
 from google.cloud import storage, bigquery, datastore
 from google.oauth2 import service_account
-from utils.bq_fcn import returnQueryResults
-from utils.ner_fcn import getCases
+from covid19_ISMIR.utils.bq_fcn import returnQueryResults
+from covid19_ISMIR.utils.ner_fcn import getCases

 import logging
 import os

+ 2 - 2
storing.py → scripts/storing.py

@@ -1,7 +1,7 @@
 from google.cloud import storage, bigquery, datastore
 from google.oauth2 import service_account
-from utils.bq_fcn import bqCreateDataset, bqCreateTable, exportItems2BQ
-from utils.ner_fcn import loadModel, addTask, extractMedEntities
+from covid19_ISMIR.utils.bq_fcn import bqCreateDataset, bqCreateTable, exportItems2BQ
+from covid19_ISMIR.utils.ner_fcn import loadModel, addTask, extractMedEntities
 import en_core_sci_lg

 import logging