Method descriptions

OData client for the Copernicus Data Space catalog

cdsodatacli.query.fetch_data(gdf, timedelta_slice=None, min_sea_percent=None, top=None, cache_dir=None, querymode='seq', email=None, password=None, display_tqdm=False)

Fetches meta-data of CDSE products based on provided parameters.

Splits the input GeoDataFrame by the id_query column and executes fetching for each unique query ID.

Parameters:

gdf (gpd.GeoDataFrame) –
Geospatial data for the query.
Mandatory columns:
- ‘start_datetime’: Starting date for the query.
- ‘end_datetime’: Ending date for the query.
- ‘id_query’: Unique identifier of the query.
Optional columns:
- ‘name’: SAFE name pattern.
- ‘collection’: e.g., SENTINEL-1.
- ‘sensormode’: e.g., IW.
- ‘producttype’: e.g., GRD.
- ‘geometry’: shapely geometry for area of interest.
- ‘Attributes’: Extra OData filters.
timedelta_slice (datetime.timedelta, optional) – Time window size to split queries to avoid OData’s 1000 product limit.
min_sea_percent (float, optional) – Minimum sea percent to filter products.
top (int, optional) – Max rows per individual OData query.
cache_dir (str, optional) – Path to directory for storing/reusing results.
querymode (str) – ‘seq’ (sequential) or ‘multi’ (multithreaded). Defaults to ‘seq’.
email (str, optional) – CDSE account email for authentication.
password (str, optional) – CDSE account password.
display_tqdm (bool) – Whether to show a progress bar. Defaults to False.

Returns:

Concatenated meta-data results from all queries.

Return type:

pd.DataFrame
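
The role of timedelta_slice can be illustrated with a minimal sketch. It assumes that the query period is cut into consecutive windows no longer than timedelta_slice, so that each window stays under the OData product limit. The helper name split_time_range is hypothetical and is not part of cdsodatacli, whose actual slicing logic may differ.

```python
from datetime import datetime, timedelta


def split_time_range(start, end, step):
    """Cut [start, end) into consecutive windows no longer than `step`.

    Hypothetical helper for illustration only; not the cdsodatacli code.
    """
    if step <= timedelta(0):
        raise ValueError("step must be a positive timedelta")
    windows = []
    cursor = start
    while cursor < end:
        # The last window is shortened so that it never exceeds `end`.
        upper = min(cursor + step, end)
        windows.append((cursor, upper))
        cursor = upper
    return windows


windows = split_time_range(
    datetime(2022, 5, 1), datetime(2022, 5, 4, 12), timedelta(days=1)
)
for lower, upper in windows:
    print(lower.isoformat(), "->", upper.isoformat())
```

Each resulting window would then be queried separately and the results concatenated, which matches the behaviour described for the returned DataFrame.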

cdsodatacli.download.cds_s3_download_one_product(s3_path, s3_credentials, output_filepath, conf)

Download a single SAFE product via the CDSE S3 endpoint using boto3.

The conf dict must contain:

pre_spool (str): temp directory for .tmp files
s3_access_key (str): CDSE S3 access key
s3_secret_key (str): CDSE S3 secret key
s3_endpoint (str): e.g. “https://eodata.dataspace.copernicus.eu”
s3_bucket (str): e.g. “eodata”
s3_path (str): prefix inside the bucket, e.g.
“Sentinel-1/SAR/GRD/2022/05/03/S1A_IW_GRDH_1SDV_20220503T000000.SAFE”
s3_region (str): CDSE requires “default”

Arguments:

s3_path (str): e.g. “Sentinel-1/SAR/GRD/2022/05/03/S1A_IW_GRDH_1SDV_20220503T000000.SAFE”
s3_credentials (dict): with keys ‘s3-access-key’ and ‘s3-secret’
output_filepath (str): full output path once the download is finished (in the spool, not the pre-spool)
conf (dict): configuration, see details above

Returns:

speed (float): download speed in MB/s
elapsed_time (int): number of seconds taken to download the product
total_mb (int): megabytes downloaded (zip)
status_meaning (str): human-readable outcome
safename_base (str): basename without .zip
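
As an illustration, the conf dict described above could be assembled and checked as in the following sketch. The key names and the endpoint, bucket, path and region values come from the list above; the credentials and the pre_spool directory are placeholders, and the helper missing_conf_keys is hypothetical (it is not part of cdsodatacli).

```python
# Keys that the conf dict must contain, as listed in this section.
REQUIRED_CONF_KEYS = (
    "pre_spool",
    "s3_access_key",
    "s3_secret_key",
    "s3_endpoint",
    "s3_bucket",
    "s3_path",
    "s3_region",
)


def missing_conf_keys(conf):
    """Return the required keys absent from `conf` (hypothetical helper)."""
    return [key for key in REQUIRED_CONF_KEYS if key not in conf]


conf = {
    "pre_spool": "/tmp/cdse_pre_spool",  # placeholder directory
    "s3_access_key": "<your-access-key>",  # placeholder credential
    "s3_secret_key": "<your-secret-key>",  # placeholder credential
    "s3_endpoint": "https://eodata.dataspace.copernicus.eu",
    "s3_bucket": "eodata",
    "s3_path": "Sentinel-1/SAR/GRD/2022/05/03/S1A_IW_GRDH_1SDV_20220503T000000.SAFE",
    "s3_region": "default",
}
print(missing_conf_keys(conf))
```

Checking the keys before starting a download makes a configuration mistake fail early rather than in the middle of a transfer.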

cdsodatacli.download.filter_product_already_present(cpt, df, outputdir, cdsodatacli_conf, force_download=False, extension='.zip')

Based on a dataframe of products to download, filter those already present locally.

Parameters:

cpt (collections.defaultdict(int))
df (pd.DataFrame)
outputdir (str)
cdsodatacli_conf (dict)
force_download (bool)
extension (str)

Returns:

df_todownload (pd.DataFrame) – dataframe of products to download
cpt (collections.defaultdict(int)) – updated counter
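
The filtering idea can be sketched with the standard library only. This is a simplified stand-in, not the cdsodatacli implementation: it takes a plain list of safenames instead of a DataFrame, only looks in outputdir, assumes local files are named safename + extension, and uses hypothetical counter keys.

```python
import collections
import tempfile
from pathlib import Path


def filter_already_present(cpt, safenames, outputdir, force_download=False, extension=".zip"):
    """Return the safenames still to download and the updated counter."""
    todo = []
    for safename in safenames:
        target = Path(outputdir) / (safename + extension)  # assumed file naming
        if target.exists() and not force_download:
            cpt["already_present"] += 1  # hypothetical counter key
        else:
            cpt["to_download"] += 1  # hypothetical counter key
            todo.append(safename)
    return todo, cpt


with tempfile.TemporaryDirectory() as outputdir:
    # Pretend that product_A was downloaded earlier.
    (Path(outputdir) / "product_A.SAFE.zip").touch()
    cpt = collections.defaultdict(int)
    todo, cpt = filter_already_present(
        cpt, ["product_A.SAFE", "product_B.SAFE"], outputdir
    )
print(todo, dict(cpt))
```

With force_download=True every product is kept in the list, whether or not a local copy exists.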

cdsodatacli.download.add_missing_cdse_hash_ids_in_listing(listing_path, conf, display_tqdm=False, email=None, password=None)

Add the CDSE product ID and S3 path columns to a listing of products to download, based on the safenames. This is useful, for instance, for the private data IOC products, since the CDSE OData search returns only the safename for those products and not the hash ID. The method uses the same query method as the CDSE OData search script (opensearch_private_data_IOC.py) to retrieve the hash ID associated with each safename.

Parameters:

listing_path (str)
conf (dict) – configuration of the lib cdsodatacli (used to know which unit is PRIVATE)
display_tqdm (bool, optional) – If True, show a tqdm progress bar for each query. Defaults to False.
email (str, optional) – Email of the CDSE account to use for queries. Defaults to None (use cdsodatacli default behavior).
password (str, optional) – Password of the CDSE account to use for queries. Defaults to None (use cdsodatacli default behavior).

Returns:

dataframe with 3 columns “id”, “safename” and “S3Path”, containing the hash ID provided by CDSE OData and the safename of the product

Return type:

res (pd.DataFrame)

cdsodatacli.download.download_list_product_multithread_v4(inputdf, outputdir, account_group, hideprogressbar=False, check_on_disk=True, cdsodatacli_conf_file=None)

v4 works as a daemon like v3 (while loop), with multi-account round-robin and token semaphore files, but uses the S3 endpoint to download each product.

This method works for a group of accounts containing one or many accounts. Each account can run 4 parallel sessions.

step 1: filter the dataframe containing the raw list of products to download: remove duplicates and products already downloaded
step 2: create multiple threads to download in parallel (the number depends on the number of accounts and sessions per account)
step 3: loop until all the products are treated
step 3.1: get an account (i.e. S3 credentials) for which one session is free/available for download
step 3.2: submit future downloads up to the current limit of available sessions
step 3.3: wait for the first download thread/session to be finished
step 3.4: clean the lock on the session to free the session
step 4: security lock cleaning (to avoid any orphan busy sessions at the end of the process)
step 5: print out the download speed and elapsed times
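
Steps 1 to 3.4 above can be sketched with concurrent.futures. This is only an outline under stated assumptions: fake_download stands in for the real S3 download, sessions are tracked in a plain in-memory list rather than with semaphore files, at least one login is provided, and steps 4 and 5 are left out. The names download_all and fake_download are hypothetical.

```python
import concurrent.futures as cf

SESSIONS_PER_ACCOUNT = 4  # stated above: each account can run 4 parallel sessions


def fake_download(login, session_id, product):
    """Stand-in for the real S3 download; returns what it was asked to fetch."""
    return login, session_id, product


def download_all(products, logins):
    # Step 1: remove duplicates while keeping the original order.
    pending = list(dict.fromkeys(products))
    # Slots are interleaved across accounts so that submissions are round-robin.
    free_slots = [(login, sid) for sid in range(SESSIONS_PER_ACCOUNT) for login in logins]
    running = {}  # maps each future to the slot it occupies
    results = []
    # Step 2: one worker thread per available session.
    with cf.ThreadPoolExecutor(max_workers=len(free_slots)) as pool:
        # Step 3: loop until every product has been handled.
        while pending or running:
            # Steps 3.1 and 3.2: submit downloads while a session is free.
            while pending and free_slots:
                login, sid = free_slots.pop(0)
                future = pool.submit(fake_download, login, sid, pending.pop(0))
                running[future] = (login, sid)
            # Step 3.3: wait for the first download to finish.
            done, _ = cf.wait(running, return_when=cf.FIRST_COMPLETED)
            for future in done:
                # Step 3.4: release the session so it can be reused.
                free_slots.append(running.pop(future))
                results.append(future.result())
    return results


results = download_all(["p1", "p2", "p1", "p3"], ["account_a", "account_b"])
print(sorted(product for _, _, product in results))
```

Waiting with FIRST_COMPLETED rather than for the whole batch keeps every session busy: a slot is refilled as soon as its download ends.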

Parameters:

inputdf (pd.DataFrame)
outputdir (str)
account_group (str)
hideprogressbar (bool)
check_on_disk (bool)
cdsodatacli_conf_file (str)

Return type:

df2 (pd.DataFrame)

cdsodatacli.utils.get_conf(path_config_file=None) → dict

Load configuration from localconfig.yml or config.yml in cdsodatacli package directory.

Parameters:

path_config_file (str, optional) – Full path to the configuration YAML file. Defaults to None.

Returns:

Configuration parameters loaded from the YAML file.

Return type:

dict

cdsodatacli.utils.check_safe_in_outputdir(outputdir, safename)

Parameters:

safename (str) – basename of the product

Returns:

present_in_outputdir – True if the product is already in the output directory

Return type:

bool

cdsodatacli.utils.check_safe_in_spool(safename, conf)

Parameters:

safename (str) – basename of the product
conf (dict) – configuration dictionary of the cdsodatacli package

Returns:

present_in_spool – True if the product is already in the spool directory

Return type:

bool

cdsodatacli.utils.WhichArchiveDir(safe, conf, archive_type)

Determine the archive directory path for a given safe based on its naming convention.

Parameters:

safe (str) – safe base name
conf (dict)
archive_type (str) – type of archive directory to use from conf (e.g., ‘datawork’, ‘scale’)

Returns:

full path of the archive directory where the safe should be stored

Return type:

gooddir (str)

cdsodatacli.utils.check_safe_in_archive(safename, conf)

Check if a given safe is already present in the different archive directories.

Parameters:

safename (str)
conf (dict) – configuration dictionary of the cdsodatacli package

Returns:

present_in_archive – True if the product is already in the archive directory, False otherwise

Return type:

bool

cdsodatacli.utils.convert_json_opensearch_query_to_listing_safe_4_dowload(json_path) → str

Parameters:

json_path (str)

Returns:

output_txt – listing with 2 columns: id,safename

Return type:

str

cdsodatacli.utils.convert_json_odata_query_to_listing_safe_4_download(json_path: str) → str

Convert an OData Products JSON response into a 2-column listing (id, safename).

Parameters:

json_path (str) – Full path of the OData JSON file

Returns:

output_txt – Text file with 2 columns: id,safename (no header)

Return type:

str
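
The conversion can be sketched as follows. This is not the cdsodatacli code: it assumes that the JSON file holds the products under a "value" array with "Id" and "Name" fields, that the listing is comma-separated, and that it is written next to the JSON file with a .txt suffix. The function name odata_json_to_listing and the sample ID are hypothetical.

```python
import json
import os
import tempfile


def odata_json_to_listing(json_path):
    """Write '<id>,<safename>' lines (no header) and return the listing path."""
    with open(json_path, encoding="utf-8") as handle:
        payload = json.load(handle)
    output_txt = os.path.splitext(json_path)[0] + ".txt"  # assumed output naming
    with open(output_txt, "w", encoding="utf-8") as handle:
        for product in payload.get("value", []):
            handle.write(f"{product['Id']},{product['Name']}\n")
    return output_txt


with tempfile.TemporaryDirectory() as workdir:
    json_path = os.path.join(workdir, "query.json")
    sample = {
        "value": [
            {"Id": "hypothetical-id-1", "Name": "S1A_IW_GRDH_1SDV_20220503T000000.SAFE"}
        ]
    }
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(sample, handle)
    with open(odata_json_to_listing(json_path), encoding="utf-8") as handle:
        listing_lines = handle.read().splitlines()
print(listing_lines)
```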

cdsodatacli.session.get_sessions_download_available_s3(conf, active_s3_sessions_status, subset_to_treat, blacklist, logins_group='logins')

Return the list of available sessions for a group of CDSE accounts. Contrary to get_sessions_download_available(), it uses a thread-locked in-memory variable to list the active sessions.

Parameters:

conf (dict)
active_s3_sessions_status (dict) – login -> session_id (int) -> status, where False means inactive and True means active (set to inactive at the beginning of a download)
subset_to_treat (pandas.DataFrame)
blacklist (list) – list of account not usable
logins_group (str) – name of the group of CDSE accounts to use (can contain multiple accounts, it depends on the localconfig.yml)

Returns:

df_products_ready_for_download (pandas.DataFrame) – with columns ‘s3_session’, ‘login’, ‘S3Path’, ‘output_path’, ‘safe’, ‘s3_access_key’, ‘s3_secret’
active_s3_sessions_status (dict) – updated with the sessions that are now set to active

Return type:

tuple (pandas.DataFrame, dict)
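
The bookkeeping of active sessions in a thread-locked in-memory variable can be sketched as below. The nested dict follows the login -> session_id -> status layout described for active_s3_sessions_status; the function claim_free_sessions, the lock name and the account names are hypothetical, and the real method additionally builds the DataFrame of products ready for download.

```python
import threading

_sessions_lock = threading.Lock()  # protects the shared status dict


def claim_free_sessions(active_s3_sessions_status, blacklist, wanted):
    """Mark up to `wanted` inactive sessions as active; return (login, session_id) pairs."""
    claimed = []
    with _sessions_lock:
        for login, sessions in active_s3_sessions_status.items():
            if login in blacklist:
                continue  # accounts in the blacklist are not usable
            for session_id, active in sessions.items():
                if len(claimed) == wanted:
                    return claimed
                if not active:
                    sessions[session_id] = True  # the session is now busy
                    claimed.append((login, session_id))
    return claimed


status = {
    "account_a": {0: True, 1: False},
    "account_b": {0: False, 1: False},
    "account_c": {0: False},
}
claimed = claim_free_sessions(status, blacklist=["account_c"], wanted=2)
print(claimed)
```

Holding the lock while scanning and updating the dict prevents two threads from claiming the same session.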

This script is now deprecated, since the S3 backend download uses long-term S3 credentials from the config file.

cdsodatacli.s3_temporary_access_token._get_fresh_s3_client(conf, headers)

Create an S3 resource (boto3 client) and temporary S3 credentials.

Parameters:

conf (dict) – Configuration dictionary containing S3 endpoint, region, and bucket information.
headers (dict) – Headers containing the Bearer token for authentication.

Returns:

A tuple containing the S3 temporary credentials and the boto3 S3 resource object.

Return type:

tuple