# 1) Tutorial -  Basic integration of raw data

## Install additional packages
```bash
pip install panda_datareader
```
##   Downloading raw data using data reader
As first step we download an example dataset from the World Bank.

```
from pandas_datareader import wb
matches = wb.search('gdp.*capita.*const')

data_df = wb.download(indicator='NY.GDP.PCAP.KD', country=['US', 'CA', 'MX', 'DE'], start=2005, end=2024)


matches2 = wb.search('emission')
emission_data = wb.download(indicator='CC.CO2.EMSE.EN', country=['US', 'CA', 'MX', 'DE'], start=2005, end=2024)
population_data = wb.download(indicator='SP.POP.TOTL', country=['US', 'CA', 'MX', 'DE'], start=2005, end=2024)

population_data = wb.download(indicator=['SP.POP.TOTL','CC.CO2.EMSE.EN'], country=['US', 'CA', 'MX', 'DE'], start=2005, end=2024)


```

## Data transformation

In order to integrate the raw data into datatoolbox, the raw structure must be adapted to the following requirements:
- ISO3  code for country as index
- years or  timestampe as  columns
- additional meta data describing the variable, scenario and source of the data

dt.Datatable takes as arguments a pandas dataframe and meta the meta data

```
dt.Datatable(data=pandas_df, meta  = meta_dict)
````


Exercise: convert the  data to required format
```


--- scroll below for solution  --- 


```


### Solution 

```
import datatoolbox as dt
from pandas_datareader import wb
matches = wb.search('gdp.*capita.*const')

data_df = wb.download(indicator='NY.GDP.PCAP.KD', country=['US', 'CA', 'MX', 'DE'], start=2005, end=2024)


#assuming the correct meta data
meta = {'variable' : 'GDP_per_capita',
        'scenario' : 'Historic',
        'source' : 'WORLDBANK_2024',
        'unit' : 'USD/count'}
wide_df = data_df.reset_index().pivot(index='country', columns='year', values='NY.GDP.PCAP.KD')
  
        
gdp_table = dt.Datatable(data=wide_df,
                     meta=meta)
gdp_table.index = dt.idx.convert_idx_string_to_iso(gdp_table.index)

final_table = gdp_table                     

```    


## Commit data as datatoolbox source


Excercise - Create and commit data as new sources
1) create the meta data dictionary for the new source
2) use the function add_table or dt.add_tables to database
3) verfiy that table is in database

```
# switch to testing database
dt.admin.switch_database_to_testing(force_new_DB=True)

# meta data for new source
source_dict =  {
    'SOURCE_ID': '',
    'collected_by': '',
    'date': dt.core.get_date_string(),
    'source_url': '',
    'licence': '',
}

# add your code here


```

### Solution 


```
dt.admin.switch_database_to_testing(force_new_DB=True)

# meta data for new source
source_dict =  {
    'SOURCE_ID': 'WORLDBANK_2024',
    'collected_by': 'MM',
    'date': dt.core.get_date_string(),
    'source_url': 'https://datatopics.worldbank.org/world-development-indicators/',
    'licence': 'open_source',
}
dt.commitTable(final_table, message = 'adding first table', sourceMetaDict = source_dict)


#verify that table is in inventory
dt.findp(source='WORLDBANK*')
```

### What happend  in the back ground
From the meta data, datatoolbox create  a table ID that conains  the variable,  scenario and source.  
This  ID should be unique within your database and is used to directly access the data, e.g. via 
dt.getTable
```
print(final_table.meta['ID'])

table_reloaded =  dt.getTable('GDP_per_capita__Historic__WORLDBANK_2024')
```