Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
In this vignette I introduce you to the basic functions of the ecdata
package. You can download the latest stable releases of the packages from CRAN and PyPI.
install.packages('ecdata')
library(ecdata)
library(dplyr)
load_ecd
The primary function shared across the Python and R distributions of the package is load_ecd. This function accepts four primary arguments:
Argument | R Specific Quirks | Python Specific Quirks |
---|---|---|
country | A string or a character vector | String, dictionary, or list |
language | A string or a character vector | String, dictionary, or list |
full_ecd | Boolean. If TRUE, downloads the full dataset. Defaults to FALSE | Boolean. If True, downloads the full dataset. Defaults to False |
ecd_version | A character string naming the ECD version to download. Defaults to the latest version | A string naming the ECD version to download. Defaults to the latest version |
In practice the ecd_version argument is not yet very useful, since there has only been one release of the data.
Say we only want data for South Korea [1]. We can simply set the country argument like this:
rok = load_ecd(country = 'Republic of Korea')
✔ Successfully downloaded Republic of Korea.
head(rok, 2)
# A tibble: 2 × 17
country url text date title executive type language file
<chr> <chr> <chr> <dttm> <chr> <chr> <chr> <chr> <chr>
1 Republic… http… 위대… 2022-03-10 00:00:00 정직… Yoon Suk… Spee… Korean <NA>
2 Republic… http… 위대… 2022-03-10 00:00:00 정직… Yoon Suk… Spee… Korean <NA>
# ℹ 8 more variables: isonumber <dbl>, gwc <chr>, cowcodes <chr>,
# polity_v <chr>, polity_iv <chr>, vdem <dbl>, year_of_statement <dbl>,
# office <chr>
country | url | text | date | title | executive | type | language | file | isonumber | gwc | cowcodes | polity_v | polity_iv | vdem | year_of_statement | office |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
str | str | str | datetime[μs, UTC] | str | str | str | str | str | f64 | str | str | str | str | f64 | f64 | str |
"Republic of Korea" | "https://www.president.go.kr/pr… | "위대하고 자랑스러운 국민 여러분! 고맙습니다. 다시 … | 2022-03-10 00:00:00 UTC | "정직한 정부, 정직한 대통령 되겠습니다." | "Yoon Suk Yeol" | "Speech" | "Korean" | null | 410.0 | "ROK" | "ROK" | "ROK" | "ROK" | 42.0 | 2022.0 | null |
"Republic of Korea" | "https://www.president.go.kr/pr… | "위대하고 자랑스러운 국민 여러분! 고맙습니다. 다시 … | 2022-03-10 00:00:00 UTC | "정직한 정부, 정직한 대통령 되겠습니다." | "Yoon Suk Yeol" | "Speech" | "Korean" | null | 410.0 | "ROK" | "ROK" | "ROK" | "ROK" | 42.0 | 2022.0 | null |
We implement caching by default, so in R you will get a fairly loud warning every few hours. load_ecd has some tolerance for common names, abbreviations, and mixed punctuation in country names, so RK, ROK, or South Korea will all download the South Korean data.
sk = load_ecd(country = 'South Korea')
✔ Successfully downloaded Republic of Korea.
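To make the idea of name tolerance concrete, here is a minimal Python sketch of how alias matching could work. This is not the package's actual lookup, which this vignette does not show: the `ALIASES` table and the `normalize_country` helper are hypothetical names introduced only for illustration, and the cleaning step only handles unaccented Latin letters.

```python
import re

# Hypothetical alias table for illustration only; the real package's
# lookup logic and its full list of accepted names are not shown here.
ALIASES = {
    "rk": "Republic of Korea",
    "rok": "Republic of Korea",
    "south korea": "Republic of Korea",
    "republic of korea": "Republic of Korea",
}


def normalize_country(name):
    """Map a loosely written country name onto one canonical spelling."""
    # Lowercase, then drop everything except a-z and spaces (so "R.O.K." -> "rok").
    key = re.sub(r"[^a-z ]", "", name.lower())
    # Collapse runs of whitespace and trim the ends.
    key = " ".join(key.split())
    try:
        return ALIASES[key]
    except KeyError:
        raise ValueError(f"Unknown country name: {name!r}") from None


canonical = normalize_country("R.O.K.")
```

With this table, `RK`, `ROK`, `R.O.K.`, and `South Korea` all resolve to the same canonical name, while an unknown name raises an error instead of silently downloading nothing.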
If you are not interested in single-country case studies, you can pass multiple countries to the country argument. In R we use a character vector; in Python you can use a list!
The same functionality is extended to the language argument too!
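On the Python side, a call with several countries or languages might look like the sketch below. It assumes the package is imported as `ec` (inferred from the `ec.` prefix used in this vignette's Python snippet) and that `country` and `language` can be passed as keyword arguments, as the argument table suggests. The `as_list` and `load_many` helpers are hypothetical wrappers, not part of ecdata.

```python
def as_list(value):
    """Wrap a single string in a list; copy any other iterable into a list."""
    if isinstance(value, str):
        return [value]
    return list(value)


def load_many(countries=None, languages=None):
    """Download several countries and/or languages in one call.

    Needs the ecdata package and network access, so the import is
    deferred until the function is actually called.
    """
    if countries is None and languages is None:
        raise ValueError("Pass at least one country or language.")

    import ecdata as ec  # alias assumed from the `ec.` prefix in this vignette

    kwargs = {}
    if countries is not None:
        kwargs["country"] = as_list(countries)
    if languages is not None:
        kwargs["language"] = as_list(languages)
    return ec.load_ecd(**kwargs)
```

For example, `load_many(countries=["South Korea", "Turkey"])` would request both country files, and `load_many(languages=["Korean", "Turkish"])` would request by language instead.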
Both versions of the package allow you to use lazy loading to defer computation until you are done querying the dataset. To do this, all you need to do is call lazy_load_ecd:
turkey_korea_lazy = lazy_load_ecd(country = c('South Korea', 'Turkey'))
✔ Note: Data for: South Korea and Turkey was successfully downloaded. To bring data into memory call dplyr::collect()
# A tibble: 2 × 17
country url text date title executive type language file
<chr> <chr> <chr> <dttm> <chr> <chr> <chr> <chr> <chr>
1 Turkey https:… Bugü… 2023-04-08 00:00:00 Başa… Recep Ta… Spee… Turkish <NA>
2 Turkey https:… Noks… 2023-04-08 00:00:00 Başa… Recep Ta… Spee… Turkish <NA>
# ℹ 8 more variables: isonumber <dbl>, gwc <chr>, cowcodes <chr>,
# polity_v <chr>, polity_iv <chr>, vdem <dbl>, year_of_statement <dbl>,
# office <chr>
import ecdata as ec
import polars as pl
turkey_rok_lazy = ec.lazy_load_ecd(['South Korea','Turkey'])
turkey_rok_lazy.filter(pl.col('country') == 'Turkey').collect().head(2)
country | url | text | date | title | executive | type | language | file | isonumber | gwc | cowcodes | polity_v | polity_iv | vdem | year_of_statement | office |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
str | str | str | datetime[μs, UTC] | str | str | str | str | str | f64 | str | str | str | str | f64 | f64 | str |
"Turkey" | "https://www.tccb.gov.tr/konusm… | "Türkiye Cumhuriyeti’nin 11. Cu… | 2014-08-28 00:00:00 UTC | "Devir Teslim Töreni’nde Yaptık… | "Recep Tayyip Erdogan" | "Speech" | "Turkish" | null | 792.0 | "TUR" | "TUR" | "TUR" | "TUR" | 99.0 | 2014.0 | null |
"Turkey" | "https://www.tccb.gov.tr/konusm… | "Çok Değerli Abdullah Gül Karde… | 2014-08-28 00:00:00 UTC | "Devir Teslim Töreni’nde Yaptık… | "Recep Tayyip Erdogan" | "Speech" | "Turkish" | null | 792.0 | "TUR" | "TUR" | "TUR" | "TUR" | 99.0 | 2014.0 | null |
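If the idea of deferring computation is new to you, the toy sketch below shows the pattern in plain Python. It only illustrates the concept and is not how ecdata or its backends implement lazy loading. The `LazyRows` class and the three rows are made up for the example.

```python
class LazyRows:
    """Records filters and applies them only when collect() is called."""

    def __init__(self, rows, predicates=()):
        self._rows = rows
        self._predicates = tuple(predicates)

    def filter(self, predicate):
        # Return a new lazy object; no rows are examined yet.
        return LazyRows(self._rows, self._predicates + (predicate,))

    def collect(self):
        # Only now is each row checked against every recorded filter.
        return [r for r in self._rows if all(p(r) for p in self._predicates)]


# Made-up toy rows that reuse two column names from the dataset.
rows = [
    {"country": "Turkey", "year_of_statement": 2014},
    {"country": "Republic of Korea", "year_of_statement": 2022},
    {"country": "Turkey", "year_of_statement": 2023},
]

query = LazyRows(rows).filter(lambda r: r["country"] == "Turkey")
query = query.filter(lambda r: r["year_of_statement"] >= 2020)
result = query.collect()  # the filters run here, in a single pass
```

Stacking filters is cheap because nothing is evaluated until `collect()`, which is the same reason to finish building your query before bringing the lazily loaded data into memory.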
[1] I chose South Korea because the underlying file is relatively small compared to some of the other country files.