| Title: | Downloading and Cleaning U.S. Macroeconomic Data |
|---|---|
| Description: | Utilities to retrieve and tidy U.S. macroeconomic data series from public government data providers. Functions streamline access to series from the Federal Reserve Bank of St. Louis Federal Reserve Economic Data (FRED), the Bureau of Labor Statistics flat files, and the Bureau of Economic Analysis National Income and Product Accounts tables, then return consistent, tidy data frames ready for modeling and graphics. The package includes helpers for date alignment, log-linear projections, and common macro diagnostics, along with convenience plot builders for quick publication-quality charts. |
| Authors: | Mike Konczal [aut, cre] |
| Maintainer: | Mike Konczal <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-29 08:50:57 UTC |
| Source: | https://github.com/mtkonczal/tidyusmacro |
A tibble with 250 rows and 2 columns representing industry codes and corresponding industry titles.
cesDiffusionIndex
A tibble with 250 rows and 2 variables:
A character vector containing the industry codes (e.g., "10-11330000").
A character vector containing the titles of the industries (e.g., "Logging").
This dataset contains information on different industries, where each row corresponds to an industry defined by its unique code and a descriptive title. It is useful for analyses that require linking industry classifications to descriptive labels.
U.S. Bureau of Labor Statistics (BLS)
# Load the dataset
data(cesDiffusionIndex)
A tibble mapping each of the 177 components used in the Federal Reserve Bank of Dallas's trimmed-mean PCE inflation rate to its corresponding series in BEA NIPA Table 2.4.4U (Fisher price index for personal consumption expenditures by type of product, monthly).
dallasTrimPCEcomponents
A tibble with 177 rows and 5 variables:
Integer. Ordinal position in the Dallas Fed tech-notes list.
Character. Component name as published by the Dallas Fed.
Character. BEA NIPA series code in Table 2.4.4U.
Integer. Line number in BEA Table 2.4.4U.
Character. Label BEA publishes alongside series_code.
Components are organized as durables (1-41), nondurables (42-91),
services (92-177), and one NPISH aggregate (178). The 2009 tech notes
(Dolmas, "Trimmed Mean PCE Inflation," updated 2022-12-23) list 178
components; BEA combined two of them - Tenant-Occupied Stationary
Homes and Tenant Landlord Durables - into a single line of Table
2.4.4U (IA000629) when the disaggregated series stopped in December
2001. The mapping reflects that combination, yielding 177 rows;
dallas_idx runs 1..178 with 94 omitted (93 holds the merged item).
Dolmas, J. (2009, updated 2022-12-23). "PCE Inflation: Technical Note." Federal Reserve Bank of Dallas. BEA NIPA Table 2.4.4U.
Atkinson, T., Dolmas, J., & Zarutskie, R. (2026). "Skewness warrants caution as Trimmed Mean PCE inflation eases." Federal Reserve Bank of Dallas, April 16, 2026.
data(dallasTrimPCEcomponents)
head(dallasTrimPCEcomponents)
Create a breaks function for scale_x_date() that
always includes the last actual data month and then selects every
nth month counting backward.
date_breaks_gg(n = 6, last, decreasing = FALSE)
n |
Integer; keep every n-th month counting backward from |
last |
Date; the last (max) date in your data. Required to ensure no break is placed after your actual data. |
decreasing |
Logical; if TRUE, return breaks in descending order. Default FALSE. |
A function usable in scale_x_date(breaks = ...).
# Minimal reproducible example (avoid using the name `df`, which masks stats::df)
set.seed(1)
dat <- data.frame(
  date = seq(as.Date("2023-01-01"), by = "month", length.out = 24),
  value = cumsum(rnorm(24))
)
library(ggplot2)
ggplot(dat, aes(date, value)) +
  geom_line() +
  scale_x_date(
    date_labels = "%b\n%Y",
    breaks = date_breaks_gg(n = 6, last = max(dat$date))
  ) +
  labs(x = NULL, y = NULL)
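The "count backward from the last data month" rule can be illustrated with a short base-R sketch. This is not the package's implementation: `breaks_back_from_last()` is a hypothetical helper, and it assumes that `last` falls on the first of a month and that the lower axis limit is not later than `last`.

```r
# Hypothetical helper (not part of tidyusmacro): returns a breaks function
# that starts at `last` and steps backward n months at a time.
breaks_back_from_last <- function(n, last, decreasing = FALSE) {
  last <- as.Date(last)
  function(limits) {
    # Walk backward from `last` until the lower axis limit is reached.
    out <- seq(last, as.Date(limits[1]), by = paste0("-", n, " months"))
    sort(out, decreasing = decreasing)
  }
}

f <- breaks_back_from_last(n = 6, last = as.Date("2024-12-01"))
gg_breaks <- f(as.Date(c("2023-01-01", "2024-12-01")))
gg_breaks
```

Because the sequence is anchored at `last`, the final observation always receives a break, and no break can fall after the data ends.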
Generate a sequence of date breaks for ggplot scales,
taking every nth unique date.
date_breaks_n(dates, n = 6, decreasing = TRUE)
dates |
A vector of dates. |
n |
Integer, keep every n-th date (default = 6). |
decreasing |
Logical, if TRUE (default) sorts dates in descending order. |
A vector of dates suitable for use as ggplot2 axis breaks.
library(ggplot2)
library(dplyr)
df <- tibble(
  date = seq.Date(as.Date("2020-01-01"), as.Date("2025-01-01"), by = "month"),
  value = rnorm(61)
)
ggplot(df, aes(date, value)) +
  geom_line() +
  scale_x_date(breaks = date_breaks_n(df$date, 6))
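As a rough illustration of "every nth unique date", the selection might look like the base-R sketch below. `every_nth_date()` is a hypothetical name, and the package's own function may handle edge cases differently.

```r
# Hypothetical helper (not part of tidyusmacro): keep every n-th unique date.
every_nth_date <- function(dates, n = 6, decreasing = TRUE) {
  u <- sort(unique(as.Date(dates)), decreasing = decreasing)
  # Positions 1, n + 1, 2n + 1, ... of the sorted unique dates.
  u[(seq_along(u) - 1) %% n == 0]
}

monthly <- seq(as.Date("2020-01-01"), as.Date("2025-01-01"), by = "month")
n_breaks <- every_nth_date(monthly, n = 6)
n_breaks
```

With `decreasing = TRUE` the first element is the most recent date, so the latest observation is always among the breaks.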
Named vector of ESP-branded colors.
esp_pal
An object of class character of length 3.
Custom theme and color palette for Economic Security Project graphics.
theme_esp(base_family = "Public Sans")
scale_color_esp(...)
scale_fill_esp(...)
scale_colour_esp(...)
base_family |
Base font family for the theme. Defaults to "Public Sans". |
... |
Passed to the underlying ggplot2 scale functions. |
A ggplot2 theme or scale object.
Downloads and processes data from Bureau of Labor Statistics (BLS) flat files. Supports multiple data sources including CPI, ECI, JOLTS, CPS, CES, and others. The function retrieves the main data file along with associated metadata files, merges them, and returns a tidy tibble ready for analysis.
getBLSFiles(data_source, email)
data_source |
Character string specifying the BLS data source. Available options:
|
email |
Character string with your email address. Required by BLS for identifying API users. Set as the HTTP User-Agent header. |
The function constructs URLs to BLS flat files at https://download.bls.gov/pub/time.series/, downloads the series metadata and auxiliary lookup tables, then downloads and merges the main data file. Date parsing handles both monthly (most sources) and quarterly (ECI) data frequencies.
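To make the date-parsing step concrete, here is a minimal sketch of converting BLS `year` and `period` fields into dates. It is not the package's code: `parse_bls_period()` is a hypothetical helper, it assumes the usual flat-file period codes (`M01` to `M12` for months, `M13` for annual averages, `Q01` to `Q04` for quarters), and dating each quarter at its first month is an illustrative choice.

```r
# Hypothetical helper (not part of tidyusmacro): turn BLS year/period codes
# into Date values. Annual-average codes such as "M13" become NA.
parse_bls_period <- function(year, period) {
  kind <- substr(period, 1, 1)
  num <- as.integer(substr(period, 2, 3))
  month <- ifelse(
    kind == "M" & num <= 12, num,
    ifelse(kind == "Q" & num <= 4, num * 3L - 2L, NA_integer_)
  )
  iso <- ifelse(
    is.na(month), NA_character_,
    sprintf("%d-%02d-01", as.integer(year), month)
  )
  as.Date(iso, format = "%Y-%m-%d")
}

bls_dates <- parse_bls_period(c(2024, 2024, 2024), c("M03", "M13", "Q02"))
bls_dates
```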
A tibble containing the merged data with columns for:
series_id |
Unique identifier for each data series |
date |
Observation date |
value |
Numeric data value |
... |
Additional metadata columns vary by data source (e.g., item codes, industry codes, area codes) |
# Download CPI data
cpi_data <- getBLSFiles("cpi", "[email protected]")

# Download JOLTS data
jolts_data <- getBLSFiles("jolts", "[email protected]")
Returns a long tibble with the raw inputs to the Federal Reserve Bank of Dallas's trimmed-mean PCE inflation rate: monthly Fisher price index, nominal expenditure, real quantity, monthly price change, the Fisher (t-1, t) expenditure-share weight, and a flag indicating whether the component was trimmed in that month and on which tail. Users can replicate the trimmed-mean rate by collapsing this tibble to kept (non-trimmed) components each month and taking the weight-renormalized weighted mean of price changes.
getDallasTrimPCE(
  frequency = "M",
  NIPA_data = NULL,
  alpha = 0.24,
  beta = 0.31,
  components = NULL
)
frequency |
Character. Frequency code passed to
|
NIPA_data |
Optional pre-loaded NIPA tibble from
|
alpha |
Numeric in [0, 1]. Lower-tail trim share. Default
|
beta |
Numeric in [0, 1]. Upper-tail trim share. Default
|
components |
Optional override for the component dictionary. Must
be a tibble with columns |
Weights are Fisher-style: an unweighted average of the expenditure share
evaluated at base prices P[t-1] with quantities Q[t-1] and Q[t],
renormalized to sum to 1 within each month. Real quantity is computed as
nominal / price from BEA NIPA Tables 2.4.5U and 2.4.4U respectively
(equivalent to Table 2.4.6U per the Dallas Fed's MATLAB note, but
available across the full sample without chained-dollar gaps).
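Read literally, the weighting rule above can be sketched as follows. This is one interpretation of the description, not the package's code: `fisher_style_weights()` is a hypothetical helper and the inputs are made-up numbers.

```r
# Hypothetical helper (not part of tidyusmacro): average the expenditure
# shares valued at base prices P[t-1] with quantities Q[t-1] and Q[t],
# then renormalize so the weights sum to 1.
fisher_style_weights <- function(p_prev, q_prev, q_curr) {
  share_prev <- p_prev * q_prev / sum(p_prev * q_prev)
  share_curr <- p_prev * q_curr / sum(p_prev * q_curr)
  w <- (share_prev + share_curr) / 2
  w / sum(w)
}

# Two toy components (illustrative values only).
w <- fisher_style_weights(
  p_prev = c(1, 2),
  q_prev = c(10, 5),
  q_curr = c(10, 10)
)
w
```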
Trim assignment is the simple rank-based version: components are sorted
within each month by price_change, cumulative weight is accumulated,
and components whose running cumulative weight is below alpha are
flagged "lower", while components whose cumulative weight before
adding their own contribution is at or above 1 - beta are flagged
"upper". Boundary components that straddle either threshold are
kept (treated as interior). The Dallas Fed's exact fractional-boundary
handling enters the rate calculation itself, not this panel-builder;
the resulting headline rate matches the Dallas Fed series to within a
few basis points.
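The rank-based assignment for a single month can be sketched in base R as below. `flag_trim_side()` is a hypothetical helper written from the description above, not the package's code, and the toy inputs are made up.

```r
# Hypothetical helper (not part of tidyusmacro): flag trimmed components
# for one month using the simple rank-based rule.
flag_trim_side <- function(price_change, weight, alpha = 0.24, beta = 0.31) {
  ord <- order(price_change)      # rank components by price change
  w <- weight[ord]
  cum_after <- cumsum(w)          # cumulative weight including the component
  cum_before <- cum_after - w     # cumulative weight before the component
  side_sorted <- ifelse(
    cum_after < alpha, "lower",
    ifelse(cum_before >= 1 - beta, "upper", NA_character_)
  )
  side <- rep(NA_character_, length(weight))
  side[ord] <- side_sorted        # restore the original row order
  side
}

# Toy month: ten equally weighted components (made-up numbers).
price_change <- (10:1) / 1000
weight <- rep(0.1, 10)
trim_side <- flag_trim_side(price_change, weight)
kept <- is.na(trim_side)

# Weight-renormalized mean over the kept (interior) components.
trimmed_mean <- sum(price_change[kept] * weight[kept]) / sum(weight[kept])
```

Components that straddle either threshold fail both tests and therefore stay in the interior, which matches the boundary handling described above.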
Months without full cross-sectional coverage (i.e., any component
missing this month or last) have weight, is_trimmed, and
trim_side set to NA.
A tbl_df with one row per (date, component) and columns:
Month observation date.
Component ordinal in the Dallas tech notes (1..178, 94 omitted).
Dallas Fed component name.
BEA NIPA series code (Table 2.4.4U).
BEA NIPA Table 2.4.4U line number.
Fisher price index (Table 2.4.4U).
Current-dollar outlay (Table 2.4.5U).
Real quantity (nominal / price).
Period-over-period fractional change in price. NA for the first observation per component.
Fisher (t-1, t) expenditure-share weight, renormalized to sum to 1 within each full-coverage month. NA otherwise.
Logical. TRUE if the component is in either tail this month and so dropped from the trimmed mean. NA when the month lacks full coverage.
Character. "lower" or "upper" when trimmed; NA when kept (interior) or coverage incomplete.
Dolmas, J. (2005). "Trimmed Mean PCE Inflation." Federal Reserve Bank of Dallas Working Paper 0506.
Dolmas, J. (2009, updated 2022-12-23). "PCE Inflation: Technical Note." Federal Reserve Bank of Dallas.
Atkinson, T., Dolmas, J., & Zarutskie, R. (2026). "Skewness warrants caution as Trimmed Mean PCE inflation eases." Federal Reserve Bank of Dallas, April 16, 2026.
dallasTrimPCEcomponents, getNIPAFiles
# Default 24/31 Dallas Fed trim
panel <- getDallasTrimPCE()

# Replicate the monthly trimmed-mean rate (kept components, renormalized
# to sum to 1):
library(dplyr)
tm_rate <- panel |>
  dplyr::filter(!is_trimmed) |>
  dplyr::group_by(date) |>
  dplyr::summarize(
    rate = sum(price_change * weight) / sum(weight),
    .groups = "drop"
  )

# What got trimmed in the latest month, by tail and weight:
panel |>
  dplyr::filter(date == max(date), is_trimmed) |>
  dplyr::arrange(trim_side, dplyr::desc(weight)) |>
  dplyr::select(name, trim_side, weight, price_change)
A flexible wrapper that downloads one or more data series from the St. Louis
Fed (FRED) API, optionally computes one-period percentage changes, and merges
them into a tidy tibble keyed by date.
getFRED(..., keep_all = TRUE, rename_variables = NULL, lagged = NULL)
... |
One or more FRED series IDs. Each element may be either
You may also pass a single character vector (named or unnamed) for compatibility with older code. |
keep_all |
Logical. |
rename_variables |
Optional character vector of new column names (one
per series), retained for backward compatibility. Supply either
this argument or names in |
lagged |
Logical scalar or logical vector. If |
You may supply the series in two ways:
Natural “...” style:
getFRED(unrate = "UNRATE", payroll = "PAYEMS").
Named arguments give friendly column names; unnamed arguments keep the
(lower-case) ticker as the column name.
Legacy style: pass a single (optionally named) character
vector, e.g., c(unrate = "UNRATE", payroll = "PAYEMS"), and/or use
the rename_variables= argument. This remains supported for
backward compatibility.
If you provide names in ... and a non-NULL
rename_variables vector, the function stops and prompts you to choose
a single naming method.
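The naming rule for `...` (named arguments keep their names, unnamed ones fall back to the lower-case ticker) can be illustrated with a small sketch. `resolve_series_names()` is a hypothetical helper, not a function exported by the package, and it performs no download.

```r
# Hypothetical helper (not part of tidyusmacro): map series IDs to column
# names, using the lower-case ticker wherever no name was supplied.
resolve_series_names <- function(...) {
  ids <- c(...)
  nms <- names(ids)
  if (is.null(nms)) nms <- rep("", length(ids))
  unnamed <- nms == ""
  nms[unnamed] <- tolower(ids[unnamed])
  stats::setNames(unname(ids), nms)
}

mixed <- resolve_series_names(unrate = "UNRATE", "PAYEMS")
legacy <- resolve_series_names(c(unrate = "UNRATE", payroll = "PAYEMS"))
```

Because `c(...)` flattens its inputs, the same helper accepts both the natural style and a single legacy character vector.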
A tibble with a date column and one column per requested
series.
# New interface
getFRED(unrate = "UNRATE", payroll = "PAYEMS")

# Multiple unnamed series (columns become 'unrate' and 'payems')
getFRED("UNRATE", "PAYEMS")
This function downloads and processes National Income and Product Accounts (NIPA)
data files from the BEA website. It reads the necessary register files, formats the
date column, and then uses the fast stringi functions together with tidyr's unnest()
to split the combined TableId:LineNo field into separate rows and columns. Finally,
it merges the datasets.
getNIPAFiles(
  location = "https://apps.bea.gov/national/Release/TXT/",
  type = "Q"
)
location |
The URL or path where the BEA files are located. Default: "https://apps.bea.gov/national/Release/TXT/". |
type |
A character string indicating the type of data to load. For example, "Q" for quarterly or "M" for monthly data. Default is "Q". |
A data frame containing the merged and formatted NIPA data.
nipadata <- getNIPAFiles(type = "Q")
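The step in getNIPAFiles that splits the combined TableId:LineNo field into separate rows and columns can be pictured with a toy example. The sketch below uses base R rather than the stringi and tidyr calls the function is described as using, and it assumes that multiple table references are separated by `|`. The register rows are illustrative, not real BEA values.

```r
# Toy series register (illustrative values only).
register <- data.frame(
  SeriesCode = c("DPCERC", "DGDSRC"),
  TableLine = c("U20405:1|T20100:29", "U20405:2"),
  stringsAsFactors = FALSE
)

# One list element per series, one string per "TableId:LineNo" pair.
pairs <- strsplit(register$TableLine, "|", fixed = TRUE)

# Repeat each series code once per pair to get one row per table reference.
long <- data.frame(
  SeriesCode = rep(register$SeriesCode, lengths(pairs)),
  pair = unlist(pairs),
  stringsAsFactors = FALSE
)

# Split each pair into its table ID and line number.
parts <- do.call(rbind, strsplit(long$pair, ":", fixed = TRUE))
long$TableId <- parts[, 1]
long$LineNo <- as.integer(parts[, 2])
long$pair <- NULL
long
```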
Downloads and processes BEA NIPA data to compute Personal Consumption Expenditures (PCE) price indices with weights and growth measures. The PCE price index is the Federal Reserve's preferred inflation measure.
getPCEInflation(frequency = "M", NIPA_data = NULL)
frequency |
Character string indicating the frequency of the data.
Defaults to |
NIPA_data |
Optional data frame. If provided, it will be used as the raw NIPA dataset
instead of loading fresh data with |
The function performs the following steps:
Loads NIPA data using getNIPAFiles (or uses pre-loaded data).
Extracts total PCE from table "U20405" (series code "DPCERC").
Computes PCE component weights as the nominal consumption share (component value divided by total PCE).
Extracts quantity indices from table "U20403".
Loads price indices from table "U20404", joins weights and quantities,
and calculates period-over-period growth measures.
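The weighting and growth steps listed above can be illustrated on a toy panel. This is a sketch, not the package's code: the numbers are made up, and total PCE is taken here as the sum of the two toy components, whereas the function reads it from series "DPCERC".

```r
# Toy panel: two components over two months (made-up numbers).
pce <- data.frame(
  date = rep(as.Date(c("2024-01-01", "2024-02-01")), each = 2),
  component = rep(c("goods", "services"), times = 2),
  nominal = c(30, 70, 32, 68),
  price = c(100, 100, 101, 102)
)

# Weight: the component's share of total nominal spending in each month.
total <- ave(pce$nominal, pce$date, FUN = sum)
pce$weight <- pce$nominal / total

# Period-over-period price growth within each component (NA in first period).
pce$price_growth <- ave(
  pce$price, pce$component,
  FUN = function(p) c(NA, p[-1] / p[-length(p)] - 1)
)
pce
```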
A tbl_df (data frame) containing the PCE data with calculated variables.
# Load monthly PCE data
pce_data <- getPCEInflation("M")
Downloads the civilian unemployment level and labor force level from
FRED, and calculates the unemployment rate as
full_unrate = unemploy_level / lf_level.
getUnrateFRED()
A tibble with columns:
date |
Observation date |
unemploy_level |
Civilian unemployment level (in thousands) |
lf_level |
Civilian labor force level (in thousands) |
full_unrate |
Unemployment rate (decimal) |
getUnrateFRED()
Fits a log-linear trend log(value) ~ t on a calibration window and
projects it for rows on/after start_date. Designed for use inside
dplyr verbs (no need to pass .).
logLinearProjection(
  date,
  value,
  start_date,
  end_date,
  group = NULL,
  data = NULL
)
date |
Bare column name for the date variable (coercible to Date). |
value |
Bare column name for the positive numeric series to project. |
start_date |
Date or string coercible to Date; start of calibration. |
end_date |
Date or string coercible to Date; end of calibration. |
group |
Optional bare column name to group by before projecting. |
data |
Optional data frame. If omitted, uses the current data mask
(e.g., inside |
A numeric vector projection aligned to the input rows; NA
before start_date. Respects grouping if group is supplied.
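The underlying calculation can be sketched in base R on plain vectors. This is not the package's implementation: `project_log_linear()` is a hypothetical helper, it ignores grouping and the dplyr data mask, and it uses the date in days as the time index, which is an assumption.

```r
# Hypothetical helper (not part of tidyusmacro): fit log(value) ~ t on the
# calibration window and project the fitted trend from start_date onward.
project_log_linear <- function(date, value, start_date, end_date) {
  date <- as.Date(date)
  start_date <- as.Date(start_date)
  end_date <- as.Date(end_date)
  t <- as.numeric(date)  # time index in days (assumption)
  in_window <- date >= start_date & date <= end_date
  fit <- lm(
    log_value ~ t,
    data = data.frame(log_value = log(value[in_window]), t = t[in_window])
  )
  projection <- exp(predict(fit, newdata = data.frame(t = t)))
  projection[date < start_date] <- NA_real_  # no projection before calibration
  unname(projection)
}

# Toy series that grows at a constant exponential rate (made-up numbers).
dates <- seq(as.Date("2018-01-01"), by = "month", length.out = 48)
value <- 100 * exp(0.001 * as.numeric(dates - dates[1]))
proj <- project_log_linear(dates, value, "2019-01-01", "2019-12-01")
```

On a series that is exactly log-linear, the projection reproduces the data from `start_date` onward. On real data, the gap between the actual series and the projection measures the deviation from the calibration-period trend.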