A wrapper function that treats missing values (removes, imputes, etc.) found in a data set. This function is set up to handle data geared towards univariate analysis (e.g. a single response predicted by multiple covariates).

na_treatment(
  data_df,
  na_thresh,
  treatment_type,
  id_var,
  response_var,
  covariate_vec,
  random_seed,
  verbose = FALSE
)

Arguments

data_df

A data.frame object whose missing values (e.g. NAs) are to be treated.

na_thresh

A proportion between 0 and 1. Any covariate whose proportion of missing data is greater than this threshold is excluded.
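The threshold rule can be illustrated with a minimal base R sketch. This is not the function's internal code; the toy data frame and its column names (id, y, age, dose, site) are assumptions made up for illustration.

```r
# Toy data; all column names here are illustrative assumptions.
toy_df <- data.frame(
  id   = 1:5,
  y    = c(2.1, 3.4, 1.9, 4.0, 2.8),
  age  = c(34, NA, 51, 42, NA),        # 2 of 5 values missing
  dose = c(NA, NA, NA, 10, NA),        # 4 of 5 values missing
  site = c("a", "b", NA, "a", "b")     # 1 of 5 values missing
)
covariate_vec <- c("age", "dose", "site")
na_thresh <- 0.5

# Proportion of missing values in each covariate.
na_prop <- colMeans(is.na(toy_df[, covariate_vec, drop = FALSE]))

# Keep covariates whose missing proportion is not greater than the threshold.
kept <- covariate_vec[na_prop <= na_thresh]

print(na_prop)
print(kept)
```

With na_thresh = 0.5, the dose column is dropped because more than half of its values are missing, while age and site are kept for treatment.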

treatment_type

Specifies how the missing values are treated.

'omit'

Missing values are omitted from the data set.

'central_tendency'

Missing values are replaced with the median and modal values for continuous and discrete covariates, respectively.

class(list)

A list with entries either (type = "resample", random_seed = i) or (type = "impute", ntree = n). For type "resample", missing values are replaced with randomly resampled observations, where i sets the random seed. For type "impute", the missRanger package is used to impute missing values with a random forest machine learning algorithm.
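The 'central_tendency' and "resample" treatments described above can be sketched in base R as follows. The helpers mode_value(), fill_central() and fill_resample() are hypothetical names introduced for illustration, not functions from the package, and the package's own handling of ties, factors, or seeds may differ. The "impute" option is not sketched because it depends on missRanger.

```r
# Hypothetical helper: most frequent non-missing value (first one on ties).
mode_value <- function(x) {
  ux <- unique(x[!is.na(x)])
  ux[which.max(tabulate(match(x, ux)))]
}

# 'central_tendency'-style fill: median for numeric, mode otherwise.
fill_central <- function(x) {
  if (is.numeric(x)) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
  } else {
    x[is.na(x)] <- mode_value(x)
  }
  x
}

# "resample"-style fill: draw replacements from the observed values.
# Assumes at least one observed value when something is missing.
fill_resample <- function(x, random_seed) {
  set.seed(random_seed)
  observed <- x[!is.na(x)]
  n_missing <- sum(is.na(x))
  x[is.na(x)] <- observed[sample.int(length(observed), n_missing, replace = TRUE)]
  x
}

age  <- c(34, NA, 51, 42, NA)
site <- c("a", "b", NA, "a", "a")

age_central   <- fill_central(age)    # NAs replaced by the median of 34, 51, 42
site_central  <- fill_central(site)   # NA replaced by the most frequent level
age_resampled <- fill_resample(age, random_seed = 1)

print(age_central)
print(site_central)
print(age_resampled)
```

Fixing the seed inside fill_resample() makes the resampled values repeatable across calls, which is presumably the purpose of the random_seed entry.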

id_var

The column name of 'data_df' containing the observation id or row id.

response_var

The column name of 'data_df' containing the response variable. This column is not treated for missing values.

covariate_vec

The column names of covariates to treat for missing values.

random_seed

The random seed to set, so that results are reproducible.

verbose

Logical. Should the report be printed? Defaults to FALSE.

Value

A named list with the following elements is returned.

data

A data.frame object without any missing values.

report

A text report of the processing that occurred.
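Examples

A minimal usage sketch, assuming na_treatment() is available in the current session. The toy data frame, its column names, and the chosen argument values are illustrative assumptions; the call is guarded so the snippet still runs when the function is not loaded.

```r
# Toy data; column names and values are illustrative assumptions.
toy_df <- data.frame(
  id   = 1:6,
  y    = c(2.1, 3.4, 1.9, 4.0, 2.8, 3.1),
  age  = c(34, NA, 51, 42, 29, 60),
  site = c("a", "b", NA, "a", "b", "a")
)

# Run the call only when na_treatment() is available in the session.
if (exists("na_treatment", mode = "function")) {
  result <- na_treatment(
    data_df        = toy_df,
    na_thresh      = 0.5,
    treatment_type = "central_tendency",
    id_var         = "id",
    response_var   = "y",
    covariate_vec  = c("age", "site"),
    random_seed    = 1,
    verbose        = FALSE
  )
  str(result$data)      # the treated data.frame
  print(result$report)  # the text report
}
```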