    library(tibble)
    # dummyVars() comes from the caret package, which is assumed to be loaded already
    dummy_penguins <- dummyVars(species ~ ., data = ml_penguins)
    ml_penguins_updated <- as_tibble(predict(dummy_penguins, newdata = ml_penguins))
    # remember to include the outcome variable too
    ml_penguins_updated <- cbind(species = ml_penguins$species, ml_penguins_updated)
    head(ml_penguins_updated)
    # species bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex.female

Note: We use the as_tibble function from the tibble package to restructure our data following the introduction of the dummyVars dummy variables. This is mainly because we would like to include the species variable with the labels Adelie, Chinstrap and Gentoo, rather than the numbers 1, 2 and 3.

Now, instead of sex taking the values of female or male, this variable has been replaced by the dummy variables sex.female and sex.male. Notice that in the first row, we have a value of 0 for sex.female and a value of 1 for sex.male. In other words, the data in the first row is for a male penguin.

    nearZeroVar(ml_penguins_updated, saveMetrics = TRUE)
    # freqRatio percentUnique zeroVar nzv

Here, we can see that, as identified previously, none of the variables have zero or near-zero variance (as shown by the zeroVar and nzv columns, columns 3 and 4 of the output).

The freqRatio column computes the frequency of the most prevalent value recorded for that variable, divided by the frequency of the second most prevalent value. If we check this column, we see that all feature variables have a freqRatio value close to 1. This is good news, and means that we don't have an unbalanced data set where one value is being recorded significantly more frequently than other values.

Finally, if we check the percentUnique column, we see the number of unique values recorded for each variable, divided by the total number of samples, and expressed as a percentage. If we only have a few unique values (i.e. the feature variable has near-zero variance) then the percentUnique value will be small.
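To make the two metrics more concrete, here is a minimal sketch that computes them by hand in base R, following the definitions given above. The helper nzv_metrics and the two example vectors are hypothetical and made up for illustration; this is not caret's own implementation, and it returns NA for the frequency ratio of a constant column, which is a choice made for this sketch only.

```r
# Hypothetical helper: hand-computes freqRatio and percentUnique for one vector,
# following the definitions in the text (not caret's actual code).
nzv_metrics <- function(x) {
  # counts of each distinct value, most frequent first
  counts <- sort(table(x), decreasing = TRUE)
  # most prevalent count divided by second most prevalent count
  # (NA for a constant column is an assumption of this sketch)
  freq_ratio <- if (length(counts) > 1) counts[[1]] / counts[[2]] else NA_real_
  # number of distinct values as a percentage of the number of samples
  percent_unique <- 100 * length(counts) / length(x)
  c(freqRatio = freq_ratio, percentUnique = percent_unique)
}

# Made-up example data
balanced <- c(0, 1, 0, 1, 1, 0, 1, 0, 1, 0)  # 5 zeros, 5 ones
skewed   <- c(rep(0, 98), 1, 1)              # 98 zeros, 2 ones

nzv_metrics(balanced)  # freqRatio 1,  percentUnique 20
nzv_metrics(skewed)    # freqRatio 49, percentUnique 2
```

The balanced vector behaves like our dummy variables: its two values occur equally often, so the frequency ratio is 1. The skewed vector shows the pattern we would worry about, with a large frequency ratio and a small percentage of unique values at the same time.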