One principal goal of descriptive statistics is to represent the essence of a large data set concisely. Octave provides the mean, median, and mode functions which all summarize a data set with just a single number corresponding to the central tendency of the data.
m = mean (x)
m = mean (x, dim)
m = mean (x, vecdim)
m = mean (x, "all")
m = mean (…, nanflag)
m = mean (…, outtype)
m = mean (…, 'Weights', w)
Compute the mean of the elements of x.
The mean is defined as
mean (x) = SUM_i x(i) / N
where N is the number of elements in x.
The weighted mean is defined as
weighted_mean (x) = SUM_i (w(i) * x(i)) / SUM_i (w(i))
where w(i) is the weight applied to element x(i).
If x is a vector, then mean (x) returns the mean of the
elements in x.
If x is a matrix, then mean (x) returns a row vector with
each element containing the mean of the corresponding column in x.
If x is an array, then mean (x) computes the mean along
the first non-singleton dimension of x.
The optional input dim specifies the dimension to operate on and must
be a positive integer. Specifying any singleton dimension of x,
including any dimension exceeding ndims (x), will return
x.
Specifying multiple dimensions with input vecdim, a vector of
non-repeating dimensions, will operate along the array slice defined by
vecdim. If vecdim indexes all dimensions of x, then it is
equivalent to the option "all". Any dimension in vecdim
greater than ndims (x) is ignored.
Specifying the dimension as "all" will cause mean to operate
on all elements of x, and is equivalent to mean (x(:)).
The optional input outtype specifies the data type that is returned. outtype can take the following values:
'default' : Output is of type double, unless the input is single in which case the output is of type single.
'double' : Output is of type double.
'native' : Output is of the same type as the input as reported by (class (x)), unless the input is logical in which case the output is of type double.
The optional variable nanflag specifies whether to include or exclude
NaN values from the calculation using any of the previously specified input
argument combinations. The default value for nanflag is
"includenan" which keeps NaN values in the calculation. To exclude
NaN values set the value of nanflag to "omitnan". The output
will still contain NaN values if x consists of all NaN values in the
operating dimension.
The optional argument pair "Weights", w specifies a weighting
scheme w, which is applied on input x, so that mean
computes the weighted mean. When operating along a single dimension,
w must be a vector of the same length as the operating dimension or it
must have the same size as x. When operating over an array slice
defined by vecdim, w must have the same size as the operating
array slice, i.e., size (w) == size (x)(vecdim), or the
same size as x.
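As a brief illustration of the dimension options and the weighted mean formula above, the following sketch uses a small matrix. The variable names are illustrative only, and the weighted result is computed directly from the definition rather than through the "Weights" option, so that the sketch does not depend on that option being available.

```octave
x = [1 2; 3 4];

m_cols = mean (x);       # column means: [2 3]
m_rows = mean (x, 2);    # row means:    [1.5; 3.5]
m_all  = mean (x(:));    # mean of all elements: 2.5

## Weighted mean computed from the definition above.
v = [1 2 3];
w = [1 1 2];
m_w = sum (w .* v) / sum (w);   # (1*1 + 1*2 + 2*3) / 4 = 2.25
```

Where the "Weights" option is supported, mean (v, "Weights", w) is expected to give the same value as m_w.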
m = median (x)
m = median (x, dim)
m = median (x, vecdim)
m = median (x, "all")
m = median (…, nanflag)
m = median (…, outtype)
Compute the median value of the elements of x.
The median is defined on the sorted data s (s = sort (x)) as
median (x) = s(ceil (N/2))               if N is odd
median (x) = (s(N/2) + s(N/2+1)) / 2     if N is even
If x is a vector, then median (x) returns the median of
the elements in x.
If x is a matrix, then median (x) returns a row vector
with each element containing the median of the corresponding column in
x.
If x is an array, then median (x) computes the median
along the first non-singleton dimension of x.
The optional input dim specifies the dimension to operate on and must
be a positive integer. Specifying any singleton dimension of x,
including any dimension exceeding ndims (x), will return
x.
Specifying multiple dimensions with input vecdim, a vector of
non-repeating dimensions, will operate along the array slice defined by
vecdim. If vecdim indexes all dimensions of x, then it is
equivalent to the option "all". Any dimension in vecdim
greater than ndims (x) is ignored.
Specifying the dimension as "all" will cause median to
operate on all elements of x, and is equivalent to
median (x(:)).
median (…, outtype) returns the median with a specified
data type, using any of the input arguments in the previous syntaxes.
outtype can take the following values:
"default" : Output is of type double, unless the input is single in which case the output is of type single.
"double" : Output is of type double.
"native" : Output is of the same type as the input (class (x)), unless the input is logical in which case the output is of type double.
The optional variable nanflag specifies whether to include or exclude
NaN values from the calculation using any of the previously specified input
argument combinations. The default value for nanflag is
"includenan" which keeps NaN values in the calculation. To
exclude NaN values set the value of nanflag to "omitnan".
The output will still contain NaN values if x consists of all NaN
values in the operating dimension.
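The odd and even cases of the definition can be checked on two short vectors. This is a minimal sketch with illustrative variable names; the manual calculation simply restates the formula for even N given above.

```octave
a = [7 1 5];          # N = 3 (odd)
b = [7 1 5 3];        # N = 4 (even)

s = sort (b);         # [1 3 5 7]
N = numel (s);

med_a = median (a);                         # middle value of [1 5 7]: 5
med_b = median (b);                         # average of 3 and 5: 4
med_b_manual = (s(N/2) + s(N/2 + 1)) / 2;   # same value from the definition
```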
m = mode (x)
m = mode (x, dim)
m = mode (x, vecdim)
m = mode (x, "all")
[m, f, c] = mode (…)
Compute the most frequently occurring value in the input data x.
mode determines the frequency of values along the first non-singleton
dimension and returns the value with the highest frequency. If two or
more values have the same frequency, mode returns the smallest.
The optional input dim specifies the dimension to operate on and must
be a positive integer. Specifying any singleton dimension of x,
including any dimension exceeding ndims (x), will return
x.
Specifying multiple dimensions with input vecdim, a vector of
non-repeating dimensions, will operate along the array slice defined by
vecdim. If vecdim indexes all dimensions of x, then it is
equivalent to the option "all". Any dimension in vecdim
greater than ndims (x) is ignored. If all dimensions in
vecdim are greater than ndims (x), then mode
will return x.
Specifying the dimension as "all" will cause mode to operate
on all elements of x, and is equivalent to mode (x(:)).
The return variable f is the number of occurrences of the mode in the dataset.
The cell array c contains all of the elements with the maximum frequency.
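The three outputs can be seen on a small vector in which two values tie for the highest frequency. This is only a sketch; the data are made up for illustration.

```octave
x = [1 2 2 3 3];

[m, f, c] = mode (x);
## m is 2: both 2 and 3 occur twice, and the smaller value is returned.
## f is 2: the number of times the mode occurs.
## c{1} holds every value that occurs with the maximum frequency (2 and 3).
```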
Using just one number, such as the mean, to represent an entire data set may not give an accurate picture of the data. One way to characterize the fit is to measure the dispersion of the data. Octave provides several functions for measuring dispersion.
[s, l] = bounds (x)
[s, l] = bounds (x, dim)
[s, l] = bounds (x, vecdim)
[s, l] = bounds (x, "all")
[s, l] = bounds (…, nanflag)
Return the smallest and largest values of the input data x.
If x is a vector, then bounds (x) returns the smallest
and largest values of the elements in x in s and l,
respectively.
If x is a matrix, then bounds (x) returns the smallest
and largest values for each column of x as row vectors s and
l, respectively.
If x is an array, then bounds (x) computes the smallest
and largest values along the first non-singleton dimension of x.
The data in x must be numeric. By default, any NaN values are ignored. The size of s and l is equal to the size of x except for the operating dimension, which becomes 1.
The optional input dim specifies the dimension to operate on and must
be a positive integer. Specifying any singleton dimension of x,
including any dimension exceeding ndims (x), will return
x.
Specifying multiple dimensions with input vecdim, a vector of
non-repeating dimensions, will operate along the array slice defined by
vecdim. If vecdim indexes all dimensions of x, then it is
equivalent to the option "all". Any dimension in vecdim
greater than ndims (x) is ignored.
Specifying the dimension as "all" will cause bounds to
operate on all elements of x, and is equivalent to
bounds (x(:)).
The optional variable nanflag specifies whether to include or exclude
NaN values from the calculation using any of the previously specified input
argument combinations. The default value for nanflag is
"omitnan" which does not include NaN values in the result. If the
argument "includenan" is given, and there is a NaN present, then the
result for both smallest (s) and largest (l) elements will be
NaN.
Usage Note: The bounds are a quickly computed measure of the dispersion of a
data set, but are less accurate than iqr if there are outlying data
points.
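The effect of nanflag on bounds can be sketched as follows. The data are illustrative, and the second call assumes that nanflag may be given directly after x, as in the bounds (…, nanflag) syntax listed above.

```octave
x = [3 NaN 1 5];

[s, l] = bounds (x);                         # NaN ignored by default: s = 1, l = 5
[s_nan, l_nan] = bounds (x, "includenan");   # both outputs are NaN
```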
y = range (x)
y = range (x, dim)
y = range (x, vecdim)
y = range (x, "all")
y = range (…, nanflag)
Return the difference between the maximum and the minimum values of the input data x.
If x is a vector, then range (x) returns the difference
between the maximum and minimum values of the elements in x.
If x is a matrix, then range (x) returns a row vector
y with the difference between the maximum and minimum values for each
column of x.
If x is an array, then range (x) computes the difference
between the maximum and minimum values along the first non-singleton
dimension of x.
The data in x must be numeric. By default, any NaN values are ignored. The size of y is equal to the size of x except for the operating dimension, which becomes 1.
The optional input dim specifies the dimension to operate on and must
be a positive integer. Specifying any singleton dimension of x,
including any dimension exceeding ndims (x), will return
x.
Specifying multiple dimensions with input vecdim, a vector of
non-repeating dimensions, will operate along the array slice defined by
vecdim. If vecdim indexes all dimensions of x, then it is
equivalent to the option "all". Any dimension in vecdim
greater than ndims (x) is ignored.
Specifying the dimension as "all" will cause range to operate
on all elements of x, and is equivalent to range (x(:)).
The optional variable nanflag specifies whether to include or exclude
NaN values from the calculation using any of the previously specified input
argument combinations. The default value for nanflag is
"omitnan" which does not include NaN values in the result. If the
argument "includenan" is given, and there is a NaN present, then the
corresponding result will be NaN.
Usage Note: The range is a quickly computed measure of the dispersion of a
data set, but is less accurate than iqr if there are outlying data
points.
r = iqr (x)
r = iqr (x, dim)
r = iqr (x, vecdim)
r = iqr (x, "all")
[r, q] = iqr (…)
Compute the interquartile range of the input data x.
The interquartile range is defined as the difference between the 75th and 25th percentile values of x calculated using
quantile (x, [0.25 , 0.75])
If x is a vector, then iqr (x) computes the interquartile
range of the elements in x.
If x is a matrix, then iqr (x) returns a row vector with
each element containing the interquartile range of the corresponding column
in x.
If x is an array, then iqr (x) computes the interquartile
range along the first non-singleton dimension of x.
The data in x must be numeric and any NaN values are ignored. The size of r is equal to the size of x except for the operating dimension, which becomes 1.
The optional input dim specifies the dimension to operate on and must
be a positive integer. Specifying any singleton dimension of x,
including any dimension exceeding ndims (x), will return
zeros (size (x)).
Specifying multiple dimensions with input vecdim, a vector of
non-repeating dimensions, will operate along the array slice defined by
vecdim. If vecdim indexes all dimensions of x, then it is
equivalent to the option "all". Any dimension in vecdim
greater than ndims (x) is ignored.
Specifying the dimension as "all" will cause iqr to operate
on all elements of x, and is equivalent to iqr (x(:)).
The optional output q contains the quantiles for the 25th and 75th percentile of the data.
Usage Note: As a measure of dispersion, the interquartile range is less
affected by outliers than either range or std. The
interquartile range of a scalar is necessarily 0.
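The usage note can be illustrated by adding a single outlying point to a small data set. This is a sketch with made-up data; the full spread is computed as max minus min so that the comparison does not depend on any function other than iqr.

```octave
x = [1 2 3 4 5 6 7 8];
y = [1 2 3 4 5 6 7 800];   # same data with one outlying point

spread_x = max (x) - min (x);   # 7
spread_y = max (y) - min (y);   # 799

iqr_x = iqr (x);
iqr_y = iqr (y);   # equal to iqr_x: the outlier lies outside the central 50%
```

The outlier changes the full spread by two orders of magnitude, while the interquartile range is unchanged because neither quartile involves the largest element.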
m = mad (x)
m = mad (x, opt)
m = mad (x, opt, dim)
m = mad (x, opt, vecdim)
m = mad (x, opt, "all")
Compute the mean or median absolute deviation (MAD) of the elements of x.
The mean absolute deviation is defined as
mad = mean (abs (x - mean (x)))
The median absolute deviation is defined as
mad = median (abs (x - median (x)))
mad excludes NaN values from calculation similar to using the
omitnan option in mean and median.
If x is a vector, then mad (x) returns the mean absolute
deviation of the elements in x.
If x is a matrix, then mad (x) returns a row vector with
each element containing the mean absolute deviation of the corresponding
column in x.
If x is an array, then mad (x) computes the mean absolute
deviation along the first non-singleton dimension of x.
The optional argument opt specifies whether mean or median absolute
deviation is calculated. The default is 0 which corresponds to mean
absolute deviation; a value of 1 corresponds to median absolute
deviation. Passing an empty input [] defaults to mean absolute
deviation.
The optional input dim specifies the dimension to operate on and must
be a positive integer. Specifying any singleton dimension of x,
including any dimension exceeding ndims (x), will return
zeros (size (x)).
Specifying the dimension as vecdim, a vector of non-repeating
dimensions, will return the mad over the array slice defined by
vecdim. If vecdim indexes all dimensions of x, then it is
equivalent to the option "all". Any dimension in vecdim
greater than ndims (x) is ignored.
Specifying the dimension as "all" will cause mad to operate
on all elements of x, and is equivalent to mad (x(:)).
Usage Note: As a measure of dispersion, mad is less affected by
outliers than std.
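The two definitions can be compared on a vector with one large value. This is a minimal sketch with illustrative data; the manual lines restate the formulas given above.

```octave
x = [1 2 3 4 100];

mad_mean   = mad (x);       # mean (abs (x - 22)) = 31.2
mad_median = mad (x, 1);    # median (abs (x - 3)) = 1

## The same values computed directly from the definitions.
mad_mean_manual   = mean (abs (x - mean (x)));
mad_median_manual = median (abs (x - median (x)));
```

The median absolute deviation is barely influenced by the value 100, whereas the mean absolute deviation is dominated by it.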
y = meansq (x)
y = meansq (x, dim)
y = meansq (x, vecdim)
y = meansq (x, "all")
y = meansq (…, nanflag)
Compute the mean square of the input data x.
The mean square is defined as
meansq (x) = 1/N SUM_i x(i)^2
where N is the length of the x vector.
If x is a vector, then meansq (x) returns the mean square
of the elements in x.
If x is a matrix, then meansq (x) returns a row vector
with each element containing the mean square of the corresponding column in
x.
If x is an array, then meansq (x) computes the mean
square along the first non-singleton dimension of x.
The data in x must be numeric. The size of y is equal to the size of x except for the operating dimension, which becomes 1.
The optional input dim specifies the dimension to operate on and must
be a positive integer. Specifying any singleton dimension of x,
including any dimension exceeding ndims (x), will return
x.^2.
Specifying multiple dimensions with input vecdim, a vector of
non-repeating dimensions, will operate along the array slice defined by
vecdim. If vecdim indexes all dimensions of x, then it is
equivalent to the option "all". Any dimension in vecdim
greater than ndims (x) is ignored.
Specifying the dimension as "all" will cause meansq to
operate on all elements of x, and is equivalent to
meansq (x(:)).
The optional variable nanflag specifies whether to include or exclude
NaN values from the calculation using any of the previously specified input
argument combinations. The default value for nanflag is
"includenan" which keeps NaN values in the calculation. To exclude
NaN values set the value of nanflag to "omitnan". The output
will still contain NaN values if x consists of all NaN values in the
operating dimension.
y = rms (x)
y = rms (x, dim)
y = rms (x, vecdim)
y = rms (x, "all")
y = rms (…, nanflag)
Compute the root mean square of the input data x.
The root mean square is defined as
rms (x) = sqrt (1/N SUM_i x(i)^2)
where N is the length of the x vector.
If x is a vector, then rms (x) returns the root mean
square of the elements in x.
If x is a matrix, then rms (x) returns a row vector
with each element containing the root mean square of the corresponding
column in x.
If x is an array, then rms (x) computes the root mean
square along the first non-singleton dimension of x.
The data in x must be numeric. The size of y is equal to the size of x except for the operating dimension, which becomes 1.
The optional input dim specifies the dimension to operate on and must
be a positive integer. Specifying any singleton dimension of x,
including any dimension exceeding ndims (x), will return
x.
Specifying multiple dimensions with input vecdim, a vector of
non-repeating dimensions, will operate along the array slice defined by
vecdim. If vecdim indexes all dimensions of x, then it is
equivalent to the option "all". Any dimension in vecdim
greater than ndims (x) is ignored.
Specifying the dimension as "all" will cause rms to operate
on all elements of x, and is equivalent to rms (x(:)).
The optional variable nanflag specifies whether to include or exclude
NaN values from the calculation using any of the previously specified input
argument combinations. The default value for nanflag is
"includenan" which keeps NaN values in the calculation. To exclude
NaN values set the value of nanflag to "omitnan". The output
will still contain NaN values if x consists of all NaN values in the
operating dimension.
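The relationship between the mean square and the root mean square follows directly from the two definitions. The sketch below uses illustrative data and takes the square root of meansq explicitly, so it does not rely on rms being available.

```octave
x = [3 4];

ms = meansq (x);     # (3^2 + 4^2) / 2 = 12.5
r  = sqrt (ms);      # root mean square, approximately 3.5355

## Direct evaluation of the definition for comparison.
ms_manual = sum (x .^ 2) / numel (x);
```

Where rms is available, rms (x) is expected to equal r.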
s = std (x)
s = std (x, w)
s = std (x, w, dim)
s = std (x, w, vecdim)
s = std (x, w, "all")
s = std (…, nanflag)
[s, m] = std (…)
Compute the standard deviation of the elements of x.
The standard deviation is defined as
std (x) = sqrt ((1 / (N-1)) * SUM_i ((x(i) - mean(x))^2))
where N is the number of elements of x.
If x is a vector, then std (x) returns the standard
deviation of the elements in x.
If x is a matrix, then std (x) returns a row vector with
each element containing the standard deviation of the corresponding column
in x.
If x is an array, then std (x) computes the standard
deviation along the first non-singleton dimension of x.
The optional argument w determines the weighting scheme to use. Valid values are:
0 [default] : Normalize with N-1 (sample standard deviation). This provides the square root of the best unbiased estimator of the variance.
1 : Normalize with N (population standard deviation). This provides the square root of the second moment around the mean.
a vector : Compute the weighted standard deviation with non-negative weights. The length of w must equal the size of x in the operating dimension. NaN values are permitted in w, will be multiplied with the associated values in x, and can be excluded by the nanflag option.
an array : Similar to vector weights, but w must be the same size as x. If the operating dimension is supplied as vecdim or "all" and w is not a scalar, w must be an array of the same size as x.
Note: w must always be specified before specifying any of the
following dimension options. To use the default value for w you may
pass an empty input argument [].
The optional input dim specifies the dimension to operate on and must
be a positive integer. Specifying any singleton dimension of x,
including any dimension exceeding ndims (x), will return
zeros (size (x)).
Specifying multiple dimensions with input vecdim, a vector of
non-repeating dimensions, will operate along the array slice defined by
vecdim. If vecdim indexes all dimensions of x, then it is
equivalent to the option "all". Any dimension in vecdim
greater than ndims (x) is ignored.
Specifying the dimension as "all" will cause std to operate
on all elements of x, and is equivalent to std (x(:)).
The optional variable nanflag specifies whether to include or exclude
NaN values from the calculation using any of the previously specified input
argument combinations. The default value for nanflag is
"includenan" which keeps NaN values in the calculation. To
exclude NaN values set the value of nanflag to "omitnan".
The output will still contain NaN values if x consists of all NaN
values in the operating dimension.
The optional second output variable m contains the mean of the elements of x used to calculate the standard deviation. If s is the weighted standard deviation, then m is also the weighted mean.
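The two scalar normalizations can be compared on a data set whose sums are easy to verify by hand. This is a sketch with illustrative data; it assumes that w = 1 selects normalization by N and that an empty w selects the default normalization by N-1, as described above.

```octave
x = [2 4 4 4 5 5 7 9];     # mean is 5, sum of squared deviations is 32

s0 = std (x);              # normalized by N-1: sqrt (32 / 7), about 2.1381
s1 = std (x, 1);           # normalized by N:   sqrt (32 / 8) = 2
s_default = std (x, []);   # empty w selects the default, same as s0
```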
In addition to knowing the size of a dispersion it is useful to know the shape of the data set. For example, are data points massed to the left or right of the mean? Octave provides several common measures to describe the shape of the data set. Octave can also calculate moments allowing arbitrary shape measures to be developed.
v = var (x)
v = var (x, w)
v = var (x, w, dim)
v = var (x, w, vecdim)
v = var (x, w, "all")
v = var (…, nanflag)
[v, m] = var (…)
Compute the variance of the elements of x.
The variance is defined as
var (x) = (1 / (N-1)) * SUM_i ((x(i) - mean(x))^2)
where N is the number of elements of x.
If x is a vector, then var (x) returns the variance of
the elements in x.
If x is a matrix, then var (x) returns a row vector with
each element containing the variance of the corresponding column in x.
If x is an array, then var (x) computes the variance
along the first non-singleton dimension of x.
The optional argument w determines the weighting scheme to use. Valid values are:
0 [default] : Normalize with N-1 (sample variance). This provides the best unbiased estimator of the variance.
1 : Normalize with N (population variance). This provides the second moment around the mean.
a vector : Compute the weighted variance with non-negative weights. The length of w must equal the size of x in the operating dimension. NaN values are permitted in w, will be multiplied with the associated values in x, and can be excluded by the nanflag option.
an array : Similar to vector weights, but w must be the same size as x. If the operating dimension is supplied as vecdim or "all" and w is not a scalar, then w must match the size of the specified array slice.
Note: w must always be specified before specifying any of the
following dimension options. To use the default value for w you
may pass an empty input argument [].
The optional input dim specifies the dimension to operate on and must
be a positive integer. Specifying any singleton dimension of x,
including any dimension exceeding ndims (x), will return
zeros (size (x)).
Specifying multiple dimensions with input vecdim, a vector of
non-repeating dimensions, will operate along the array slice defined by
vecdim. If vecdim indexes all dimensions of x, then it is
equivalent to the option "all". Any dimension in vecdim
greater than ndims (x) is ignored.
Specifying the dimension as "all" will cause var to operate
on all elements of x, and is equivalent to var (x(:)).
The optional variable nanflag specifies whether to include or exclude
NaN values from the calculation using any of the previously specified input
argument combinations. The default value for nanflag is
"includenan" which keeps NaN values in the calculation. To
exclude NaN values set the value of nanflag to "omitnan".
The output will still contain NaN values if x consists of all NaN
values in the operating dimension.
The optional second output variable m contains the mean of the elements of x used to calculate the variance. If v is the weighted variance, then m is also the weighted mean.
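Because w must precede any dimension argument, an empty w is passed when only the dimension is of interest. The following sketch, with illustrative data, shows this together with the two scalar normalizations.

```octave
x = [1 2; 3 4];

v_cols = var (x);          # variance of each column: [2 2]
v_rows = var (x, [], 2);   # default weighting along dimension 2: [0.5; 0.5]
v_n    = var (x, 1);       # normalized by N instead of N-1: [1 1]
v_all  = var (x(:));       # all elements: 5/3
```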
y = skewness (x)
y = skewness (x, flag)
y = skewness (x, flag, dim)
y = skewness (x, flag, vecdim)
y = skewness (x, flag, "all")
Compute the sample skewness of the input data x.
The sample skewness is defined as
skewness (x) = mean ((x - mean (x)).^3) / std (x).^3
The optional argument flag controls which normalization is used. If flag is equal to 1 (default value, used when flag is omitted or empty), return the sample skewness as defined above. If flag is equal to 0, return the adjusted skewness coefficient instead:
skewness (x, 0) = sqrt (N*(N-1)) / (N - 2) * mean ((x - mean (x)).^3) / std (x).^3
where N is the length of the x vector.
The adjusted skewness coefficient is obtained by replacing the sample second and third central moments by their bias-corrected versions.
If x is a vector, then skewness (x) computes the skewness
of the data in x.
If x is a matrix, then skewness (x) returns a row vector
with each element containing the skewness of the data of the corresponding
column in x.
If x is an array, then skewness (x) computes the skewness
of the data along the first non-singleton dimension of x.
The data in x must be numeric and any NaN values are ignored. The size of y is equal to the size of x except for the operating dimension, which becomes 1.
The optional input dim specifies the dimension to operate on and must
be a positive integer. Specifying any singleton dimension of x,
including any dimension exceeding ndims (x), will return
x.
Specifying multiple dimensions with input vecdim, a vector of
non-repeating dimensions, will operate along the array slice defined by
vecdim. If vecdim indexes all dimensions of x, then it is
equivalent to the option "all". Any dimension in vecdim
greater than ndims (x) is ignored.
Specifying the dimension as "all" will cause skewness to
operate on all elements of x, and is equivalent to
skewness (x(:)).
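The sign of the skewness indicates on which side of the mean the longer tail lies. The sketch below uses two made-up vectors, one symmetric and one with a long right tail.

```octave
x = [1 2 3 4 5];     # symmetric about the mean
y = [1 2 3 4 50];    # long tail to the right of the mean

sk_x = skewness (x);   # 0 for symmetric data
sk_y = skewness (y);   # positive
```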
y = kurtosis (x)
y = kurtosis (x, flag)
y = kurtosis (x, flag, dim)
y = kurtosis (x, flag, vecdim)
y = kurtosis (x, flag, "all")
Compute the sample kurtosis of the input data x.
The sample kurtosis is defined as
k1 = mean ((x - mean (x)).^4) / std (x).^4
The optional argument flag controls which normalization is used. If flag is equal to 1 (default value, used when flag is omitted or empty), return the sample kurtosis as defined above. If flag is equal to 0, return the "bias-corrected" kurtosis coefficient instead:
k0 = 3 + (N - 1) / ((N - 2) * (N - 3)) * ((N + 1) * k1 - 3 * (N - 1))
where N is the length of the x vector.
The bias-corrected kurtosis coefficient is obtained by replacing the sample second and fourth central moments by their unbiased versions. It is an unbiased estimate of the population kurtosis for normal populations.
If x is a vector, then kurtosis (x) computes the kurtosis
of the data in x.
If x is a matrix, then kurtosis (x) returns a row vector
with each element containing the kurtosis of the data of the corresponding
column in x.
If x is an array, then kurtosis (x) computes the kurtosis
of the data along the first non-singleton dimension of x.
The data in x must be numeric and any NaN values are ignored. The size of y is equal to the size of x except for the operating dimension, which becomes 1.
The optional input dim specifies the dimension to operate on and must
be a positive integer. Specifying any singleton dimension of x,
including any dimension exceeding ndims (x), will return
x.
Specifying multiple dimensions with input vecdim, a vector of
non-repeating dimensions, will operate along the array slice defined by
vecdim. If vecdim indexes all dimensions of x, then it is
equivalent to the option "all". Any dimension in vecdim
greater than ndims (x) is ignored.
Specifying the dimension as "all" will cause kurtosis to
operate on all elements of x, and is equivalent to
kurtosis (x(:)).
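The relationship between the two normalizations can be checked numerically by applying the correction formula for k0 to the result of the default call. This is a sketch with illustrative data and variable names.

```octave
x = [1 2 3 4 5];
N = numel (x);

k1 = kurtosis (x);      # 1.7 for this data
k0 = 3 + (N - 1) / ((N - 2) * (N - 3)) * ((N + 1) * k1 - 3 * (N - 1));   # 1.8

k0_builtin = kurtosis (x, 0);   # expected to agree with k0
```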
m = moment (x, p)
m = moment (x, p, dim)
m = moment (x, p, vecdim)
m = moment (x, p, "all")
m = moment (x, p, …, type)
Compute the p-th central moment of the input data x.
The p-th central moment of x is defined as:
1/N SUM_i (x(i) - mean(x))^p
where N is the length of the x vector.
If x is a vector, then moment (x, p) computes the p-th
central moment of the data in x.
If x is a matrix, then moment (x, p) returns a row vector with
each element containing the p-th central moment of the corresponding column
in x.
If x is an array, then moment (x, p) computes the p-th
central moment along the first non-singleton dimension of x.
The data in x must be a non-empty numeric array. If any NaN values are present along the operating dimension, the corresponding moment is NaN. The size of m is equal to the size of x except for the operating dimension, which becomes 1.
The optional input dim specifies the dimension to operate on and must
be a positive integer. Specifying any singleton dimension of x,
including any dimension exceeding ndims (x), will return
x.
Specifying multiple dimensions with input vecdim, a vector of
non-repeating dimensions, will operate along the array slice defined by
vecdim. If vecdim indexes all dimensions of x, then it is
equivalent to the option "all". Any dimension in vecdim
greater than ndims (x) is ignored. If all dimensions in
vecdim are greater than ndims (x), then moment
will return x.
Specifying the dimension as "all" will cause moment to
operate on all elements of x, and is equivalent to
moment (x(:), p).
The optional input argument type is a string specifying the type of moment to be computed. Valid options are:
"c" : Central Moment (default).
"a", "ac" : Absolute Central Moment. The moment about the mean ignoring sign defined as
1/N SUM_i (abs (x(i) - mean(x)))^p
"r" : Raw Moment. The moment about zero defined as
1/N SUM_i x(i)^p
"ar" : Absolute Raw Moment. The moment about zero ignoring sign defined as
1/N SUM_i (abs (x(i)))^p
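The moment types can be compared on a short symmetric vector whose mean is 3. This is a minimal sketch with illustrative data; it assumes that type may be passed directly after p when no dimension is given.

```octave
x = [1 2 3 4 5];

m2_c  = moment (x, 2);         # central (default): mean ((x - 3).^2) = 2
m3_c  = moment (x, 3);         # 0, because the data are symmetric
m2_r  = moment (x, 2, "r");    # raw: mean (x.^2) = 11
m1_ac = moment (x, 1, "ac");   # absolute central: mean (abs (x - 3)) = 1.2
```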
q = quantile (x)
q = quantile (x, p)
q = quantile (x, n)
q = quantile (x, …, dim)
q = quantile (x, …, vecdim)
q = quantile (x, …, "all")
q = quantile (x, p, …, method)
q = quantile (x, n, …, method)
Compute the quantiles of the input data x.
If x is a vector, then quantile (x, p) computes the quantiles
specified by p of the data in x.
If x is a matrix, then quantile (x, p) returns a matrix such
that the i-th row of q contains the p(i)-th quantile of each
column of x.
If x is an array, then quantile (x, p) computes the quantiles
specified by p along the first non-singleton dimension of x.
The data in x must be numeric and any NaN values are ignored. The size of q is equal to the size of x except for the operating dimension, whose size equals the number of quantiles specified by p or n.
p is a numeric vector specifying the quantiles to be computed, which
correspond to cumulative probabilities of the data. All elements of
p must be in the range from 0 to 1. If p is unspecified, return
the quantiles for [0.00 0.25 0.50 0.75 1.00]. Alternatively, the
second input argument may be specified as a positive integer value n,
in which case quantile returns the quantiles for n evenly
spaced cumulative probabilities computed as
(1/(n + 1), 2/(n + 1), …, n/(n + 1)) for n > 1.
The optional input dim specifies the dimension to operate on and must
be a positive integer. Specifying any singleton dimension of x,
including any dimension exceeding ndims (x), will return N
copies of x along the operating dimension, where N is the number of
specified quantiles.
Specifying multiple dimensions with input vecdim, a vector of
non-repeating dimensions, will operate along the array slice defined by
vecdim. If vecdim indexes all dimensions of x, then it is
equivalent to the option "all". Any dimension in vecdim
greater than ndims (x) is ignored. If all dimensions in
vecdim are greater than ndims (x), then quantile
will return N copies of x along the smallest dimension in
vecdim.
Specifying the dimension as "all" will cause quantile to operate
on all elements of x, and is equivalent to quantile (x(:)).
The optional input argument method determines the method used to calculate the quantiles specified by p or n. The methods available to calculate sample quantiles are the nine methods used by R (https://www.r-project.org/) and can be specified by the corresponding integer value. The default value is method = 5.
Discontinuous sample quantile methods 1, 2, and 3
Continuous sample quantile methods 4 through 9, where p(k) is the linear interpolation function respecting each method’s representative cdf.
Hyndman and Fan (1996) recommend method 8. Maxima, S, and R (versions prior to 2.0.0) use 7 as their default. Minitab and SPSS use method 6. MATLAB uses method 5.
Examples:
x = randi (1000, [10, 1]);          # Create empirical data in range 1-1000
q = quantile (x, [0, 1]);           # Return minimum, maximum of distribution
q = quantile (x, [0.25 0.5 0.75]);  # Return quartiles of distribution
See also: prctile.
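The choice of method changes the interpolated values even on small data sets. The following sketch compares the default method 5 with method 7 on a column vector; the data are illustrative, and the dimension is passed explicitly so that method can be given as the fourth argument.

```octave
x = (1:8)';      # column vector, so the operating dimension is 1
p = [0.25 0.5 0.75];

q5 = quantile (x, p);          # default method 5: [2.5; 4.5; 6.5]
q7 = quantile (x, p, 1, 7);    # method 7:         [2.75; 4.5; 6.25]
```

Both methods agree on the median, but they place the quartiles at slightly different points between neighboring data values.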
q = prctile (x)
q = prctile (x, p)
q = prctile (x, p, dim)
q = prctile (x, p, vecdim)
q = prctile (x, p, "all")
q = prctile (x, p, …, method)
Compute the percentiles of the input data x.
If x is a vector, then prctile (x) computes the
percentiles specified by p of the data in x.
If x is a matrix, then prctile (x) returns a matrix such
that the i-th row of q contains the p(i)th percentiles of each
column of x.
If x is an array, then prctile (x) computes the
percentiles specified by p along the first non-singleton dimension of
x.
The data in x must be numeric and any NaN values are ignored. The size of q is equal to the size of x except for the operating dimension, which equals the number of percentiles specified by p.
p is a numeric vector specifying the percentiles to be computed. All
elements of p must be in the range from 0 to 100. If p is
unspecified, return the percentiles for [0 25 50 75 100].
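A short sketch of the column-wise behavior and of the default p follows. The data are illustrative, and the comparison with quantile assumes that percentiles are simply quantiles with p rescaled from the range 0 to 100 into the range 0 to 1.

```octave
x = [1 10; 2 20; 3 30; 4 40];    # two columns of illustrative data
p = [25 50 75];                  # percentiles, in the range 0 to 100

q = prctile (x, p);              # row i holds the p(i)th percentile of each column
qd = prctile (x(:,1));           # p omitted: uses [0 25 50 75 100]

q2 = quantile (x, p / 100);      # expected to match q under the assumption above
```

Here q is a 3x2 matrix and qd has five elements, one for each default percentile.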
The optional input dim specifies the dimension to operate on and must
be a positive integer. Specifying any singleton dimension of x,
including any dimension exceeding ndims (x), will return N
copies of x along the operating dimension, where N is the number of
specified percentiles.
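One way to read the dim argument is that operating along dimension 2 should agree with transposing the data, operating along the columns, and transposing back. The sketch below checks that reading on illustrative data; it is not taken from the reference text.

```octave
x = [1 2 3 4; 10 20 30 40];      # each row is one illustrative data set
p = [25 50 75];

qr = prctile (x, p, 2);          # percentiles across each row, size 2x3
qc = prctile (x.', p).';         # same data via the column-wise default

max_diff = max (abs (qr(:) - qc(:)));   # expected to be zero
```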
Specifying multiple dimensions with input vecdim, a vector of
non-repeating dimensions, will operate along the array slice defined by
vecdim. If vecdim indexes all dimensions of x, then it is
equivalent to the option "all". Any dimension in vecdim
greater than ndims (x) is ignored. If all dimensions in
vecdim are greater than ndims (x), then prctile
will return N copies of x along the smallest dimension in
vecdim.
Specifying the dimension as "all" will cause prctile to operate
on all elements of x, and is equivalent to prctile (x(:)).
The fourth input argument, method, determines the method used to calculate the percentiles specified by p. The available methods for calculating sample percentiles are the nine methods used by R (https://www.r-project.org/), each selected by its corresponding integer value. The default value is method = 5.
Discontinuous sample quantile methods 1, 2, and 3
Continuous sample quantile methods 4 through 9, where p(k) is the linear interpolation function respecting each method’s representative cdf.
See also: quantile.
A summary view of a data set can be generated quickly with the
statistics function.
stats = statistics (x)
stats = statistics (x, dim)
stats = statistics (x, vecdim)
stats = statistics (x, "all")
stats = statistics (…, nanflag)
Return a vector of statistical parameters computed over the input data x.
statistics (x) operates along the first non-singleton dimension
of x and calculates the following statistical parameters:
If x is a row vector, then statistics (x) returns a row
vector with the aforementioned statistical parameters. If x is a
column vector, then it returns a column vector.
If x is a matrix, then statistics (x) returns a matrix
such that each column contains the statistical parameters calculated over
the corresponding column of x.
If x is an array, then statistics (x) computes the
statistical parameters along the first non-singleton dimension of x.
The data in x must be numeric. By default, NaN values are excluded from
the computation of all statistical parameters except the mean and the
standard deviation. Set the optional argument nanflag to "omitnan" to
also exclude NaN values from the calculation of the mean and standard
deviation. Setting nanflag to "includenan" has no effect and is
equivalent to calling statistics without the nanflag argument.
The size of stats is equal to the size of x except for the operating dimension, which equals 9 (i.e., the number of statistical parameters returned).
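The shape rules above can be checked with a small sketch. The data are illustrative. The positions printed at the end assume the ordering used by released versions of statistics (minimum, first quartile, median, third quartile, maximum, mean, standard deviation, skewness, kurtosis); treat that ordering as an assumption if your version differs.

```octave
x = [1; 2; 3; 4; 5];             # illustrative column vector
s = statistics (x);              # 9x1 column vector of parameters

m = [x, 10 * x];                 # 5x2 matrix, second column scaled by 10
sm = statistics (m);             # 9x2: one column of parameters per column of m

# Assumed positions: 1 = minimum, 3 = median, 5 = maximum, 6 = mean
printf ("min %g, median %g, max %g, mean %g\n", s(1), s(3), s(5), s(6));
```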
The optional input dim specifies the dimension to operate on and must
be a positive integer. Specifying any singleton dimension of x,
including any dimension exceeding ndims (x), will return
x.
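For a non-singleton operating dimension, a minimal sketch of the dim argument looks like this. The data are illustrative, and each row is treated as a separate data set.

```octave
x = [1 2 3 4 5; 10 20 30 40 50]; # each row is one illustrative data set
s = statistics (x, 2);           # operate along dimension 2 (across each row)
size (s)                         # 2x9: dimension 2 now holds the 9 parameters
```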
Specifying multiple dimensions with input vecdim, a vector of
non-repeating dimensions, will operate along the array slice defined by
vecdim. If vecdim indexes all dimensions of x, then it is
equivalent to the option "all". Any dimension in vecdim
greater than ndims (x) is ignored.
Specifying the dimension as "all" will cause statistics to
operate on all elements of x, and is equivalent to
statistics (x(:)).