Package com.groiss.ml.ds
Interface DataSet
public interface DataSet
Interface representing a list of instances used for learning and/or evaluation.
A data set also has to have attributes, represented as Attribute objects.
Method Summary
Modifier and Type | Method | Description
boolean | add | Adds a new Instance to this data set.
 | attributes | Returns the list of all attributes (inputs and output) of this data set.
 | filtered | Returns a data set containing only the instances of this data set which fulfill the passed filter.
 | instances | Returns an Iterable with this data set as its source.
boolean | isEmpty() | Returns true if this data set contains no instances.
<T> Attribute<T> | outputAttribute | Gets the output attribute of this DataSet, i.e. the attribute for which predictions are to be learned.
boolean | remove | Removes an Instance from this data set.
void | shuffle | Randomly permutes the entries of this data set using the specified source of randomness.
int | size() | Returns the number of instances in this data set.
 | split(double fraction) | Splits the set of instances into a set for training and a set for testing according to the passed fraction.
 | split(int numFolds, int testIndex) | Splits the set of instances into a set for training and a set for testing according to the passed number of folds and the index determining which fold is used for testing.
 | stream() | Returns a sequential Stream with this data set as its source.
Method Details

outputAttribute
Gets the output attribute of this DataSet, i.e. the attribute for which predictions are to be learned.
Returns:
the Attribute representing the output attribute of this data set

attributes
Returns the list of all attributes (inputs and output) of this data set.

add
Adds a new Instance to this data set.
Parameters:
instance - the instance to add
Returns:
true if this data set changed as a result of the call

remove
Removes an Instance from this data set.
Parameters:
instance - the instance to remove
Returns:
true if the instance was removed, false otherwise

size
int size()
Returns the number of instances in this data set.
Returns:
the number of instances in this data set

isEmpty
boolean isEmpty()
Returns true if this data set contains no instances.
Returns:
true if this data set contains no instances

instances
Returns an Iterable with this data set as its source.
Returns:
an Iterable of Instances

stream
Returns a sequential Stream with this data set as its source.
Returns:
a Stream of Instances

shuffle
Randomly permutes the entries of this data set using the specified source of randomness.
Parameters:
random - the source of randomness used to shuffle the entries
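The effect of shuffling with an explicit source of randomness can be illustrated with a minimal sketch. It uses a plain java.util.List as a stand-in for a data set and assumes the source of randomness is a java.util.Random, which this page does not state. The class and method names (ShuffleSketch, shuffledCopy) are hypothetical and not part of this interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in: a plain List plays the role of the data set.
final class ShuffleSketch {

    // Returns a shuffled copy of the input; the input list is left untouched.
    // Assumption: randomness comes from a java.util.Random seeded by the caller.
    static <T> List<T> shuffledCopy(List<T> instances, long seed) {
        List<T> copy = new ArrayList<>(instances);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        List<String> instances = List.of("a", "b", "c", "d", "e");
        // Two runs with the same seed produce the same permutation.
        System.out.println(shuffledCopy(instances, 42L));
        System.out.println(shuffledCopy(instances, 42L));
    }
}
```

Passing a seeded source of randomness makes a shuffle reproducible, which is useful when a later train/test split should be repeatable across runs.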

split
Splits the set of instances into a set for training and a set for testing according to the passed fraction.
Parameters:
fraction - the fraction of instances for the training set; the rest is added to the test set
Returns:
a TrainTestSets of train and test data
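A fraction-based split can be pictured with the following sketch on a plain list. It is an illustration of the idea, not the implementation behind this interface: the rounding rule, the choice to take the leading instances for training, and the names FractionSplitSketch and Parts are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: a plain List plays the role of the data set.
final class FractionSplitSketch {

    // Hypothetical holder for the two parts; not the library's TrainTestSets.
    static final class Parts<T> {
        final List<T> train;
        final List<T> test;

        Parts(List<T> train, List<T> test) {
            this.train = train;
            this.test = test;
        }
    }

    // Assumption: the first round(fraction * n) instances form the training set
    // and the remaining instances form the test set.
    static <T> Parts<T> split(List<T> instances, double fraction) {
        if (!(fraction >= 0.0 && fraction <= 1.0)) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        int n = instances.size();
        int trainSize = (int) Math.round(fraction * n);
        return new Parts<>(
                new ArrayList<>(instances.subList(0, trainSize)),
                new ArrayList<>(instances.subList(trainSize, n)));
    }

    public static void main(String[] args) {
        List<Integer> instances = List.of(1, 2, 3, 4, 5, 6, 7, 8);
        Parts<Integer> parts = split(instances, 0.75);
        System.out.println("train: " + parts.train); // train: [1, 2, 3, 4, 5, 6]
        System.out.println("test:  " + parts.test);  // test:  [7, 8]
    }
}
```

Because this sketch takes the leading instances for training, ordered data would normally be shuffled before such a split.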

split
Splits the set of instances into a set for training and a set for testing according to the passed number of folds and the index determining which fold is used for testing.
Parameters:
numFolds - the number of folds to apply to the data set
testIndex - the index of the fold that is used for testing
Returns:
a TrainTestSets of train and test data
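The fold-based variant is the building block of k-fold cross-validation: each fold serves as the test set once while the remaining folds are used for training. The sketch below shows one way this could work on a plain list. It assumes contiguous folds and a zero-based testIndex, neither of which this page specifies, and the names FoldSplitSketch and Parts are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: a plain List plays the role of the data set.
final class FoldSplitSketch {

    // Hypothetical holder for the two parts; not the library's TrainTestSets.
    static final class Parts<T> {
        final List<T> train;
        final List<T> test;

        Parts(List<T> train, List<T> test) {
            this.train = train;
            this.test = test;
        }
    }

    // Assumption: folds are contiguous blocks and testIndex is zero-based.
    static <T> Parts<T> split(List<T> instances, int numFolds, int testIndex) {
        if (numFolds < 2 || testIndex < 0 || testIndex >= numFolds) {
            throw new IllegalArgumentException("need numFolds >= 2 and 0 <= testIndex < numFolds");
        }
        int n = instances.size();
        // Boundaries of the test fold; long arithmetic avoids int overflow.
        int from = (int) ((long) testIndex * n / numFolds);
        int to = (int) ((long) (testIndex + 1) * n / numFolds);
        List<T> test = new ArrayList<>(instances.subList(from, to));
        List<T> train = new ArrayList<>(instances.subList(0, from));
        train.addAll(instances.subList(to, n));
        return new Parts<>(train, test);
    }

    public static void main(String[] args) {
        List<Integer> instances = List.of(1, 2, 3, 4, 5, 6);
        int numFolds = 3;
        // Cross-validation loop: every fold is the test set exactly once.
        for (int testIndex = 0; testIndex < numFolds; testIndex++) {
            Parts<Integer> parts = split(instances, numFolds, testIndex);
            System.out.println("fold " + testIndex + " train=" + parts.train + " test=" + parts.test);
        }
    }
}
```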

filtered
Returns a data set containing only the instances of this data set which fulfill the passed filter.
Parameters:
filter - the predicate which must be fulfilled by a given instance to be part of the result
Returns:
a filtered data set
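Predicate-based filtering can be sketched with standard Java streams. The example assumes the filter is a java.util.function.Predicate and uses a plain list of integers in place of instances. FilterSketch and its filtered method are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in: a plain List plays the role of the data set.
final class FilterSketch {

    // Returns a new list with the elements that fulfill the predicate;
    // the input list is not modified.
    static <T> List<T> filtered(List<T> instances, Predicate<? super T> filter) {
        return instances.stream()
                .filter(filter)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> instances = List.of(1, 2, 3, 4, 5, 6);
        // Keep only the even values.
        System.out.println(filtered(instances, v -> v % 2 == 0)); // [2, 4, 6]
    }
}
```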