SOCKS: Statistical Outlier Curation Kernel Software

A Noise Reduction Software for Machine Learning

Written by Gitika Gorthi, Chantilly High School

Have you ever had your classmates' voices overpower your teacher’s, making it hard to hear the important instructions being given? Or have you ever struggled to understand the news on the radio because of heavy static? Noise in data sets is similar to the noisy disturbances we hear in our daily lives: it is additional information that serves no apparent purpose, for example corrupted data. Noise often causes algorithms to miss patterns or specific trends in the data, much as we can miss important instructions or news because of background disturbances. The study “Dealing With Noise In Defect Prediction” found that false positive and false negative noise alone can lead to a 20-35% decrease in prediction performance (Kim et al., 2011). In order to reduce noise fro...