Statistical Outlier Curation Kernel Software (SOCKS)

SOCKS: Statistical Outlier Curation Kernel Software
A Noise Reduction Software for Machine Learning

Written by Gitika Gorthi, Chantilly High School

Have you ever had your classmates' voices overpower your teacher’s, making it hard to hear the important instructions being given? Or have you ever struggled to understand the news on the radio because of heavy static? Noise in data sets is similar to the noisy disturbances we hear in our daily lives: it is additional information that serves no apparent purpose, for example corrupted data. Noise often causes algorithms to miss patterns or specific trends in the data, just as we can miss important instructions or news because of background disturbances. The study “Dealing With Noise In Defect Prediction” found that false positive and false negative noise alone can lead to a 20-35% decrease in prediction performance (Kim et al., 2011). Statistical Outlier Curation Kernel Software (SOCKS for short) was developed to reduce noise in data and increase prediction performance, making AI model training more effective and efficient.

SOCKS is software that reduces noise in data seamlessly and often agnostically, and curates the underlying data when necessary to reveal pristine information. But how does it help revolutionize AI and edge computing? With noise reduced by SOCKS, an AI model can analyze patterns more accurately and more quickly, so we can rely on it more. According to Dr. Shivani and Atul Gupta in their journal article “Dealing with Noise Problem in Machine Learning Data-sets: A Systematic Review,” noise in data sets can significantly impair the prediction of any meaningful information by dramatically decreasing classification accuracy. Hence, once noise is reduced, the technology can keep improving itself by learning from more precise trends, and it can cater to users’ needs better. To put this in a real-world context, imagine scrolling through YouTube or Twitter. Wouldn’t you want your content suggestions to match your actual interests? For example, if you are into comedy, you would want YouTube or Twitter to suggest various kinds of comedy content, not to sidetrack into horror suggestions because of an outlier such as an accidental click. SOCKS helps programs filter out this kind of noisy data so that the AI behind them works more effectively.

Image 1: Image noise reduction, illustrating how much more effective a program becomes when noise is reduced, not only in images but in the accuracy of other program outputs

Past work has tried to reduce noise in data with several noise filtering techniques in order to improve data quality in classification tasks. According to Garcia et al. (2016), many current techniques scan the data to identify noise in a preprocessing step. However, these techniques can leave some noisy data unidentified, and they sometimes even remove safe data (Garcia et al., 2016). SOCKS aims to remove noise more accurately than these existing noise filtering techniques.
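To make the idea of a preprocessing filter concrete, here is a minimal sketch of one common statistical approach, the interquartile range (IQR) rule. This is not the SOCKS algorithm, which this post does not describe; the function name `iqr_filter`, the multiplier `k`, and the sample readings are all made up for illustration.

```python
import statistics


def iqr_filter(values, k=1.5):
    """Split values into (kept, flagged) using Tukey's IQR fences.

    Illustrative only: a generic outlier rule, not the SOCKS method.
    """
    # Quartiles of the data (first, median, third).
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr

    # Anything outside the fences is treated as a suspected outlier.
    kept = [v for v in values if low <= v <= high]
    flagged = [v for v in values if v < low or v > high]
    return kept, flagged


# Hypothetical sensor readings with one obviously corrupted value.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 42.0, 10.2, 9.7]
kept, flagged = iqr_filter(readings)
print("kept:", kept)
print("flagged:", flagged)
```

A simple rule like this also shows the limitation mentioned above: a fence that is too wide lets noisy values through, while one that is too narrow throws away perfectly good data.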

Apart from benefiting everyday users, noise control also helps programs take over tasks that are difficult or tedious for human workers. For example, many data analysts have to perform mundane visual tasks efficiently and accurately. If a machine could do this through AI, wouldn’t that save a lot of time and effort? Now, on top of that, imagine SOCKS enabling the software to work even faster and with more precision. SOCKS enables greater accuracy in repetitive visual tasks, and combined with GSFRS (Giant Signal File Random Sampler), it can revolutionize the technology industry through instant scaling of visual task completion, rapid training for deploying computer vision, and quicker access to data.

In conclusion, noise is bad for machine learning training, and if noisy data can be curated before training begins, a lot of time can be saved!

Check out the next blog post, which focuses on how SOCKS and GSFRS can work together to make AI software lightning fast and bullseye accurate.

GSFRS (Giant Signal File Random Sampler)
