GSFRS (Giant Signal File Random Sampler)

Posts

Speeding up Model Training with Multithreading and GSFRS

Written by Rahat Ahmed Talukder, Notre Dame University Bangladesh. We live in a multicore universe where great things can happen in parallel. Parallel processing can bring enormous performance gains. Organized parallelism is how our own body works, through the dynamic but organized activation of billions of individual neurons. Everybody wants to parallelize a workload done on a data frame. In the machine learning (ML) lifecycle, different workloads are parallelized across a large VM. This allows you to take advantage of the efficiency of the VM and maximize the use of your notebook session. Many of the machine learning or scientific libraries used by data scientists (NumPy, pandas, scikit-learn, ...) release Python's global interpreter lock (GIL), allowing their use on multiple threads. It is important to keep in mind that whe...
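The point about GIL-releasing libraries can be illustrated with a small sketch. It is not taken from the post: the function names `chunk_sum_of_squares` and `parallel_sum_of_squares` and the choice of four workers are assumptions made for illustration. The idea is that NumPy runs its heavy numerical work in compiled code, where it can release the GIL, so plain Python threads can work on separate chunks of one array at the same time.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def chunk_sum_of_squares(chunk: np.ndarray) -> float:
    # The dot product runs in compiled code, which is where NumPy
    # can release the GIL and let other threads make progress.
    return float(np.dot(chunk, chunk))


def parallel_sum_of_squares(data: np.ndarray, n_workers: int = 4) -> float:
    # Split the array into one chunk per worker (hypothetical default of 4).
    chunks = np.array_split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each thread handles one chunk; the partial results are added up.
        return sum(pool.map(chunk_sum_of_squares, chunks))


data = np.arange(1_000_000, dtype=np.float64)
total = parallel_sum_of_squares(data)
serial = float(np.dot(data, data))
print(np.isclose(total, serial))
```

Whether threads actually beat a single call depends on the array size, the operation, and the machine, so it is worth timing both versions on real data before committing to either.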

Catalyzing A Data Science Revolution: SOCKS + GSFRS

Written by Gitika Gorthi, Chantilly High School. Why the technologies Giant Signal File Random Sampler (GSFRS) and Statistical Outlier Curation Kernel Software (SOCKS)? How will they help you achieve your data science goals? “Data is the new fuel of the digital economy,” or it can be viewed as the new gold; harnessing the numbers and accurately decoding their meaning is crucial to increasing two types of efficiency for organizations: speed and accuracy. GSFRS addresses speed and SOCKS addresses accuracy; coupled together, they make a power team. Suppose you are a data analyst and you are assigned to take a bunch of numbers and make sense of them. I know your reaction: you are most probably scared. But don’t worry, we have artificial intelligence to the rescue, developing algorithms to do the hard work for us (yay!). Now the question is, is the program really telling us the right information? Some may trust blindly what...

Statistical Outlier Curation Kernel Software (SOCKS)

SOCKS: A Noise Reduction Software for Machine Learning. Written by Gitika Gorthi, Chantilly High School. Have you ever had your classmates' voices overpower your teacher’s, making it hard for you to hear the important instructions being given? Or have you ever had a tough time understanding news on the radio because of heavy static? Noise in data sets is similar to the noisy disturbances we hear in our daily lives; it is additional information that serves no apparent purpose, for example data corruption. Noise in data often causes algorithms to miss patterns or specific trends in the data, similar to how we can miss important instructions or news due to background disturbances. The study “Dealing With Noise In Defect Prediction” determined that false positive and false negative noise alone can lead to a 20-35% decrease in prediction performance (Kim et al., 2011). In order to reduce noise fro...

Revolutionizing Database Technology with GSFRS

Written by Gitika Gorthi, Chantilly High School. Has there ever been a day when you were free from all electronic devices? Won’t the day feel dry without some form of digital juice? Much of the data dealt with these days is classified as big data: large and diverse collections of data that are bigger than what traditional databases hold. Have you ever had a moment when a large piece of software took a long time to load or open a particular file, and the computer seemed stuck on repeat at that one page? If so, you are not the only one. Whether it be students, researchers, or even technologies such as rovers, fast data processing and analytics are always a desirable outcome. What if we told you that there is a novel, portable, and highly efficient rapid data access tool that can allow near-real-time access to any part of a large data file? You might be thinking, is that possible? The answer is yes, it is possible. A tool ...
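To make "access to any part of a large data file" concrete, here is a minimal sketch of byte-offset random access. It is not GSFRS and does not claim to reproduce how GSFRS works, which this excerpt does not describe. The fixed-width record layout (one little-endian float64 per sample) and the helper names `write_samples` and `read_sample` are assumptions made purely for illustration.

```python
import os
import struct
import tempfile

# Assumed layout: one little-endian float64 per sample, so every record is 8 bytes.
RECORD = struct.Struct("<d")


def write_samples(path, samples):
    """Write samples back to back as fixed-width binary records."""
    with open(path, "wb") as f:
        for value in samples:
            f.write(RECORD.pack(value))


def read_sample(path, index):
    """Jump straight to record `index` without reading what comes before it."""
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))[0]


# Demo on a small temporary file; a real signal file would be far larger.
fd, demo_path = tempfile.mkstemp(suffix=".bin")
os.close(fd)
try:
    write_samples(demo_path, (float(i) for i in range(100_000)))
    value = read_sample(demo_path, 75_000)
finally:
    os.remove(demo_path)
print(value)
```

Because every record has the same width, the position of any sample is a single multiplication, so the cost of one lookup does not grow with the size of the file. Variable-length records such as text lines would need a separate index of offsets.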

GSFRS: The Story of a Gigantic Random Sampler by Dr. Prasanta Pal, Brown University

What is GSFRS? All about data and more data! We want stuff! A lot of stuff! Often, more stuff than we can handle. These days, with everything turning digital, that means we are looking for a lot of data, which means trouble in disguise! Let's understand the kinds of trouble we may be asking for through some stories. Suppose you've been collecting radio signals from the alien world and storing them in a gigantic file (call it the big book of universal secrets) for the last million years. By now, the book has grown so big that you have started counting its size in exabytes (10^18 bytes)! If someone tells you that the secret equation for time-travel is buried somewhere around the 3 trillionth line and you want the secret code right now because a giant asteroid ...