
“Data cooking”: The Process For Making Artichoke Dips


Gitika Gorthi, Chantilly High School



Food and data are quite similar. How come? Let me explain. Both are consumed for the purpose of nourishment. In the case of food, it nourishes our body; in the case of data, it nourishes the model we are trying to build.


Whether by using a blender or our teeth, we process all of our food before consumption to avoid choking. To be more specific, imagine eating an artichoke. Will you swallow it whole, or will you process it with your teeth first? Most of us will do the latter to avoid choking on the extremely nutritious vegetable. The same idea applies to data. Data is powerful and imperative for technological advancement; however, if raw data is fed into machine learning (ML), Fast Fourier Transform (FFT), or other similar systems without the "cooking" or cleaning step, it may lead to dangerous outcomes instead.
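To make the "cooking" step concrete, here is a minimal Python sketch of cleaning raw sensor readings before they ever reach a model. The column names, values, and plausible ranges are all hypothetical, chosen only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical raw sensor readings; names and values are made up for illustration.
raw = pd.DataFrame({
    "temperature": [21.3, np.nan, 22.1, 480.0, 21.8],  # 480.0 is a sensor glitch
    "humidity":    [0.40, 0.42, np.nan, 0.41, 0.39],
})

# "Cook" the data before feeding it to any model:
# 1. Drop rows with missing values (the choking hazards).
clean = raw.dropna()

# 2. Remove physically impossible outliers (assumed plausible range: -40 to 60 C).
clean = clean[clean["temperature"].between(-40, 60)]

print(clean)
```

Only the cleaned rows survive; the glitch reading and the incomplete rows never reach the model, just as the artichoke's choke never reaches your throat.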



Figure 1: Labeled illustration of the good and choke parts of the artichoke.



Daily in the news, we read about remarkable technological advancements that once seemed like science fiction -- such as self-driving cars. Companies such as Tesla have made many Hollywood writers’ stories a reality. According to Andrej Karpathy (Tesla’s head of artificial intelligence and computer vision), as of February 2020 Tesla cars had driven 3 billion miles on Autopilot. Tesla’s software developers were able to make self-driving a reality by working with various data processing tools. However, what if the data they work with is not “cooked” properly? You guessed it: the end product would not be “tasty”.


A recent example of this appears in the May article “Tesla in deadly California crash was on Autopilot,” in which a 35-year-old man was killed when his Tesla Model 3 struck an overturned semi-truck on a freeway at about 2:30 am while on Autopilot (self-driving mode). Situations like this illustrate the necessity of fully processing data, as doing so could save hundreds of lives annually. Now that we have a deeper understanding of the importance of data processing, let’s discuss specific tools.



Figure 2: The image above represents what the Tesla Autopilot sees. If anything is miscalculated or misprocessed, it could be fatal to drivers.



There are many data processing tools, but let’s talk specifically about one image processing tool, the FFT, as it has a wide range of applications. The FFT is a processing tool used to “decompose an image into its sine and cosine components,” and it can be used for “image analysis, image filtering, image reconstruction, and image compression” (A. Marion, 1991).
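As a quick illustration of that decomposition, the sketch below uses NumPy’s np.fft.fft2 on a small synthetic image (the image itself is made up for the example):

```python
import numpy as np

# A tiny synthetic "image": a bright square on a dark background.
image = np.zeros((64, 64))
image[24:40, 24:40] = 1.0

# The 2-D FFT decomposes the image into its sine/cosine (frequency) components.
spectrum = np.fft.fft2(image)
spectrum = np.fft.fftshift(spectrum)  # move the zero frequency to the center

# The log-magnitude spectrum is what is usually visualized for image analysis.
magnitude = np.log1p(np.abs(spectrum))
print(magnitude.shape, round(float(magnitude.max()), 2))
```

Every pixel of the original image is now represented as a combination of spatial frequencies, which is exactly what makes filtering, reconstruction, and compression possible.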



Figure 3: The image above is a “Butterfly Diagram”, a visual representation of the FFT algorithm.



The FFT can also be used in the healthcare industry, allowing for more precise diagnoses with the aid of medical devices. For those who may be unaware, medical image processing is rapidly growing in the healthcare world. From the non-invasive exploration of 3D image datasets of the human body using Computed Tomography (CT) or Magnetic Resonance Imaging (MRI) scanners for surgical planning, to the diagnosis of pathologies, to advancing research, medical image processing is proving to be critical to a hospital’s success (Synopsys, 2021).
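As an illustrative sketch (not an actual clinical pipeline), the snippet below applies an FFT-based low-pass filter to a synthetic noisy 2-D “scan”, the kind of frequency-domain image filtering the Marion quote refers to. The scan, noise level, and cutoff radius are all invented for the example:

```python
import numpy as np

# Hypothetical noisy 2-D "scan" (a stand-in for a single CT/MRI slice).
rng = np.random.default_rng(0)
scan = np.zeros((128, 128))
scan[40:88, 40:88] = 1.0
noisy = scan + 0.3 * rng.standard_normal(scan.shape)

# FFT-based low-pass filter: keep only the low spatial frequencies,
# which carry the anatomy, and discard the high-frequency noise.
spectrum = np.fft.fftshift(np.fft.fft2(noisy))
rows, cols = noisy.shape
y, x = np.ogrid[:rows, :cols]
mask = (y - rows / 2) ** 2 + (x - cols / 2) ** 2 <= 20 ** 2  # circle, radius 20

filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real

print(f"error std before: {(noisy - scan).std():.3f}, after: {(filtered - scan).std():.3f}")
```

The filtered slice is measurably closer to the true image, which is the whole point: a cleaner image gives a clinician (or a downstream algorithm) a better chance at an accurate diagnosis.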




Figure 4: Above is an example illustration of the segmentation of an aortic dissection; accurate image processing is imperative for medical advancements and treatment procedures.


Additionally, the FFT can be used in various smart tools, such as thermostats -- devices that sense and regulate the temperature of air, liquids, or other processes. Many newly developed thermostats incorporate advanced machine learning algorithms to adapt to user preferences and schedules. To provide a more specific example, new smart WiFi thermostats use a regression tree approach (Random Forest) to develop models that predict the room temperature measured by each thermostat and its cooling status (Huang et al., 2018).
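Here is a minimal sketch of that regression-tree idea using scikit-learn’s RandomForestRegressor. The features, values, and target relationship are entirely made up for illustration and are not from Huang et al.:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training data: hour of day, outdoor temp, setpoint -> room temp.
rng = np.random.default_rng(42)
n = 500
X = np.column_stack([
    rng.integers(0, 24, n),     # hour of day
    rng.uniform(-5, 35, n),     # outdoor temperature (deg C)
    rng.uniform(18, 24, n),     # thermostat setpoint (deg C)
])
# Synthetic target: room temp drifts toward the setpoint, nudged by outdoors.
y = 0.8 * X[:, 2] + 0.15 * X[:, 1] + rng.normal(0, 0.5, n)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Predict the room temperature at 2 pm, 30 deg C outside, setpoint 21 deg C.
print(model.predict([[14, 30.0, 21.0]]))
```

Of course, this only works if the thermostat’s sensor logs are “cooked” first: a single 480-degree glitch reading, like the one in the cleaning sketch earlier, would quietly poison the model.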


Like self-driving cars, medical devices, and thermostats, many tools in this digital era run on accurately processed data. The next time you are chewing on some artichoke dip, remember the importance of processing your data as thoroughly as you process your food.

