During the initial planning of my self-taught Machine Learning journey this year, I had pledged to finish in the Top 25% of any 2 (live) Kaggle competitions.
This is a write-up of how Team "rm-rf /" made it to the Top 30% in our first Kaggle competition ever: the "Quick, Draw! Doodle Recognition Challenge" by the Google AI team, hosted on Kaggle.
Special mention: Team "rm-rf /" was a two-member team consisting of my business partner and friend Rishi Bhalodia and myself.
Experience
Picture this: a race that goes on for three months. There is no finish line, only high scores. You're a seasoned runner at your school.
The catch?
You're up against people speeding by in supercars (Grandmasters with a LOT of experience) while I was running barefoot. Sure, I was pretty good against my friends, but against a car?
In a sense, it was like a PUBG game where we free-fell into the middle of the fight, grabbed onto any gun we could, and tried to make it to the "safe zone".
The Safe Zone for us was the medal area.
So why is everyone so addicted to Kaggle?
Personally, I learned much more in 1 month of competing than in any 1-month MOOC I have ever taken.
The "Experts" on Kaggle are always generous with their ideas. We were surprised that many people, even while holding top LB positions, would almost give out their complete solutions!
Every night we'd make a good submission; the next day we'd wake up to find we had fallen 20 ranks on the LB! Hit reset, work towards a better submission, and repeat every day.
Finally, we finished 385/1316 on the Private LB: our ideas, with the compute we had, got us just this far. It was definitely an amazing experience, and I'd definitely compete more and try to perform better.
On many occasions, when we'd manage to land a Top 100 submission and sleep soundly, we'd wake up to a public kernel that had thrown us down the LB by 50 positions!
People who compete in competitions are truly passionate; there is always an overflow of ideas and an overflow of talent. For us, it was about giving it our best shot: living for a few weeks on minimal sleep, breaking and rebuilding conda environments while trying to convert the ideas shared in the discussions into code.
In the end, I really learned why Kaggle is the Home of Data Science.
Goal
It was always a dream of mine to perform well in Kaggle competitions. For starters, I had set the bar at two Top 25% finishes.
Doodle Challenge:
Even though the competition had a huge dataset and a few interesting challenges, we were aiming our best for a medal.
Competition
Goal: identify doodles from a dataset of CSV files containing the stroke information needed to "draw" (recreate) them.
The challenge: the dataset contained ~50 million training drawings!
Personally, I had already done a LARGE number of MOOCs, and given how many times I had gone over the definition of a CNN, I was confident it'd be pretty easy to land a bronze medal. Of course, I was very wrong.
Things that did not work
We're very grateful to Kaggle Master Radek Osmulski for sharing his Fast.ai starter pack; we built on top of it, along with a few tricks from Grandmaster Beluga.
Mistake 1: As beginners, it's always advisable to start a Kaggle competition when the competition is fresh.
Why?
The number of ideas that flow in a competition is just huge! Even the top LB scorers are very generous in sharing their approach; it's just a matter of whether we can figure out the missing nuggets of information and convert their ideas into code.
We started the competition when it was already more than halfway through.
Mistake 2: We learned about "experimenting" and validation the hard way.
The starter pack works on 1% of the data. I tried experimenting with 5% of the data, followed by 10%, and this showed a consistent increase in performance.
Next, I decided to throw all of the data at the model; the training ran for ~50 hours on my 8 GB GPU. We had expected a top score with this approach, but instead the model's accuracy fell through the floor!
Lesson learned: experiments are very important, and so is validation. The starter pack relies on "drawing" the images out to files and then training the model on them. The issue is that a Linux filesystem caps the number of files (inodes) it can hold unless it is specifically formatted for more.
We didn't do a validation check and trained against only a portion of the data. A representative validation set is certainly something you must set up when getting started.
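For anyone getting started, here is a minimal sketch (not our actual code) of carving out a stratified validation set up front with pandas and scikit-learn; the train_simplified/ folder and the "word" label column are assumptions based on the competition's CSV layout.

```python
# Minimal sketch: build a representative (stratified) validation set
# before any serious training runs. Paths and column names are assumptions.
import glob
import pandas as pd
from sklearn.model_selection import train_test_split

frames = []
for path in glob.glob("train_simplified/*.csv"):
    # Sample a manageable slice of each class so every class is represented.
    frames.append(pd.read_csv(path, nrows=2000))
df = pd.concat(frames, ignore_index=True)

# Stratify on the label so the validation set mirrors the class distribution.
train_df, valid_df = train_test_split(
    df, test_size=0.1, stratify=df["word"], random_state=42
)
print(len(train_df), len(valid_df))
```

Scoring every experiment against the same held-out split like this is what tells you whether a change actually helped before you spend 50 hours of GPU time on it.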
Ideas that worked
Progressive training and resizing:
- Training the model on 1% of the data at image size 256.
- Fine-tuning the model on 5% of the data at image size 128.
- Further fine-tuning the model on 20% of the data at image size 64.
- ResNet18 < ResNet34 < ResNet50 < ResNet152 in performance when trained on the same portion of the dataset.
This approach showed a consistent increase in performance; a rough sketch of the schedule is shown below.
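To make the idea concrete, here is a rough sketch of that schedule in fastai v1-style code. This is not our original notebook: get_databunch is a hypothetical helper assumed to render the chosen fraction of the doodle CSVs into an ImageDataBunch at the requested image size.

```python
# Rough sketch of progressive training: keep the same weights and
# fine-tune on a larger data fraction at a new image size each step.
from fastai.vision import *  # fastai v1

schedule = [
    (0.01, 256),  # step 1: 1% of the data, image size 256
    (0.05, 128),  # step 2: fine-tune on 5% at size 128
    (0.20, 64),   # step 3: fine-tune on 20% at size 64
]

frac, size = schedule[0]
data = get_databunch(frac=frac, size=size)  # hypothetical helper
learn = cnn_learner(data, models.resnet34, metrics=[accuracy])
learn.fit_one_cycle(4)

for frac, size in schedule[1:]:
    # Swap in a new DataBunch and keep fine-tuning the same model.
    learn.data = get_databunch(frac=frac, size=size)  # hypothetical helper
    learn.unfreeze()
    learn.fit_one_cycle(4, max_lr=slice(1e-5, 1e-4))
```

The same loop works with the larger ResNet variants; in our runs, bigger backbones did better on the same data fraction.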
Ensembling: Our best submission was a "blend" of a ResNet152 trained with fastai (v1) on 20% of the dataset and the MobileNet kernel by Kaggle GM Beluga.
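For illustration only, a simple way to "blend" two models is to average their per-class probabilities and take the top 3 labels per drawing, since the competition is scored with MAP@3; the file names below are placeholders, not our actual artifacts.

```python
# Simplified blending sketch: average two probability matrices
# (one row per test drawing, one column per class) and take the top 3.
import numpy as np
import pandas as pd

probs_resnet = np.load("resnet152_probs.npy")      # placeholder file
probs_mobilenet = np.load("mobilenet_probs.npy")    # placeholder file
classes = np.load("class_names.npy", allow_pickle=True)

blend = 0.5 * probs_resnet + 0.5 * probs_mobilenet

# Top-3 predictions per row, highest probability first.
top3 = np.argsort(-blend, axis=1)[:, :3]
words = [" ".join(classes[row]) for row in top3]

sub = pd.read_csv("sample_submission.csv")
sub["word"] = words
sub.to_csv("blend_submission.csv", index=False)
```

The 0.5/0.5 weights are just a starting point; in practice the mix can be tuned on the validation set.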
Summary
You will find many better write-ups in the competition's discussion forum, so please excuse this post if these ideas don't get you a top LB score.
This was really a personal summary of the things that I've learned while competing.
Finally, I've pledged to compete a lot more; hopefully, you'll find Team "rm-rf /" somewhere on the LB in upcoming competitions.
I've been bitten by the "Kaggle Bug", and in the future I would probably prefer competing over signing up for more MOOCs.
See you on the LB, Happy Kaggling!
If you found this write-up interesting and would like to be a part of my learning path, you can find me on Twitter here.
If you're interested in reading about Deep Learning and Computer Vision news, you can check out my newsletter here.