This morning I came across an amazing library that claims to scale up existing pandas code by changing just one line, making it at least 2x faster. Such a big claim gave me a reason to test it and see the results for myself.
The project I came across is Modin. Check it out!
I will load 2 datasets of different sizes to compare the performance of both methods.
Dataset 1:
Size = 445MB
#!/usr/bin/env python
import time
import pandas as pd

duration = []
for i in range(3):
    start = time.time()
    # Fields are separated by whitespace, two pipes, whitespace (regex)
    data_df = pd.read_csv('data.txt', sep=r'\s\|\|\s')
    stop = time.time() - start
    duration.append(stop)
    del data_df

# Average load time over the 3 runs
final_time_pd = sum(duration) / float(len(duration))
print('Average time taken to load dataset for pandas over 3 times is approx {} seconds'.format(final_time_pd))
>>> Average time taken to load dataset for pandas over 3 times is approx 12.120 seconds
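To make the `sep` argument easier to read, here is a small sketch of what the regex `\s\|\|\s` matches: one whitespace character, two literal pipes, then one whitespace character. The sample row is hypothetical and is not taken from the actual dataset; it only assumes the file uses ` || ` between fields.

```python
import re

# Same pattern that is passed to read_csv as `sep`
SEPARATOR = r'\s\|\|\s'

# Hypothetical sample row, not from the real data.txt
sample_row = 'alice || 42 || NY'

fields = re.split(SEPARATOR, sample_row)
print(fields)  # ['alice', '42', 'NY']
```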
import time
import modin.pandas as pd  # the only line that changes

duration = []
for i in range(3):
    start = time.time()
    # Fields are separated by whitespace, two pipes, whitespace (regex)
    data_df = pd.read_csv('data.txt', sep=r'\s\|\|\s')
    stop = time.time() - start
    duration.append(stop)
    del data_df

# Average load time over the 3 runs
final_time_pd = sum(duration) / float(len(duration))
print('Average time taken to load dataset for modin pandas over 3 times is approx {} seconds'.format(final_time_pd))
>>> Average time taken to load dataset for modin pandas over 3 times is approx 6.515 seconds
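Both measurements repeat the same timing loop, so it could be factored into a helper. This is only a sketch: the name `average_load_time` and its `runs` parameter are my own, and it uses `time.perf_counter()` rather than `time.time()` because it is better suited to measuring intervals.

```python
import time


def average_load_time(load, runs=3):
    """Call load() `runs` times and return the mean wall-clock time in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        result = load()
        durations.append(time.perf_counter() - start)
        del result  # drop the loaded object before the next run
    return sum(durations) / len(durations)


# Usage with either backend imported as `pd` (not run here):
# avg = average_load_time(lambda: pd.read_csv('data.txt', sep=r'\s\|\|\s'))
```

Passing the loader as a callable keeps the helper independent of which library is imported as `pd`.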
Dataset 2:
Size = 990MB
I re-ran the same code as above with a different dataset.
>>> Average time taken to load dataset for pandas over 3 times is approx 111.723 seconds
>>> Average time taken to load dataset for modin pandas over 3 times is approx 71.770 seconds
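For a quick comparison, the speedup can be computed as the pandas time divided by the Modin time. The sketch below uses only the four averages reported above; the `speedup` helper is just for illustration.

```python
def speedup(baseline_seconds, new_seconds):
    """Return how many times faster the new timing is than the baseline."""
    return baseline_seconds / new_seconds


# Average load times reported above, in seconds
dataset_1 = speedup(12.120, 6.515)    # 445MB file
dataset_2 = speedup(111.723, 71.770)  # 990MB file

print(round(dataset_1, 2))  # 1.86
print(round(dataset_2, 2))  # 1.56
```

On these two reads the gain comes out at roughly 1.9x and 1.6x respectively.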
P.S. Unfortunately, modin does not support the read_table method as of now, which is why I had to use read_csv.
Results are really impressive! 😍 This is surely going to help me handle a good amount of data in pandas, keeping the pandas magic while gaining speed.
Modin uses Ray to provide an effortless way to speed up your pandas notebooks, scripts, and libraries, while integrating seamlessly with existing pandas code. It uses all the available cores (4 physical cores here), whereas pandas can only use 1 core at a time for any kind of computation.
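Since the only difference between the two scripts is the import line, one way to keep a single script working on machines with or without Modin is to pick the module at runtime. This is a sketch of my own, not something the Modin project prescribes; `pick_backend` is a hypothetical helper.

```python
import importlib
import importlib.util


def pick_backend():
    """Return the module name to import as `pd`: Modin if installed, else pandas."""
    if importlib.util.find_spec("modin") is not None:
        return "modin.pandas"
    return "pandas"


# Usage (requires at least pandas to be installed; not run here):
# pd = importlib.import_module(pick_backend())
# data_df = pd.read_csv('data.txt', sep=r'\s\|\|\s')
```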
This project has contributed some really good s**t for Data Science / ML enthusiasts. Kudos! Do give it a try at least once for your use case.
Feel free to comment and share your thoughts. Do share and clap if you ❤ it.
Faster pandas, even on your laptop was originally published in Hacker Noon on Medium.