Top 25 Pandas tricks

Here’s a really great tour through some advanced Pandas features, by Kevin Markham of Data School.

Here are the tricks that he features:

  1. Show installed versions
  2. Create an example DataFrame
  3. Rename columns
  4. Reverse row order
  5. Reverse column order
  6. Select columns by data type
  7. Convert strings to numbers
  8. Reduce DataFrame size
  9. Build a DataFrame from multiple files (row-wise)
  10. Build a DataFrame from multiple files (column-wise)
  11. Create a DataFrame from the clipboard
  12. Split a DataFrame into two random subsets
  13. Filter a DataFrame by multiple categories
  14. Filter a DataFrame by largest categories
  15. Handle missing values
  16. Split a string into multiple columns
  17. Expand a Series of lists into a DataFrame
  18. Aggregate by multiple functions
  19. Combine the output of an aggregation with a DataFrame
  20. Select a slice of rows and columns
  21. Reshape a MultiIndexed Series
  22. Create a pivot table
  23. Convert continuous data into categorical data
  24. Change display options
  25. Style a DataFrame
  26. Bonus: Profile a DataFrame
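
To give a flavor of a few of these, here is a minimal sketch of tricks 3 through 7. It is my own illustration rather than Kevin's code, and the DataFrame, its column names, and its values are made up for the example.

```python
import pandas as pd

# Small made-up DataFrame; column names and values are illustrative only.
df = pd.DataFrame({
    "col one": ["1.1", "2.2", "3.3"],
    "col two": ["4", "5", "-"],
    "label": ["a", "b", "c"],
})

# Trick 3: rename columns by replacing spaces with underscores.
df.columns = df.columns.str.replace(" ", "_")

# Trick 7: convert strings to numbers; entries that cannot be parsed become NaN.
df["col_one"] = df["col_one"].astype(float)
df["col_two"] = pd.to_numeric(df["col_two"], errors="coerce")

# Trick 6: select only the numeric columns.
numeric = df.select_dtypes(include="number")

# Trick 4: reverse the row order and rebuild the index from zero.
reversed_rows = df.loc[::-1].reset_index(drop=True)

# Trick 5: reverse the column order.
reversed_cols = df.loc[:, ::-1]
```

Note that `astype(float)` raises an error on a value like "-", which is why the second column goes through `pd.to_numeric` with `errors="coerce"` instead.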

My favorite tip is #25, on styling a dataframe. The bonus tip on Pandas profiling is also pretty cool!

A Jupyter notebook with example usage is available on GitHub.

If you’re hungry for more best practices in Pandas, you can check out Kevin’s PyCon 2019 workshop presentation or his complete series of videos on YouTube.

Great explanation of MultiIndex in Pandas

Pandas is a widely used component of the scientific Python stack, and it is truly an indispensable part of the data scientist’s toolkit. The name pandas is actually a portmanteau created from panel and data. Of course, most of us are familiar with dataframes. But what’s a panel?

Panel data are 3-dimensional. A very common example is a set of time series: Imagine a dataset with the stock (e.g., AAPL, MSFT, etc.) as the index defining the x axis and the price as the variable defining the y axis. A regular 2-dimensional dataframe works fine if you are only taking a cross-sectional snapshot of stock prices at one point in time. But the moment you want to look at patterns in price over the last few months, time becomes a new index defining the z axis. A panel is a specific data structure designed to accommodate this.

Recently, the Pandas team announced the deprecation of the Panel data structure (as of version 0.20.0). Instead, they are encouraging the use of dataframes with hierarchical indexing (MultiIndex). Using a MultiIndex, one can easily process 3-dimensional data in a dataframe, and indeed any number of dimensions becomes possible.
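
As a rough sketch of what this looks like in practice, the stock example above can be stored in a dataframe whose index has two levels, ticker and date. The level names, column names, and all of the numbers below are made up for illustration.

```python
import pandas as pd

# Two index levels (ticker, date) stand in for two of the three dimensions;
# the columns provide the third. All values are made up.
index = pd.MultiIndex.from_product(
    [["AAPL", "MSFT"], pd.to_datetime(["2019-01-02", "2019-01-03"])],
    names=["ticker", "date"],
)
prices = pd.DataFrame(
    {"close": [100.0, 101.0, 200.0, 202.0], "volume": [10, 11, 20, 22]},
    index=index,
)

# All rows for one ticker: a 2-dimensional slice indexed by date.
aapl = prices.loc["AAPL"]

# Cross-section: every ticker on a single date.
snapshot = prices.xs(pd.Timestamp("2019-01-03"), level="date")

# Move the ticker level into the columns to get a date-by-ticker table.
close_wide = prices["close"].unstack(level="ticker")
```

Adding a further dimension (say, the exchange) would just mean adding another level to the index, which is what makes this approach more flexible than a fixed 3-dimensional panel.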

MultiIndex is intuitive once you learn how to use it, but it can be tricky to wrap your head around it at first. Kevin Markham of the Data School released a great tutorial explaining how to use the MultiIndex in Pandas.

Read more about hierarchical indexing in the official Pandas documentation.

The QuantEcon tutorial site provides a “real-world” example that demonstrates the use of MultiIndex for analysis of 3-dimensional data.

Setting up PyCharm

I love Sublime Text, and I recently wrote how I optimized it for Python development. But I’ve also admired PyCharm as a full-featured IDE. The problem is that PyCharm is visually cluttered, with buttons, toolbars, and windows everywhere. Certainly, there is a very steep learning curve.

Recently, while I was watching one of Michael Kennedy’s video courses (where the coding examples are done in PyCharm), I was inspired to give PyCharm a closer look.

I was happy to discover that there is a video playlist on YouTube that provides an in-depth Getting Started guide. The JetBrains web site also features a Quick Start guide with really excellent documentation/tutorials.

For scientists especially, be sure to check out IPython/Jupyter Notebook integration in PyCharm.

I plan to spend a lot more time going through this material.

Review: “Write Pythonic code like a seasoned developer,” by Michael Kennedy

As many of you know, one of the best podcasts related to Python is Talk Python to Me, by Michael Kennedy. If you haven’t listened to the podcast, you definitely should give it a try. I’m pretty certain you’ll find it terrific and want to subscribe.

Well, in addition to hosting the podcast, Michael Kennedy also runs a Python training program. I recently purchased one of his offerings, “Write Pythonic code like a seasoned developer.” The course is aimed at intermediate-level Pythonistas: familiarity with the basic language features is expected, as the course focuses on teaching the most “Pythonic” way of doing things. The term Pythonic implies writing code and performing tasks in ways that are congruent with Python’s guiding principles. Usually, this leads to maximum efficiency with minimum effort, while also improving safety and readability.
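
To make “Pythonic” a bit more concrete, here is a small sketch of my own (not taken from the course materials) that contrasts manual bookkeeping with the idiomatic equivalent. The variable names and data are made up.

```python
from collections import Counter

words = ["spam", "eggs", "spam", "ham"]

# Less Pythonic: track the index by hand.
numbered_manual = []
i = 0
while i < len(words):
    numbered_manual.append((i, words[i]))
    i += 1

# Pythonic: enumerate yields (index, value) pairs directly.
numbered = list(enumerate(words))

# Pythonic: Counter replaces a hand-written counting loop.
counts = Counter(words)

# Pythonic: dict.get with a default avoids an explicit membership check.
menu_prices = {"spam": 2.5, "eggs": 1.0}
ham_price = menu_prices.get("ham", 0.0)
```

Both versions of the numbering produce the same list; the idiomatic one is simply shorter and leaves less room for off-by-one mistakes.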

The course covers the following broad categories:

  • Foundational concepts and style guidance from PEP 8
  • Dictionaries
  • Generators and collections
  • Methods and functions
  • Modules and packages
  • Classes and objects
  • Loops
  • Tuples
  • Python for humans

The course consists of 63 videos totaling around 4.5 hours of continuous viewing. If you pause to test out some of the things you’re learning, it’s probably closer to 12 or 24 hours of lecture/practice. The videos are very well produced, with plenty of code examples and excellent narration. Accompanying these videos is a source-code repository available on GitHub. All this for $39.

Overall, I found the course very worthwhile.

Sometimes, video is the gentlest and/or most expedient entryway to a new topic. For some time now, I have owned Luciano Ramalho’s impressive Fluent Python: Clear, Concise, and Effective Programming. I must have picked it up (and put it down) four times already — every time, I thought, “I’m not ready for this,” and would postpone the investment in developing my Python skills. Well, I think what I needed was this video course by Michael Kennedy. It was the perfect introduction to the advanced concepts in that book.

After having thoroughly enjoyed Michael Kennedy’s course, I think I may be ready to pick up Fluent Python again, for real this time.

Review: “Sublime Python” video course by Dan Bader

Many people learning Python will recognize the name Dan Bader. He’s not only an experienced Python developer, but also a Python enthusiast who is dedicated to helping us improve our Python skills. He created PythonistaCafe, an online forum similar to Stack Overflow but arguably more friendly and inviting for novices. He also has a YouTube channel with lots of educational videos.

Recently, he published a video course called “Sublime Python: The Complete Guide to Sublime Text for Python Developers.” This course is really great.

While most of my exploratory data-science work is done using Jupyter Notebooks, there is always a need for a text editor and/or IDE for development of longer “operational” code. In the past, I had tried PyCharm — it’s powerful, but I find the visual layout to be cluttered and confusing; there’s definitely a learning curve. On the other end of the spectrum, I used BBEdit for all my text-editing needs. It has some great features, but I struggled when it came to optimizing BBEdit for Python development. Several colleagues told me to check out Sublime Text 3 in the past, but I got very confused by all the packages and themes, and even the way you have to edit text files to change some preferences.

Dan Bader’s new course really simplifies this process. In this ~6-hour course, he takes you from ground zero (a fresh install of Python and Sublime Text on macOS, Windows, and Linux) to a fully optimized setup with syntax highlighting, code linting, git integration, and streamlined code building/execution. He even shows how to optimize certain tasks from the command line in Terminal. The video course did not cover setting up your build system for a specific conda environment, but he helped me do so via email, and he has added this as a possible future update for the course.

This course really saves a lot of time. Discovering all these tweaks on my own would have required several days of frustrating trial and error. Now, I have a really slick text-editing (quasi-IDE) environment for my common Python-development needs.

I highly recommend this course to folks who are struggling with finding a better solution for Python development.