Time series data manipulation
2.4. Time series data manipulation
2.4.1. Difference between two time series dates
How do you find the difference between the dates of two time series?
As long as they have the same format, you can use the difference method of Pandas DatetimeIndex objects. It returns the dates that are in the first index but not in the second.
Take two time series, one covering a full year and one covering only business days. The difference between them is simply the weekend dates.
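Here is a minimal sketch of that idea. The year 2021 and the variable names are picked just for illustration.

```python
import pandas as pd

# Every calendar day of 2021 (the year is an arbitrary choice for this example)
all_days = pd.date_range("2021-01-01", "2021-12-31", freq="D")

# Business days (Monday to Friday) over the same span
business_days = pd.date_range("2021-01-01", "2021-12-31", freq="B")

# Dates present in all_days but missing from business_days, i.e. the weekends
weekends = all_days.difference(business_days)

print(len(all_days), len(business_days), len(weekends))
print(weekends[:4])
```

The result is itself a DatetimeIndex, so you can keep chaining index operations on it.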
2.4.2. Set errors to NaT while converting to datetime in Pandas
When you load a datetime column into Pandas (for example, from a CSV file without date parsing), it usually ends up with the “object” dtype. To convert it to datetime, you can use the pd.to_datetime() function.
However, if the column contains malformed dates, the conversion fails with an error. You can turn those faulty dates into NaT (Not a Time, the datetime equivalent of NaN) by setting errors to “coerce”.
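A small sketch of the coercion, using a made-up series with two bad entries (the values are invented for the example):

```python
import pandas as pd

# Hypothetical raw column: one impossible date and one piece of junk text
raw = pd.Series(["2021-01-15", "2021-02-30", "not a date", "2021-03-01"])

# errors="coerce" replaces anything unparseable with NaT instead of raising
parsed = pd.to_datetime(raw, errors="coerce")

print(parsed)
print(parsed.isna().sum(), "values could not be parsed")
```

Counting the NaT values afterwards is a quick way to see how much of the column was faulty.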
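2.4.3. Strip unnecessary components of date time objects
Sometimes, datetime objects come with more granularity than you need. They may carry seconds or even nanoseconds when you are just interested in the year/month/day. You can use the Pandas to_period method (available on a DatetimeIndex, or through the .dt accessor on a Series) with a frequency alias to strip away the clutter. Note that the result has a period dtype rather than a datetime dtype.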
2.4.4. Set datetime index for plotting
Having a DatetimeIndex in your dataframe makes it stupidly easy to visualize time series. You don’t even have to import matplotlib yourself (it only needs to be installed); just extract the column(s) you want from the dataframe and call plot() on them. Pandas takes care of the rest.
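Something along these lines should work, assuming matplotlib is installed. The dataframe, its column names and its values are all made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 daily observations of a steadily growing value
dates = pd.date_range("2021-01-01", periods=100, freq="D")
df = pd.DataFrame({"date": dates, "value": np.arange(100).cumsum()})

# Move the date column into the index so Pandas uses it as the x-axis
df = df.set_index("date")

# No explicit matplotlib import needed; plot() returns a matplotlib Axes
ax = df["value"].plot()
```

In a notebook the figure shows up right away; in a script you would still need to save or show it through matplotlib.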
2.4.5. Generate business-day frequency time series with certain workweeks
How do you generate a business-day frequency time series with only certain working days?
Use the bdate_range function of Pandas and its weekmask parameter, which only takes effect with a custom business-day frequency (freq="C"). For example, you can create a time series from 2020 to 2022 with only Mondays, Wednesdays and Fridays as working days by passing weekmask="Mon Wed Fri".
Such time series can be useful when analyzing data from financial sources.
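A minimal sketch of the Monday/Wednesday/Friday series described above (the variable name is arbitrary):

```python
import pandas as pd

# freq="C" (custom business day) is required for weekmask to be applied
mwf = pd.bdate_range(
    start="2020-01-01",
    end="2022-12-31",
    freq="C",
    weekmask="Mon Wed Fri",
)

print(mwf[:6])
print(sorted(set(mwf.day_name())))
```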
2.4.6. Time series offset aliases
There are no fewer than 27 time series offset aliases. What are they?
Many Pandas functions like date_range have a parameter called freq (frequency) - it denotes how often each data point should occur in a time series.
Possible values are daily, hourly, weekly, all work days, month start and end, quarterly, yearly, etc. Check out the link below to learn more about them.
List of offset aliases: https://bit.ly/3Roy7Rb
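To get a feel for what freq does, here is a small sketch that generates three dates with a handful of aliases. Only aliases that I believe are stable across recent Pandas versions are used; some others (hourly or month-end, for instance) have been renamed over time, so check the list for your version.

```python
import pandas as pd

# D = calendar day, B = business day, W = weekly (Sundays),
# MS = month start, QS = quarter start
samples = {}
for alias in ["D", "B", "W", "MS", "QS"]:
    idx = pd.date_range("2021-01-01", periods=3, freq=alias)
    samples[alias] = list(idx.strftime("%Y-%m-%d"))
    print(alias, samples[alias])
```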
2.4.7. Filtering by partial date components
If you have a DatetimeIndex in your Pandas dataframes, you can filter the rows by partial date components passed as strings.
For example, from 1995 to 1997, from 5th month of 1995 to the end of 2000, from the beginning of 2015 to 17th of July of 2018, etc.
And these all work regardless of the time series index granularity. All courtesy of Pandas.
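The three examples above could look like this on a made-up daily series (the data and names are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical daily series spanning 1990 to 2020
idx = pd.date_range("1990-01-01", "2020-12-31", freq="D")
ts = pd.Series(np.arange(len(idx)), index=idx)

# From 1995 to 1997 (both years fully included)
a = ts.loc["1995":"1997"]

# From the 5th month of 1995 to the end of 2000
b = ts.loc["1995-05":"2000"]

# From the beginning of 2015 to the 17th of July 2018
c = ts.loc["2015":"2018-07-17"]

print(a.index[0], a.index[-1])
print(b.index[0], b.index[-1])
print(c.index[0], c.index[-1])
```

Both ends of a string slice are inclusive, which is why "1997" covers everything up to December 31st.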
2.4.8. Full list of datetime format strings
Do you know all the TWENTY FOUR datetime format codes? Of course not and neither do I! But I know where to look.
Format codes are those little strings you pass into Python datetime functions like %H, %m, %d, %c, %j, etc. They can denote everything from microseconds to whole years, with different name representations for the same component in string dates.
Here is the full list from the Python docs: https://bit.ly/3uzMuYT
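A few of the codes in action, as a quick sketch with an arbitrary timestamp. Locale-dependent codes like %c are left out on purpose, since their output varies by machine.

```python
from datetime import datetime

stamp = datetime(2021, 3, 14, 15, 9, 26)

# Datetime -> string
iso_like = stamp.strftime("%Y-%m-%d %H:%M:%S")  # year-month-day hour:minute:second
day_of_year = stamp.strftime("%j")              # zero-padded day of the year

# String -> datetime, using the same kind of codes to describe the layout
parsed = datetime.strptime("14/03/2021 15:09", "%d/%m/%Y %H:%M")

print(iso_like, day_of_year, parsed)
```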
2.4.9. Time series index with holidays
How do you leave out holidays in a time series and still keep its frequency intact?
The Pandas bdate_range function has a “holidays” parameter that accepts a list of datetime objects as holiday dates. Like weekmask, it only takes effect with a custom business-day frequency (freq="C"). The result is a business-day time series with weekends and the provided holidays left out.
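A minimal sketch for January 2021, with two example holiday dates chosen just for illustration:

```python
import pandas as pd

# Example holidays (assumed for this sketch): New Year's Day and January 18th
holidays = [pd.Timestamp("2021-01-01"), pd.Timestamp("2021-01-18")]

# freq="C" is required for the holidays argument to be applied
idx = pd.bdate_range(
    start="2021-01-01",
    end="2021-01-31",
    freq="C",
    holidays=holidays,
)

print(len(idx))
print(idx[:5])
```

The two parameters can be combined, so a custom weekmask and a holiday list work together in the same call.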