Index de l'article

Work with dates

Group by months

If you have a data-frame df with a field Created at containing dates like:

2021-02-05 10:34:21 UTC

You can work with it, example to group and sort by month, you can cheat doing:

df_Temp = pd.DataFrame(df['Created at'].str[:7])
df_Created_count = pd.DataFrame(df_Temp.groupby(['Created at'], dropna=False).size(), columns=['Total']).sort_values(['Created at'], ascending=True).reset_index()

indeed with str[:7] we extract years and months from strings. Then a usual groupby.

Ok but it is more a pandas-way to convert your field in a true date format, in a new virtual field. Then you can extract real months and store them in a proper data-frame.

df['Created'] = pd.to_datetime(df['Created at'])
df['Created'] = df['Created'].dt.to_period("M")
df_TempCreated = pd.DataFrame(df['Created'])

Then easy to group and count them.

df_Created_count = pd.DataFrame(df_TempCreated.groupby(['Created'], dropna=False).size(), columns=['Total']).sort_values(['Created'], ascending=True).reset_index()

As now you use a real date format, you can display the full months with dt.strftime.

df_Created_count['Created'] = df_Created_count['Created'].dt.strftime('%B %Y')

Format date

If you get date like 2022-10-24 and you would 24/10/2022:

df['My date'] = pd.to_datetime(df['My date'], format='%Y-%m-%d')
df['My date'] = df['My date'].dt.strftime("%d/%m/%Y")

Get simple date (DDDDMMDD)

df_DataUnion['date_simple'] = df_DataUnion['date_submit'].astype(str).str[:10]
df_DataUnion['date_simple'] = df_DataUnion['date_simple'].replace({'-': ''}, regex=True)

Extract the previous month

PreviousMonth = today - pd.DateOffset(months=1)

Convert false date field from Excel

Excel can concatenate true date format and string in the same column, arrgh! Later when you would use it in Pandas...

I use this tip to manage these kind of malicious fields in pandas:

df.loc[~df_GDC['date'].str.contains(' ', na=False), 'date'] = df['date'].astype(str).str.split(expand=True)[0]
df.loc[~df_GDC['date'].str.contains(' ', na=False), 'date'] = \
    df['date'].astype(str).str.split('-', expand=True)[2] + '/' +\
    df['date'].astype(str).str.split('-', expand=True)[1] + '/' +\
    df['date'].astype(str).str.split('-', expand=True)[0]

Eventually I put the code above in a try:... except:...

Eventually I convert all my field in string before, with astype(str)

Manage numeric date from Excel

Sometimes dates from Excel come as integer, arrgghhh!

import xlrd
def read_date(date):
    return xlrd.xldate.xldate_as_datetime(date, 0)
df.loc[df['My date'].astype(str).str.isdigit() == True, 'My date'] = \
pd.to_datetime(df['My date'].loc[df['My date'].astype(str).str.isdigit()].astype(int).apply(read_date), errors='coerce')

Check date format

Sometimes (especilally when it comes from Excel...) it is good to check your dates after treatments.

Below we search dates not starting with DD/MM/YYYY:

df2 = df[~df['my date'].str.contains('^[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]', regex=True)]
print(tabulate(df2, headers='keys', tablefmt='psql', showindex=False))