Index de l'article

Group, count and sort

List unique values

print(df_Deduped.devise.unique())

Count rows

number = df.shape[0]
print(number)

Count values (group by)

If you have a field named Country in a data-frame named df:

CountCountry = df.groupby(['Country'], dropna=False).size().sort_values(ascending=False)
print(CountCountry)

Count values (group by) and get it in a dataframe

CountCommune = pd.DataFrame(df.groupby(['id_commune'], dropna=False).size(), columns=['Total']).sort_values(['Total'], ascending=False).reset_index()
 
print(tabulate(CountCommune.head(10), headers='keys', tablefmt='psql', showindex=False))

Count values grouping by several fields to find duplicates

df_Duplicates = pd.DataFrame(df.groupby(['id_commune', 'PAGES'], dropna=False).size(), columns=['Total'])\
.sort_values(['Total'], ascending=False).reset_index()
df_Duplicates = df_Duplicates[(df_Duplicates['Total'] > 1)]
 
print(tabulate(df_Duplicates.head(10), headers='keys', tablefmt='psql', showindex=False))

Group concat

df = df.groupby(['country'])['city'].apply(lambda x: ', '.join(x)).reset_index()

Count a specific value

print (len(df[df['Your field'] == 'United States']))

Count with condition

myCount = df[df['Field where search the condition'] == 'A value'].count()['Field to count']

Count empty values

number = df['Your field'].isna().sum()

Keep empty values in a group by and fill them

newDf = pd.DataFrame(df.groupby(['Your field'], dropna=False).size(), columns=['Total']).sort_values(['Total'], ascending=False).reset_index()
newDf = newDf.fillna('Unknow')

Count email domains (group by)

You have a Email field with clean emails. Thanks to Pandas it is easy to group and count their DNS.

First create a virtual field to split the emails and recover the 2nd part of the split. Then just group your new field.

df['Domain'] = df['Email'].str.split(<a href="mailto:'@').str">'@').str</a>[1]
df_Email_DNS_count = pd.DataFrame(df.groupby(['Domain']).size(), columns=['Total'])\
    .sort_values(['Total'], ascending=False).reset_index()

Calculate percentages

Acording your needs, you can use the sum from a Total field.

df['Percent'] = (df['Total'] / df['Total'].sum()) * 100
df['Percent'] = df['Percent'].round(decimals=2)

Or the number of rows in your data-frame (useful if you work with a multi-valued field, see above the Analyze data from a multi-valued field chapter) .

df['Percent'] = (df['Total'] / df.shape[0]) * 100
df['Percent'] = df['Percent'].round(decimals=2)

Sum group by year

df['Year'] = df['DateInvoice'].dt.to_period('Y')
 
df_AmountYear = df.groupby(['Year'], dropna=False)['AmountEuro'].sum().reset_index()
df_AmountYear = df_AmountYear.sort_values(['Year'], ascending=False)
 
print('\nAmount per year:')
print(tabulate(df_AmountYear, headers='keys', tablefmt='psql', showindex=False))

Sort on a field 

df.sort_values(by=['Field'], ascending=False, inplace=True)
# Recreate the index if necessary (if you will use the index later)
df = df.reset_index(drop=True)

Sort on a field containing both text and numerical values

df = df.loc[pd.to_numeric(df['My field'], errors='coerce').sort_values().index]

Sum and count

If you want sum an Amount field per country counting the number of lines:

df_AmountCountries = df.groupby(['Country'], dropna=False)\
    .agg({'ID': 'count', 'AmountEuro': 'sum'})\
    .reset_index()\
    .rename(columns={'ID': 'Number'})

 

Liens ou pièces jointes
Télécharger ce fichier (France-Departements-Deformation.zip)France-Departements-Deformation.zip[France-Departements-Deformation]335 Ko
Télécharger ce fichier (simple_countries.zip)simple_countries.zip[simple_countries]1880 Ko