Index de l'article


Create a dataframe from a previous dataframe (copy)

df2 = df1.copy()

Copy data from a dataframe in the clipboard

Very convenient to copy-paste quickly in a text file, CSV, Excel ...

df.to_clipboard(sep=',', index=False, header=None)

Get column names in a list

listColumn = df.columns.tolist()

Create a dataframe from a list

import pandas as pd
from tabulate import tabulate
MyList = ['7878', 'Bbbb', 'azerfg', '545', 'XXX']
df = pd.DataFrame(MyList)
df.columns =['My field']
number = df.shape[0]
print(tabulate(df.head(10), headers='keys', tablefmt='psql', showindex=False))


To dedupe a dataframe on an Email field keeping only the first duplicate record, use drop_duplicates.

df['Email'] = df['Email'].str.lower()
dfDeduped = df.drop_duplicates(subset=['Email'], keep='first')

Find duplicates

Use duplicated:

df2 = df1[df1.duplicated(['My field'], keep=False)]
print(tabulate(df2.head(10), headers='keys', tablefmt='psql', showindex=False))

Export a picture from a chart

And add a border.

from PIL import Image, ImageOps
fig1.savefig('C:/Users/myplot.png', dpi=100)
im ='C:/Users/myplot2.png')
bordered = ImageOps.expand(im, border=1, fill=(0, 0, 0))'C:/Users/myplot2.png')

Re-organize columns

df = df[['Field 1', 'Adresse mail', 'Note algorithmique', 'Substitutions']]

Re-name columns

From their original name:

df.rename(columns={'Old name 1': 'New name 1', 'Old name 1': 'New name 1'}, inplace=True)

From their position:

df.columns.values[12] = 'New name'

Delete column

df.drop('My field', axis=1, inplace=True)

Format number with space thousand separator

number = df.shape[0]
print(format(number, ',').replace(',', ' '), 'customers')


f"{number_of_rows:,}".replace(',', ' ')


To rank according to a field, from highest to lowest:

df['My rank'] = df['My field to rank'].rank(ascending=False)

Avoid duplicate:

df['My rank'] = df['My field to rank'].rank(ascending=False, method='first')

Generate a unique random integer

If you want a unique random integer from 1 included to the number of records included.

howManyRecord = df.shape[0]
df['Random number'] = np.random.choice(range(howManyRecord), howManyRecord, replace=False)
df.loc[df['Random number'] == 0, 'Random number'] = howManyRecord

Select with condition

df = df.loc[df['Field to filter'] == 'Yes']

Prevent nan

Use keep_default_na=False :

Data = pd.read_excel(inputExcelFile, sheet_name='Feuil1', engine='openpyxl', keep_default_na=False)

Get tuples from a file (in a SQL UPDATE shape)


Very usefull to build SQL update queries from a file with the primary key in first column :

file1 = open(MyPath+'FixQueries.sql', 'w', encoding='cp1252')
for i in df.itertuples(index=False):
    file1.write('UPDATE MyTable SET ' +\
        str(i)[7:-1].split(', ', 1)[-1] +\
        " WHERE id = '" + i[0] + "' ;\n")

Start the index to a special value

Vey usefull to then use it in Excel formulas (starting with 2 ...).

df.index = df.index + 2

Use index in concatenation


Reset index

df.reset_index(drop=True, inplace=True)

Open a box to enter a string and use it as a variable

import tkinter as tk
from tkinter import simpledialog
window = tk.Tk()
window.eval('tk::PlaceWindow . center')
repertoireCarto = simpledialog.askstring("Saisie", "Répertoire ?\t\t\t\t", parent=window)
print('Le répertoire est', repertoireCarto)

 Get some line indexes according condition

list_duplicate = df.index[df['My field'] == 'Text'].tolist()

Import external scripts

Your working script needs to know where to find the external scripts:

from YourScriptWithoutExtension import SomeVariables, ...


Not related to Pandas but very useful to run shell command, bash, console script...'psql -h localhost -p 5432 -d work -U postgres -c "SELECT NOW() ;"', shell=True)

Strings comparison

Compare the similarity of 2 strings, to get a indicator in a new field:

import difflib
from difflib import SequenceMatcher
df['indicator'] = df[['Field1', 'Field2']].apply(lambda x: SequenceMatcher(lambda y: y == " ", x[0], x[1]).ratio(), axis=1)

Count time execution

import time
print('Time for now is %s secondes' % round(time.process_time(), 1))
timeNow = time.process_time()
print('Time now is %s secondes' % round(time.process_time() - timeNow, 1))
timeNow = time.process_time()

Include external Python code (without real importation)

YourFileCOde = r'C:/Users/Georges/PycharmProjects/WorkEMC/'
exec(compile(open(YourFileCode, 'rb').read(), YourFileCode, 'exec'))

Concatenate fields without empty value

df['adresse'] = df[['Filed1', 'Filed2', 'Filed3']].apply(lambda x: ' '.join(x.dropna()), axis=1)