Page 17 sur 18
Miscellaneous
Create a dataframe from a previous dataframe (copy)
df2 = df1.copy()
Copy data from a dataframe in the clipboard
Very convenient to copy-paste quickly in a text file, CSV, Excel ...
df.to_clipboard(sep=',', index=False, header=None)
Create a dataframe from a list
import pandas as pd from tabulate import tabulate MyList = ['7878', 'Bbbb', 'azerfg', '545', 'XXX'] df = pd.DataFrame(MyList) df.columns =['My field'] number = df.shape[0] print(tabulate(df.head(10), headers='keys', tablefmt='psql', showindex=False)) print(number)
Dedupe
To dedupe a dataframe on an Email field keeping only the first duplicate record, use drop_duplicates
.
df['Email'] = df['Email'].str.lower() dfDeduped = df.drop_duplicates(subset=['Email'], keep='first')
Find duplicates
Use duplicated
:
df2 = df1[df1.duplicated(['My field'], keep=False)] print(tabulate(df2.head(10), headers='keys', tablefmt='psql', showindex=False))
Export a picture from a chart
And add a border.
from PIL import Image, ImageOps fig1.savefig('C:/Users/myplot.png', dpi=100) im = Image.open('C:/Users/myplot2.png') bordered = ImageOps.expand(im, border=1, fill=(0, 0, 0)) bordered.save('C:/Users/myplot2.png')
Re-organize columns
df = df[['Field 1', 'Adresse mail', 'Note algorithmique', 'Substitutions']]
Re-name columns
From their original name:
df.rename(columns={'Old name 1': 'New name 1', 'Old name 1': 'New name 1'}, inplace=True)
From their position:
df.columns.values[12] = 'New name'
Delete column
df.drop('My field', axis=1, inplace=True)
Format number with space thousand separator
number = df.shape[0] print(format(number, ',').replace(',', ' '), 'customers')
Or:
f"{number_of_rows:,}".replace(',', ' ')
Ranking
To rank according to a field, from highest to lowest:
df['My rank'] = df['My field to rank'].rank(ascending=False)
Avoid duplicate:
df['My rank'] = df['My field to rank'].rank(ascending=False, method='first')
Generate a unique random integer
If you want a unique random integer from 1 included to the number of records included.
howManyRecord = df.shape[0] df['Random number'] = np.random.choice(range(howManyRecord), howManyRecord, replace=False) df.loc[df['Random number'] == 0, 'Random number'] = howManyRecord
Select with condition
df = df.loc[df['Field to filter'] == 'Yes']
Prevent nan
Use keep_default_na=False
:
Data = pd.read_excel(inputExcelFile, sheet_name='Feuil1', engine='openpyxl', keep_default_na=False)
Get tuples from a file (in a SQL UPDATE
shape)
...df.itertuples()...
Very usefull to build SQL update queries from a file with the primary key in first column :
file1 = open(MyPath+'FixQueries.sql', 'w', encoding='cp1252') # ÉCRITURE DU FICHIER for i in df.itertuples(index=False): file1.write('UPDATE MyTable SET ' +\ str(i)[7:-1].split(', ', 1)[-1] +\ " WHERE id = '" + i[0] + "' ;\n")
Start the index to a special value
Vey usefull to then use it in Excel formulas (starting with 2 ...).
df.index = df.index + 2
Use index in concatenation
df.index.astype(str)
Reset index
df.reset_index(drop=True, inplace=True)
Open a box to enter a string and use it as a variable
import tkinter as tk from tkinter import simpledialog window = tk.Tk() window.eval('tk::PlaceWindow . center') window.withdraw() repertoireCarto = simpledialog.askstring("Saisie", "Répertoire ?\t\t\t\t", parent=window) print('Le répertoire est', repertoireCarto)
Get some line indexes according condition
list_duplicate = df.index[df['My field'] == 'Text'].tolist()
Import external scripts
Your working script needs to know where to find the external scripts:
sys.path.append('C:\\Users\\Georges\\PycharmProjects\\Your_Directory') from YourScriptWithoutExtension import SomeVariables, ...
Subprocess
Not related to Pandas but very useful!
subprocess.run('psql -h localhost -p 5432 -d work -U postgres -c "SELECT NOW() ;"', shell=True)