
Filters

Get lines with a field containing a string

df2 = df1[df1['My field'].str.contains('AAA').fillna(False)]
 
print(tabulate(df2.head(5), headers='keys', tablefmt='psql'))
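As a minimal, self-contained sketch of the filter above (the sample data is made up for illustration), `str.contains` also accepts `na=False`, which should give the same result as chaining `.fillna(False)`:

```python
import pandas as pd

# Hypothetical sample data; None stands for a missing value.
df1 = pd.DataFrame({'My field': ['xxAAAxx', 'BBB', None, 'AAA']})

# regex=False treats 'AAA' as a literal string;
# na=False counts missing values as "no match".
mask = df1['My field'].str.contains('AAA', regex=False, na=False)
df2 = df1[mask]

print(df2)
```

Passing `regex=False` is optional here, but it avoids surprises when the searched string contains characters such as `.` or `*`.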

Get lines with a field not containing a string

Just use ~:

df2 = df1[~df1['My field'].str.contains('AAA').fillna(False)]
 
print(tabulate(df2.head(5), headers='keys', tablefmt='psql'))
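One point worth checking with the negation: missing values are counted as "no match", so they end up in the "not containing" result. The sketch below uses made-up data and a hypothetical variant that also excludes missing values:

```python
import pandas as pd

# Hypothetical sample data with one missing value.
df1 = pd.DataFrame({'My field': ['xxAAAxx', 'BBB', None]})

contains = df1['My field'].str.contains('AAA', na=False)

# ~ inverts the mask, so the missing value is kept.
not_containing = df1[~contains]

# Variant (an assumption about what you may want): also require a real value.
not_containing_no_nan = df1[~contains & df1['My field'].notna()]

print(not_containing)
print(not_containing_no_nan)
```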

Get lines with a field not empty, not NaN, not null

# notnull() is an alias of notna(), so one check covers both NaN and None.
df_notEmpty = df[
    (df['My field'].notna()) &
    (df['My field'] != '')
]
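A small runnable sketch of this filter, on made-up data. The second part is an assumption about a common need: it also rejects strings made only of spaces, which the filter above keeps.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a value, an empty string, NaN, None, spaces only.
df = pd.DataFrame({'My field': ['abc', '', np.nan, None, '   ']})

# Not missing and not an empty string.
df_notEmpty = df[df['My field'].notna() & (df['My field'] != '')]

# Stricter variant: strip whitespace before comparing with ''.
stripped = df['My field'].str.strip()
df_notBlank = df[stripped.notna() & (stripped != '')]

print(df_notEmpty)
print(df_notBlank)
```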

Get lines with a field matching a regex

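Here we keep the lines where the field is a number with a comma as decimal separator. The pattern is a raw string so that \d is not read as a Python escape sequence, and na=False treats missing values as non-matching.

dfTest = df_ImportTxt[df_ImportTxt['My field'].str.match(r'^\d+,\d+$', na=False)]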
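To make the behaviour concrete, here is a sketch on made-up values. The conversion to float at the end is an optional extra step, not part of the original filter:

```python
import pandas as pd

# Hypothetical sample data.
df_ImportTxt = pd.DataFrame({'My field': ['12,5', '12.5', '7', 'abc', None, '0,25']})

# Digits, a comma, digits, and nothing else.
mask = df_ImportTxt['My field'].str.match(r'^\d+,\d+$', na=False)
dfTest = df_ImportTxt[mask]

# Optional: turn the matched strings into floats.
values = dfTest['My field'].str.replace(',', '.', regex=False).astype(float)

print(dfTest)
print(values)
```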

Get lines searching several strings in a field

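Example searching for a space, a star and parentheses (the special characters are escaped, in raw strings, because the joined list is used as a regex):

MySearch = [r'\*', ' ', r'\(', r'\)']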
 
df2 = df1[df1['zone'].str.contains('|'.join(MySearch)).fillna(False)]
 
print(tabulate(df2.head(40), headers='keys', tablefmt='psql'))
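If you prefer not to escape each string by hand, one possible approach is to let `re.escape` from the standard library do it. The sketch below uses made-up data for the `zone` field:

```python
import re

import pandas as pd

# Hypothetical sample data.
df1 = pd.DataFrame({'zone': ['A*', 'B C', 'D(1)', 'EFG', None]})

# Plain strings, no manual escaping.
MySearch = ['*', ' ', '(', ')']

# re.escape protects the regex special characters; '|' means "or".
pattern = '|'.join(re.escape(s) for s in MySearch)

df2 = df1[df1['zone'].str.contains(pattern, na=False)]

print(df2)
```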

Get lines with a field containing only digits

df2 = df1.loc[df1['zone'].astype(str).str.isdigit()]
 
print(tabulate(df2.head(5), headers='keys', tablefmt='psql'))
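A short sketch on made-up data shows what `isdigit` accepts. The `astype(str)` call is what lets the integer 45 pass, while decimals, signed numbers and empty strings are rejected:

```python
import pandas as pd

# Hypothetical sample data mixing strings and an integer.
df1 = pd.DataFrame({'zone': ['123', '12a', 45, '4.5', '-5', '']})

# Convert everything to text first, then test for digits only.
mask = df1['zone'].astype(str).str.isdigit()
df2 = df1.loc[mask]

print(df2)
```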

Get lines according to several conditions

Below we get the lines where one field is null or another field is an empty string (use & instead of | to require both conditions):

df2 = df1[(df1['My field'].isna()) | (df1['My other field'] == '')]
 
print(tabulate(df2.head(5), headers='keys', tablefmt='psql', showindex=False))
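The difference between `|` (or) and `&` (and) can be checked on a small made-up dataframe. Each condition needs its own parentheses because these operators bind more tightly than `==`:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df1 = pd.DataFrame({
    'My field': [np.nan, 'x', np.nan, 'y'],
    'My other field': ['', '', 'b', 'c'],
})

is_null = df1['My field'].isna()
is_empty = df1['My other field'] == ''

either = df1[is_null | is_empty]   # at least one condition is true
both = df1[is_null & is_empty]     # both conditions are true

print(either)
print(both)
```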

Get lines where the values are in a list

searchIn= ['AAA', 'BBB', 'CCC']
 
df2 = df1[df1['My field'].isin(searchIn)]
 
print(tabulate(df2.head(5), headers='keys', tablefmt='psql', showindex=False))
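A minimal sketch with made-up data, including the opposite filter (values not in the list) obtained with `~`. Note that a missing value is never "in" the list, so it lands in the second result:

```python
import pandas as pd

# Hypothetical sample data.
df1 = pd.DataFrame({'My field': ['AAA', 'DDD', 'CCC', None]})

searchIn = ['AAA', 'BBB', 'CCC']

in_list = df1[df1['My field'].isin(searchIn)]
not_in_list = df1[~df1['My field'].isin(searchIn)]

print(in_list)
print(not_in_list)
```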

Get lines where a field does not end with a concatenation of other fields

Suppose you have this dataframe df:

Language      PDF
Python        docA-Python.pdf
SQL           docF-SQ.pdf
PHP           docS-PHP.pdf
CSS           docG-CSS.pdf
Javascript    docR-Javascript.pdf
HTML          docW-HTML.pdf

Now you would like to extract the lines where the PDF field does not end with [Language field] + '.pdf':

# True where the PDF field ends with [Language field] + '.pdf'
Compare = df.apply(lambda x: x['PDF'].endswith(x['Language'] + '.pdf'), axis=1)

# Keep only the lines where the comparison is False
df_filtered = df.loc[~Compare, ['Language', 'PDF']]

print(tabulate(df_filtered.head(5), headers='keys', tablefmt='psql', showindex=False))
Language      PDF
SQL           docF-SQ.pdf
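For reference, the example can be reproduced end to end with the sketch below. It rebuilds the sample dataframe by hand (assuming the values carry no stray spaces) and uses a list comprehension instead of `apply`, which is only one possible way to write it:

```python
import pandas as pd

# The sample dataframe from the example, typed by hand.
df = pd.DataFrame({
    'Language': ['Python', 'SQL', 'PHP', 'CSS', 'Javascript', 'HTML'],
    'PDF': ['docA-Python.pdf', 'docF-SQ.pdf', 'docS-PHP.pdf',
            'docG-CSS.pdf', 'docR-Javascript.pdf', 'docW-HTML.pdf'],
})

# Suffix each PDF name is expected to have, row by row.
expected_suffix = df['Language'] + '.pdf'

# True where the PDF name ends with the expected suffix.
ends_ok = pd.Series(
    [pdf.endswith(suffix) for pdf, suffix in zip(df['PDF'], expected_suffix)],
    index=df.index,
)

df_filtered = df[~ends_ok]

print(df_filtered)
```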

Get lines where a field does not contain an email

...
df_pb_Email = df[df['Email field'].str.contains(r'[^@]+@[^@]+\.[^@]+') == False]
...

Get lines where a field is not exactly an email

...
df_pb_Email = df[df['Email field'].str.fullmatch(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b') == False]
...
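The difference between "does not contain an email" and "is not exactly an email" can be seen on a few made-up values. The pattern below is a simplified email regex chosen for illustration, not a full validator:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'Email field': [
    'john@example.com',
    'contact: john@example.com',
    'john.example.com',
    'a@b',
]})

# Simplified email pattern (an assumption, not a complete validator).
email = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'

# No email anywhere in the value.
no_email_inside = df[~df['Email field'].str.contains(email, na=False)]

# The whole value is not an email.
not_exactly_email = df[~df['Email field'].str.fullmatch(email, na=False)]

print(no_email_inside)
print(not_exactly_email)
```

With `na=False` and `~`, missing values are reported as problems too. The `== False` form used above leaves them out instead.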

Lines with a field containing another field

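# True where FieldA contains FieldB
df['A contains B'] = [b in a for a, b in zip(df['FieldA'], df['FieldB'])]
# Keep the lines where FieldA does not contain FieldB (use == True to keep the matching lines instead)
df = df[df['A contains B'] == False]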

Lines with a field ending with another field

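# True where FieldA ends with FieldB
df['A finish with B'] = [a.endswith(b) for a, b in zip(df['FieldA'], df['FieldB'])]
# Keep the lines where FieldA does not end with FieldB (use == True to keep the matching lines instead)
df = df[df['A finish with B'] == False]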
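Both comparisons can be tried on a small made-up dataframe. This sketch keeps the masks as separate Series instead of adding columns, and it assumes neither field has missing values (the string methods would fail on NaN):

```python
import pandas as pd

# Hypothetical sample data, without missing values.
df = pd.DataFrame({
    'FieldA': ['report_2024.pdf', 'summary.txt', 'data.csv'],
    'FieldB': ['2024', '.txt', 'xls'],
})

pairs = list(zip(df['FieldA'], df['FieldB']))

# Row by row: does FieldA contain FieldB? Does FieldA end with FieldB?
a_contains_b = pd.Series([b in a for a, b in pairs], index=df.index)
a_ends_with_b = pd.Series([a.endswith(b) for a, b in pairs], index=df.index)

not_containing = df[~a_contains_b]
contains_but_not_at_end = df[a_contains_b & ~a_ends_with_b]

print(not_containing)
print(contains_but_not_at_end)
```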