Filters
Get lines with a field containing a string
df2 = df1[df1['My field'].str.contains('AAA', na=False)]
print(tabulate(df2.head(5), headers='keys', tablefmt='psql'))
Get lines with a field not containing a string
Just negate the condition with ~:
df2 = df1[~df1['My field'].str.contains('AAA', na=False)]
print(tabulate(df2.head(5), headers='keys', tablefmt='psql'))
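As a minimal sketch of both filters on a small, made-up dataframe (the sample values are assumptions for illustration), the mask can be built once and reused. Passing na=False treats missing values as "does not contain", and regex=False searches for the literal text:

```python
import pandas as pd

# Hypothetical sample data; None stands for a missing value.
df1 = pd.DataFrame({'My field': ['AAA-1', 'xxAAAxx', 'BBB', None]})

# na=False: a missing value counts as "does not contain".
# regex=False: 'AAA' is searched as plain text, not as a pattern.
mask = df1['My field'].str.contains('AAA', regex=False, na=False)

df_with = df1[mask]       # lines containing 'AAA'
df_without = df1[~mask]   # all other lines, including the missing value

print(df_with)
print(df_without)
```

Because the mask is a real boolean Series, ~ flips it safely; the line with the missing value ends up in df_without.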
Get lines with a field matching a regex
Here we keep the lines where the field is a number using a comma as the decimal separator.
dfTest = df_ImportTxt[df_ImportTxt['My field'].str.match(r'^\d+,\d+$', na=False)]
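The sketch below, on made-up sample values, shows why the pattern needs its end anchor: str.match only anchors at the start of the string, while str.fullmatch (available in recent pandas versions) requires the whole value to match, so the ^ and $ can be dropped.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df_ImportTxt = pd.DataFrame(
    {'My field': ['12,5', '3.14', '7', '0,25', '12,5abc', None]}
)

pattern = r'\d+,\d+'

# match() only checks the beginning of each value: '12,5abc' passes.
loose = df_ImportTxt['My field'].str.match(pattern, na=False)

# fullmatch() checks the entire value: '12,5abc' is rejected.
strict = df_ImportTxt['My field'].str.fullmatch(pattern, na=False)

dfTest = df_ImportTxt[strict]
print(dfTest)
```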
Get lines with a field containing any of several strings
Example searching for a space, an asterisk or a parenthesis:
MySearch = [r'\*', ' ', r'\(', r'\)']
df2 = df1[df1['zone'].str.contains('|'.join(MySearch), na=False)]
print(tabulate(df2.head(40), headers='keys', tablefmt='psql'))
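If escaping each special character by hand is error-prone, one possible variant is to let re.escape from the standard library do it. This is only a sketch on made-up data; the list then holds the plain characters to search for:

```python
import re
import pandas as pd

# Hypothetical sample data for illustration.
df1 = pd.DataFrame({'zone': ['A*', 'B C', 'D(1)', 'EFG', None]})

# Plain characters, no manual backslashes.
MySearch = ['*', ' ', '(', ')']

# re.escape() neutralizes regex metacharacters; '|' means "any of these".
pattern = '|'.join(re.escape(s) for s in MySearch)

df2 = df1[df1['zone'].str.contains(pattern, na=False)]
print(df2)
```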
Get lines with a field containing only digits
df2 = df1.loc[df1['zone'].astype(str).str.isdigit()]
print(tabulate(df2.head(5), headers='keys', tablefmt='psql'))
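A quick sketch on made-up values shows what this test accepts: str.isdigit is true only when every character is a digit, so signs, decimal separators and empty strings are rejected, while astype(str) lets real integers pass too.

```python
import pandas as pd

# Hypothetical sample data mixing strings and an integer.
df1 = pd.DataFrame({'zone': ['123', '12a', '4.5', '-7', '', 42]})

# astype(str) turns 42 into '42' so the string test applies to every value.
mask = df1['zone'].astype(str).str.isdigit()

df2 = df1.loc[mask]
print(df2)
```

Only '123' and 42 are kept here; for signed or decimal numbers a regex filter is probably a better fit.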
Get lines according to several conditions
Below we get the lines where one field is null or another field is an empty string:
df2 = df1[(df1['My field'].isna()) | (df1['My other field'] == '')]
print(tabulate(df2.head(5), headers='keys', tablefmt='psql', showindex=False))
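To make the difference between "or" and "and" concrete, here is a small sketch on made-up data. Conditions are combined with | (or) and & (and); each condition needs its own parentheses, or can be stored in a named variable as done here:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df1 = pd.DataFrame({
    'My field': ['a', None, 'c', None],
    'My other field': ['x', 'y', '', ''],
})

is_null = df1['My field'].isna()
is_empty = df1['My other field'] == ''

df_either = df1[is_null | is_empty]   # at least one condition is true
df_both = df1[is_null & is_empty]     # both conditions are true

print(df_either)
print(df_both)
```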
Get lines where values are IN a list
searchIn = ['AAA', 'BBB', 'CCC']
df2 = df1[df1['My field'].isin(searchIn)]
print(tabulate(df2.head(5), headers='keys', tablefmt='psql', showindex=False))
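The opposite filter (NOT IN) follows the same idea as the string filters: negate the mask with ~. A minimal sketch on made-up data:

```python
import pandas as pd

# Hypothetical sample data; None stands for a missing value.
df1 = pd.DataFrame({'My field': ['AAA', 'DDD', 'CCC', None]})

searchIn = ['AAA', 'BBB', 'CCC']
mask = df1['My field'].isin(searchIn)

df_in = df1[mask]        # values found in the list
df_not_in = df1[~mask]   # values absent from the list, missing values included

print(df_in)
print(df_not_in)
```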
Get lines with a field not ending with a concatenation of other fields
Suppose you have this dataframe df:
Language | PDF |
Python | docA-Python.pdf |
SQL | docF-SQ.pdf |
PHP | docS-PHP.pdf |
CSS | docG-CSS.pdf |
Javascript | docR-Javascript.pdf |
HTML | docW-HTML.pdf |
Now you would like to extract the lines where the PDF field does not end with [Language field] + '.pdf':
# True where the PDF name ends with the language name followed by '.pdf'
Compare = df[['Language', 'PDF']].apply(lambda x: x['PDF'].endswith(x['Language'] + '.pdf'), axis=1)
df_temp = pd.DataFrame(Compare)
df_temp.columns = ['True or false']
df_concat = pd.concat([df, df_temp], axis=1)
# Keep only the lines where the comparison failed
df_filtered = df_concat[['Language', 'PDF']][df_concat['True or false'] == False]
print(tabulate(df_filtered.head(5), headers='keys', tablefmt='psql', showindex=False))
Language | PDF |
SQL | docF-SQ.pdf |
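A shorter variant of the same filter is sketched below; it rebuilds the sample dataframe shown above and skips the intermediate dataframes by building the boolean mask directly. The names matches and df_filtered are only illustrative.

```python
import pandas as pd

# Same sample data as the table above.
df = pd.DataFrame({
    'Language': ['Python', 'SQL', 'PHP', 'CSS', 'Javascript', 'HTML'],
    'PDF': ['docA-Python.pdf', 'docF-SQ.pdf', 'docS-PHP.pdf',
            'docG-CSS.pdf', 'docR-Javascript.pdf', 'docW-HTML.pdf'],
})

# One boolean per line: does the PDF name end with '<Language>.pdf'?
matches = pd.Series(
    [pdf.endswith(lang + '.pdf') for lang, pdf in zip(df['Language'], df['PDF'])],
    index=df.index,
)

# Keep the lines where it does not.
df_filtered = df[~matches]
print(df_filtered)
```

Reusing df.index for the mask keeps it aligned with the dataframe even when the index is not the default 0..n-1.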