
Split file

What follows is not strictly about Pandas, but it is useful in some Pandas contexts.

First we count the lines in the file (by counting the occurrences of \n). Then, if the file has more than 50,000 lines, we split it into files of 50,000 lines each.

import os

workDirectory = './'  # hypothetical: the folder containing FixQueries.sql (keep the trailing slash)

# Count the lines by counting the newline characters
with open(workDirectory + 'FixQueries.sql', 'r') as myfile:
    data = myfile.read()
taille_fichier = data.count("\n")

lines_max = 50000
numero_fichier = 0
if taille_fichier > lines_max:
    print('Warning: the output file has ' + str(taille_fichier) + ' lines! Please wait...')
    smallfile = None
    with open(workDirectory + 'FixQueries.sql') as bigfile:
        for lineno, line in enumerate(bigfile):
            # Start a new chunk file every lines_max lines
            if lineno % lines_max == 0:
                if smallfile:
                    smallfile.close()
                numero_fichier += 1
                small_filename = workDirectory + 'FixQueries {}.sql'.format(numero_fichier)
                smallfile = open(small_filename, "w")
            smallfile.write(line)
        if smallfile:
            smallfile.close()
    print('We split it into', numero_fichier, 'files!\n')
    # bigfile is closed when the with block ends, so the original can now be deleted
    os.remove(workDirectory + 'FixQueries.sql')
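The script above reads the whole file into memory just to count its lines, then reads it a second time to split it. As a sketch, the split can also be done in a single streaming pass with `itertools.islice`, holding at most `lines_max` lines in memory at once. The function name `split_file` is hypothetical; it reuses the chunk naming from the script (`FixQueries 1.sql`, `FixQueries 2.sql`, ...), always splits (a small file simply yields one chunk), and, unlike the script, leaves the original file in place.

```python
import os
from itertools import islice


def split_file(path, lines_max=50000):
    """Split the text file at `path` into chunks of at most `lines_max` lines.

    Hypothetical helper: chunks are written next to the original as
    '<stem> 1<ext>', '<stem> 2<ext>', ... and their paths are returned.
    The original file is not deleted.
    """
    stem, ext = os.path.splitext(path)
    chunk_paths = []
    with open(path) as bigfile:
        while True:
            # islice pulls the next lines_max lines without reading the rest of the file
            lines = list(islice(bigfile, lines_max))
            if not lines:
                break
            chunk_path = '{} {}{}'.format(stem, len(chunk_paths) + 1, ext)
            with open(chunk_path, 'w') as smallfile:
                smallfile.writelines(lines)
            chunk_paths.append(chunk_path)
    return chunk_paths
```

For example, `chunks = split_file(workDirectory + 'FixQueries.sql')` returns the list of chunk paths; `len(chunks)` gives the number of files produced.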

 

And then merge the files:

filenames = ['C:/_gh/0/file_25000.txt', 'C:/_gh/0/file_50000.txt', 'C:/_gh/0/file_75000.txt', 'C:/_gh/0/file_100000.txt', 'C:/_gh/0/file_125000.txt']

# Append each file, in list order, to the output file
with open('C:/_gh/0/CUMUL1.txt', 'w') as outfile:
    for name in filenames:
        with open(name) as infile:
            outfile.write(infile.read())
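The loop above reads each file entirely with `infile.read()`. A possible variant, sketched below, streams each file in blocks with `shutil.copyfileobj` and writes a newline after any file that does not end with one, so that its last line is not glued to the first line of the next file. The helper names `ends_with_newline` and `merge_files` are hypothetical.

```python
import os
import shutil


def ends_with_newline(path):
    """Return True if the file at `path` is empty or its last byte is '\\n'."""
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return True  # empty file: nothing to separate
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b'\n'


def merge_files(filenames, out_path):
    """Concatenate `filenames`, in the given order, into `out_path` (hypothetical helper)."""
    with open(out_path, 'w') as outfile:
        for name in filenames:
            with open(name) as infile:
                # Copy block by block instead of loading the whole file
                shutil.copyfileobj(infile, outfile)
            if not ends_with_newline(name):
                outfile.write('\n')
```

With the list above, this would be called as `merge_files(filenames, 'C:/_gh/0/CUMUL1.txt')`.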

 

You may first need to list the file names before merging them. In PowerShell (Alt+F+R):

get-childitem | select-object -expandproperty name > _files_list.txt

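Be careful to keep the files in their original order, and to keep the last line of each file empty (each file must end with a newline); otherwise the last line of one file is joined to the first line of the next one in the merged file.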
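Keeping the order is easy to get wrong: a listing sorted alphabetically puts `file_100000.txt` before `file_25000.txt`, and `FixQueries 10.sql` before `FixQueries 2.sql`. As a sketch, the list can instead be built in Python with `glob` and sorted by the last number in each file name. The helper name `chunk_files_in_order` is hypothetical, and it assumes every chunk name contains its sequence number.

```python
import glob
import os
import re


def chunk_files_in_order(directory, pattern='FixQueries *.sql'):
    """Return the files in `directory` matching `pattern`, in numeric order.

    Hypothetical helper: files are sorted by the last number in their name,
    so 'FixQueries 2.sql' comes before 'FixQueries 10.sql'.
    """
    def chunk_number(path):
        numbers = re.findall(r'\d+', os.path.basename(path))
        return int(numbers[-1]) if numbers else -1

    return sorted(glob.glob(os.path.join(directory, pattern)), key=chunk_number)
```

For instance, `chunk_files_in_order('C:/_gh/0', 'file_*.txt')` returns the five files from the merge example in the order 25000, 50000, 75000, 100000, 125000, and the result can be used as `filenames` in the merge loop.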