How to use Python to save time auditing a logfile.
A recent system migration had me auditing user activity in a file sharing platform. The web interface does not make the process easy; it simply lists each entry in an endless scroll. Fortunately, I was able to export the log to a two-column CSV. Column 1 was titled Action and column 2 was Trace.
The migration and the installation of a new agent, combined with other infrastructure adjustments, meant that users would have to remember their email / network password to log in. That was new as of this post-migration morning. Most had no trouble because we had warned them, trained them and set up the system to work smoothly.
Each row I wanted to audit had a sequence like this:
Client_Login_Success dave.mustaine@heavymetal.com,305.298.763.401,
Login_Failed dave.mustaine@heavymetal.com,305.298.763.401,AUTH_FAILED
Login_Success dave.mustaine@heavymetal.com,305.298.763.401,
I wanted to see who never managed to log in. If they figured it out after a few tries, then excellent! No need to follow up.
I am new to Python, but it was my tool of choice today. I put together a script that zipped through the logfile and spit out a line for each user who did not succeed in the end.
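To make the row layout concrete, here is a minimal sketch of pulling a Trace value apart. It assumes the value is always "user,ip,status" as in the sample rows above; the names TraceFields and parse_trace are made up for illustration and are not part of the export or of any library.

```python
from typing import NamedTuple


class TraceFields(NamedTuple):
    """Hypothetical container for the three parts of a Trace value."""
    user: str
    ip: str
    status: str  # empty when the row carries no status, as in the success rows


def parse_trace(trace: str) -> TraceFields:
    """Split a Trace value of the assumed form 'user,ip,status'."""
    # Pad with empty strings so a short or malformed value does not raise.
    parts = [p.strip() for p in trace.split(",")] + ["", "", ""]
    return TraceFields(user=parts[0], ip=parts[1], status=parts[2])


failed = parse_trace("dave.mustaine@heavymetal.com,305.298.763.401,AUTH_FAILED")
print(failed.user, failed.status)
```

The user is the part worth keying on, since it is the only thing that ties a failed attempt to a later success.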
This script first creates two dictionaries to store lines with status “success” and “failure”. It then reads the CSV file using the csv.DictReader class, which creates a dictionary for each row with column names as the keys and cell values as the values. The script then checks each row to see if it has a status of “success” or “failure”, and adds it to the appropriate dictionary.
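As a small illustration of what csv.DictReader hands back, here is a sketch that reads a two-row sample from memory instead of the real export. The sample data is made up to match the rows shown earlier, and it assumes the Trace value is quoted in the CSV so that its commas stay inside one column.

```python
import csv
import io

# Hypothetical two-row sample standing in for the exported file.
sample = io.StringIO(
    'Action,Trace\n'
    'Login_Failed,"dave.mustaine@heavymetal.com,305.298.763.401,AUTH_FAILED"\n'
    'Login_Success,"dave.mustaine@heavymetal.com,305.298.763.401,"\n'
)

# Each row comes back as a dictionary keyed by the header names.
rows = list(csv.DictReader(sample))
for row in rows:
    print(row["Action"], "->", row["Trace"])
```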
Finally, the script checks each failure line to see if there is a corresponding success line with the same user. If not, it prints a message indicating that a success line was not found.
This script assumes that the CSV file has the columns “Action” and “Trace”. There were two different types of success/failure messages, so I had to add elif statements to catch all of them. I know this would prove inefficient if I were parsing millions of lines, but it worked just fine here.
import csv

# create dictionaries, keyed by user, to store lines with status "success" and "failure"
success_lines = {}
failure_lines = {}

# read the CSV file and store the lines in the appropriate dictionary
with open('C:\\Users\\TotallyNotTheAdministratorAccount\\Downloads\\audit.csv', 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # Trace looks like "user,ip,status" in the sample rows, so the user is the first field
        user = row['Trace'].split(',')[0].strip()
        if row['Action'] == 'Client_Login_Success':
            success_lines[user] = row
        elif row['Action'] == 'Client_Login_Failed':
            failure_lines[user] = row
        elif row['Action'] == 'Login_Success':
            success_lines[user] = row
        elif row['Action'] == 'Login_Failed':
            failure_lines[user] = row

# check for failure lines without corresponding success lines for the same user
for user in failure_lines:
    if user not in success_lines:
        print(f"No success line found for user {user} with failure status {failure_lines[user]}")
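As an aside on the elif chain, one alternative is a lookup table that maps each Action value to an outcome, with two sets of users in place of the dictionaries. This is only a sketch under the same assumptions as before: the user is taken to be the first comma-separated field of Trace, and the third sample user is invented for the example. OUTCOMES and users_without_success are hypothetical names.

```python
# Assumed mapping from Action value to outcome; extend it if the export
# contains other login actions.
OUTCOMES = {
    "Client_Login_Success": "success",
    "Login_Success": "success",
    "Client_Login_Failed": "failure",
    "Login_Failed": "failure",
}


def users_without_success(rows):
    """Return the users who have a failed login and no successful one."""
    succeeded, failed = set(), set()
    for row in rows:
        outcome = OUTCOMES.get(row["Action"])
        if outcome is None:
            continue  # not a login event, skip it
        user = row["Trace"].split(",")[0].strip()
        if outcome == "success":
            succeeded.add(user)
        else:
            failed.add(user)
    # Set difference: failed at least once, never succeeded.
    return failed - succeeded


# Hypothetical rows in the shape csv.DictReader would yield.
rows = [
    {"Action": "Login_Failed", "Trace": "dave.mustaine@heavymetal.com,305.298.763.401,AUTH_FAILED"},
    {"Action": "Login_Success", "Trace": "dave.mustaine@heavymetal.com,305.298.763.401,"},
    {"Action": "Client_Login_Failed", "Trace": "hypothetical.user@heavymetal.com,305.298.763.402,AUTH_FAILED"},
]
print(users_without_success(rows))
```

Because the function only needs an iterable of rows, a csv.DictReader could be passed in directly, so the file is read one row at a time and only the user names are kept in memory.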
The script printed the whole row so I could confirm things like date and time, IP address and other details. I’ll bet I could clean it up, add formatting or an action, but for this effort that was all I needed.
The output:
No success line found for user Login_Failed with failure status {'Action': 'Login_Failed', 'Trace': '
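The cleanup mentioned above could look something like the following sketch, which prints one aligned line per user instead of the raw dictionary. It assumes a dictionary of leftover failures keyed by user, with Trace in the "user,ip,status" shape; format_report, the column widths and the sample entry are all illustrative choices rather than anything the platform dictates.

```python
def format_report(failures):
    """Build aligned 'user  ip  status' lines from a {user: row} dictionary."""
    lines = []
    for user, row in sorted(failures.items()):
        # Trace is assumed to be "user,ip,status"; pad in case a field is missing.
        _, ip, status = (row["Trace"].split(",") + ["", "", ""])[:3]
        lines.append(f"{user:<40} {ip:<16} {status or 'unknown'}")
    return lines


# Hypothetical leftover failure, shaped like a csv.DictReader row.
failures = {
    "dave.mustaine@heavymetal.com": {
        "Action": "Login_Failed",
        "Trace": "dave.mustaine@heavymetal.com,305.298.763.401,AUTH_FAILED",
    }
}
for line in format_report(failures):
    print(line)
```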
Next I think I’ll try the same thing with PowerShell. This script and the post were assembled while listening to the Raised on Metal Spotify playlist.