You might want to add a warning like "If you do this it will take a really long time unless you upgrade to a t2.medium or larger." My VPC flow logs often run to 26M entries/day, and it would be nice to load a full day's worth if I could run it overnight. I don't mind upgrading to larger instances.
P.S. Great tool. Laser-focused, and I don't have to run 10,000 instances of Elasticsearch.
Just to test, I ran a job just now and saw traffic to an instance we hadn't used in 3 days, flagged as "rule in use", so it must be getting older data. You may want to use the Flow Log Viewer part of the product to see how far back your logs go for that particular log stream. You can use the date picker to request logs for any arbitrary date and time, and we'll try to retrieve them from CWL.
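If you want to sanity-check retention outside the product, a rough AWS CLI equivalent might look like this (a sketch: the log group and stream names here are placeholders, and CloudWatch Logs timestamps are epoch milliseconds):

```shell
# Placeholders -- substitute your flow-log group and an actual stream name.
GROUP="/vpc/flow-logs"
STREAM="eni-0123456789-all"

# CWL timestamps are epoch milliseconds; ask for everything since this date.
START_MS=$(( $(date -u -d '2018-07-28T00:00:00Z' +%s) * 1000 ))

if command -v aws >/dev/null 2>&1; then
  # --start-from-head returns the OLDEST events first, which shows
  # how far back this stream actually goes.
  aws logs get-log-events \
    --log-group-name "$GROUP" \
    --log-stream-name "$STREAM" \
    --start-time "$START_MS" \
    --start-from-head \
    --limit 5 || true
else
  echo "aws CLI not installed; run this where it is configured"
fi
```

If the oldest event returned is only a few hours old, the limitation is in what CWL has stored, not in the viewer.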
The flow logs were from over the weekend, but the earliest log shows up at around 13:00 today, and they end at the time I ran the query. There were some ALBs/ELBs there, but I don't know enough about the config to answer that question.
Awesome results! I would love to document this in a blog post, do you mind?
I think that 4-hour limitation comes from somewhere else; PiaSoft and CloudWatch Logs certainly don't limit you to the last 4 hours. When were your flow logs created? (`aws ec2 create-flow-logs` is what I mean.) Are your instances ELB or ALB instances that haven't been running for very long?
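One quick way to answer the "when were they created" question (a sketch, assuming the AWS CLI is configured; `describe-flow-logs` reports `CreationTime` and `FlowLogStatus` for each flow log):

```shell
# CreationTime bounds your data: nothing from before the flow log
# existed was ever captured, no matter how far back you query.
QUERY='FlowLogs[].{Id:FlowLogId,Created:CreationTime,Status:FlowLogStatus}'

if command -v aws >/dev/null 2>&1; then
  aws ec2 describe-flow-logs --query "$QUERY" --output table || true
else
  echo "aws CLI not installed; run this where it is configured"
fi
```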
Looks like AWS only supports about 4 hours' worth of logs, regardless of how many you put in.
Just an FYI on 25 million entries: 2 hours processing time, 2 GB memory. (VPC name is faked.)

| VPC | # of records | Start time | End time | Memory usage |
|-----|--------------|------------|----------|--------------|
| MyVPC | 25,000,000 | 2018-07-30 15:50:04 -0400 | 2018-07-30 17:02:58 -0400 | 2 GB |
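For what it's worth, those start/end times imply a fairly steady ingest rate; a quick back-of-the-envelope check in shell (using only the timestamps reported above):

```shell
# Elapsed wall-clock time between the reported start and end times
# (both are -0400, so a naive UTC parse gives the same difference).
start=$(date -u -d '2018-07-30T15:50:04' +%s)
end=$(date -u -d '2018-07-30T17:02:58' +%s)
elapsed=$(( end - start ))          # 4374 seconds, i.e. about 73 minutes

# 25M records over that window
rate=$(( 25000000 / elapsed ))      # roughly 5715 records/sec
echo "${elapsed}s elapsed, about ${rate} records/sec"
```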
FYI: when I upgraded to a large instance, 5M entries was almost instantaneous. The workers seem mostly memory-bound, with very little CPU. Trying 25M now to see what happens; 2 GB so far after 30 minutes.