I'm trying to merge many (a few thousand one column tsv files) data frames into a single csv file using pandas. I'm new to pandas (and python for that matter) and could use some input or direction.
My data frames are observational data on a list scraped from the web and do not contain headers. For example:
data frame 1:
bluebird 34
chickadee 168
eagle 10
hawk 67
sparrow 2
data frame 2:
albatross 56
bluebird 78
hawk 3
pelican 19
sparrow 178
I'm looking to do is simply create a master file with all of the individual observations:
albatross 0 56
bluebird 34 78
chickadee 168 0
eagle 10 0
hawk 67 3
pelican 0 19
sparrow 2 178
I've tried to merge the data frames one at a time using pandas:
import pandas as pd
df1 = pd.read_table("~/home/birds1.tsv", sep='\t')
df2 = pd.read_table("~/home/birds2.tsv", sep='\t')
merged = df1.merge(df1, df2, how="left").fillna("0")
merged.to_csv("merged.csv", index=False)
but I am only getting one column. I don't have a master list of "birds", but I can concatenate all the data and sort on unique names for a dictionary list if this is needed.
What should my strategy be for merging a few thousand files?