How can I select all rows with count of rows >= 2?
I have the following pandas dataframe.
df = pd.DataFrame({"date": ["2000-01-03", "2000-01-04", "2000-01-04", "2000-01-04", "2000-01-04",
"2000-01-03", "2000-01-04", "2000-01-05", "2000-01-05",
"2000-01-03", "2000-01-05", "2000-01-05",
"2000-01-04", "2000-01-05"],
"sym": ["A", "A", "A", "A", "A" ,"B", "B","B", "B" ,"C", "C", "C", "D", "E"],
"val1": [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 2, 2],
"val2": [2, 2, 2, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 2]
})
df
date sym val1 val2
0 2000-01-03 A 1 2
1 2000-01-04 A 1 2
2 2000-01-04 A 1 2
3 2000-01-04 A 1 2
4 2000-01-04 A 1 2
5 2000-01-03 B 2 2
6 2000-01-04 B 2 3
7 2000-01-05 B 2 3
8 2000-01-05 B 2 3
9 2000-01-03 C 3 1
10 2000-01-05 C 3 1
11 2000-01-05 C 3 2
12 2000-01-04 D 2 2
13 2000-01-05 E 2 2
I applied
df.groupby(['date', 'sym'], as_index=False).mean().sort_values(['sym','date'])
to average val1, val2 for a given date for each sym.
date sym val1 val2
0 2000-01-03 A 1.0 2.0
3 2000-01-04 A 1.0 2.0
1 2000-01-03 B 2.0 2.0
4 2000-01-04 B 2.0 3.0
6 2000-01-05 B 2.0 3.0
2 2000-01-03 C 3.0 1.0
7 2000-01-05 C 3.0 1.5
5 2000-01-04 D 2.0 2.0
8 2000-01-05 E 2.0 2.0
Next, I need to select all rows for "sym" with row-count >= 2. in this example, the resulting df would be all rows from sym=A,B,C
Desired output:
date sym val1 val2
0 2000-01-03 A 1.0 2.0
3 2000-01-04 A 1.0 2.0
1 2000-01-03 B 2.0 2.0
4 2000-01-04 B 2.0 3.0
6 2000-01-05 B 2.0 3.0
2 2000-01-03 C 3.0 1.0
7 2000-01-05 C 3.0 1.5
I tried combinations of groupby, pivot, count with no luck.